[GRASS-dev] large vector and indices, speed up db

In response to:

Hello,
reading this thread, and being sometimes concerned with large vector
files (associated with big related tables), I wonder if it's worth
manually creating indexes (on the cat field): can it be an effective way
to speed up queries, or is the bottleneck elsewhere, at the geometric
level (data handled by GRASS, not the linked DBMS)?
Thank you,
Vincent
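For reference, creating such an index by hand is a single SQL statement. A minimal sketch (the table name "mytable" is hypothetical); with the PostgreSQL backend the same statement can be run from a psql prompt or piped into db.execute inside a GRASS session:

```shell
# Build the index-creation statement for a hypothetical table "mytable".
table="mytable"
sql="CREATE INDEX ${table}_cat_idx ON ${table} (cat);"
echo "$sql"
# Inside a GRASS session, for example:  echo "$sql" | db.execute
```

Whether this helps depends on where the time is actually spent, which is the question discussed below.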

The problems with querying and topological operations must be at the
geometric level (e.g. clicking an area to see the attributes using v.what,
or cleaning operations). When working with PostgreSQL you don't have to
worry about the speed of simple select operations on a single column (like
the 'cat' column); it will return results within milliseconds, even from
tables with tens of millions of records. So, when an operation which
requires a simple select on 'cat' takes a lot of time, the slow part must
be 'determining the cat value'.

However, in the current implementation of some GRASS tools it seems to me
(correct me if I'm wrong) that there are a few operations where the
database is certainly the limiting factor. The most notable example is
renaming a large vector table, which is extremely slow. I noticed that
PostgreSQL is working very hard. Simply renaming the vector directory,
renaming the table in the database, and adjusting the contents of the
vector/<dataset>/dbln file should be sufficient. This should take a split
second, regardless of the size of the table. Doing this manually is much,
much faster than renaming with g.rename...
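The manual procedure described above can be sketched as follows. Everything here is illustrative: the map names, the mock mapset layout, and the dbln line format are assumptions, not g.rename's actual code, and the DB-side step would be an ALTER TABLE ... RENAME in PostgreSQL:

```shell
set -e
# Mock layout standing in for a real GRASS mapset (illustrative only).
work=$(mktemp -d)
mkdir -p "$work/vector/oldmap"
# dbln records the attribute link (layer, table, key, database, driver);
# the exact line format here is only an approximation.
echo "1 oldmap cat grassdb pg" > "$work/vector/oldmap/dbln"

# 1. Rename the vector directory.
mv "$work/vector/oldmap" "$work/vector/newmap"
# 2. Rename the table in the DBMS (not run here):
#    ALTER TABLE oldmap RENAME TO newmap;
# 3. Point the dbln link at the renamed table.
sed 's/oldmap/newmap/g' "$work/vector/newmap/dbln" > "$work/tmp" \
  && mv "$work/tmp" "$work/vector/newmap/dbln"

cat "$work/vector/newmap/dbln"
```

All three steps touch only metadata, which is why their cost does not depend on the number of records in the table.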

I'm not sure about v.extract, which is also amazingly slow. Maybe it
regenerates cat IDs and must somehow keep the original information linked?

Kind regards,
Wouter

Thank you Moritz and Wouter,

Personally, being used to operating directly on tables through a psql
terminal, I am not aware of the actual performance of the db.* modules.

In the present case, when programming special interfaces for end users who
make heavy use of vector attributes, maybe it would be a good idea to use
native DBMS interfaces instead of db.*; e.g. if I program in Perl, I can
act on my data through DBI with the DBD::Pg module.

Vincent


_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

On 26/02/09 18:37, Wouter Boasson wrote:


The problems with querying and topological operations must be at the
geometric level (e.g. clicking an area to see the attributes using v.what,
or cleaning operations).

Yes, querying the geometries is currently very slow because the spatial and topological indexes are kept in memory and not in a file, so every time you launch an interactive query the index has to be rebuilt, which is very slow on large vectors. See

http://josef.fsv.cvut.cz/WIKI/grass-osgeo/index.php/GRASS_7_ideas_collection#Radim.27s_TODO_list

Moritz

Moritz Lennert wrote:

Yes, querying the geometries is currently very slow because the spatial and topological indexes are kept in memory and not in a file, so every time you launch an interactive query the index has to be rebuilt, which is very slow on large vectors. See

http://josef.fsv.cvut.cz/WIKI/grass-osgeo/index.php/GRASS_7_ideas_collection#Radim.27s_TODO_list

Does someone remember the reason why the spatial index is not (or no longer?) written to a file? Once the spatial index is (again?) written to a file, all that rebuilding could be avoided by reading it from the file; that alone could give a speed increase.

Markus M

On 27/02/09 14:19, Markus Metz wrote:


Does someone remember the reason why the spatial index is not (or no longer?) written to a file? Once the spatial index is (again?) written to a file, all that rebuilding could be avoided by reading it from the file; that alone could give a speed increase.

http://www.intevation.de/rt/webrt?serial_num=3193

See also hints from Radim on how to implement file-based index:
http://lists.osgeo.org/pipermail/grass-dev/2006-September/025780.html

Moritz

Wouter wrote:

However, in the current implementation of some GRASS tools it seems to me
(correct me if I'm wrong) that there are a few operations where the
database is certainly the limiting factor. The most notable example is
renaming a large vector table, which is extremely slow. I noticed that
PostgreSQL is working very hard. Simply renaming the vector directory,
renaming the table in the database, and adjusting the contents of the
vector/<dataset>/dbln file should be sufficient. This should take a
split second, regardless of the size of the table. Doing this manually
is much, much faster than renaming with g.rename...

But note that it is possible, for example, to copy from DBF into
PostgreSQL if you have changed the DB backend.

Maybe we should add a "fast" rename for 'g.copy vect=a,b' which just
renames the files.

Search the archives for old posts by Radim for the full reasoning.

Hamish