In reply to Vincent's remarks on database interfaces:
I don't know how the db.* modules are implemented, but I can't imagine that
they process the data themselves. That would mean downloading all the data
from PostgreSQL and building your own database engine...
The GRASS manual states that all the SQL of the DBMS is
available through the db.* modules, which to me means that the SQL
query/DDL string is passed to the DBMS engine, and the DBMS engine does the
job. I think they're correctly implemented and as fast as any other
interface to an RDBMS, except perhaps when it comes to downloading data; but
the limiting factor is certainly deciding what data to download or update,
which is a geometry issue.
So, when operations involving database access are slow, the cause is very
likely either the code on top of the database access and management
(db.*) modules, which doesn't make smart use of the database
capabilities, or the geometry operations involved.
With kind regards,
Wouter
Personally, being used to operating directly on tables through a psql terminal,
I am not aware of the actual performance of db.* modules.
In the present case, when programming dedicated interfaces for end users who
make heavy use of vector attributes, maybe it would be a good idea to
use native DBMS modules instead of db.*; e.g., if I program in Perl, I would act
on my data through the DBI DBD::Pg module.
Vincent
On 03/03/09 22:17, Wouter Boasson wrote:
> In reply to Vincent's remarks on database interfaces:
> I don't know how the db.* modules are implemented, but I can't imagine that
> they process the data themselves.
No, they use the underlying engine of the backend. But each connection is costly. This is why scripts that call [v.]db.* modules in each iteration of a loop should be rewritten to collect the SQL statements in a temporary file and then call db.execute once at the end. See [1] for the example of v.rast.stats.
See also some comments of Radim in [2] under "Attributes".
Moritz
[1] http://trac.osgeo.org/grass/changeset/20358
[2] http://freegis.org/cgi-bin/viewcvs.cgi/grass6/doc/vector/TODO?rev=HEAD&content-type=text/vnd.viewcvs-markup
Thank you for the links, Moritz,
VB
On Wednesday 04 March 2009 at 14:32 +0100, Moritz Lennert wrote:
_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
On 06/03/09 10:30, Vincent Bain wrote:
> See also some comments of Radim in [2] under "Attributes".
An important point in the above is:
"Another problem is random access to the data in RDBMS
from an application which is terribly slow (due to communication with
server)"
Most vector modules loop through the cats of the vector map and access the table cat by cat. This creates the random access problem that Radim speaks about, with significant overhead in terms of connection costs.
The new module d.thematic.area uses a different model: it selects an array of cats and values once (db_select_CatValArray, see line 243 of display/d.thematic.area/main.c in the devel6 branch) and then, during display, loops through the areas in the map but gets the necessary values from this array instead of connecting to the db each time (see line 139 of display/d.thematic.area/area.c). Thus the random access is handled by db_CatValArray_get_value and not by a connection to the db, contrary to what is done in d.vect.chart, for example (see line 91 of display/d.vect.chart/plot.c).
By the way, quite a while ago I proposed a modified version of d.vect.chart (d.vect.chart2) which implements the dbCatValArray model [3] and shows significant speed gains [4], as well as an even more general approach where you can launch any SQL query, as long as it returns a list of category values, and display the features which correspond to these values. This might allow a more attribute-data-centric management and display of maps (see discussion in [5]).
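To make the two access models concrete, here is a small sketch in Python with SQLite standing in for the attribute database. It only mimics the idea (one SELECT into a sorted in-memory cat/value array, then lookups without a round trip) and is not the GRASS C API; CatValArray, value_per_feature and select_features are hypothetical names, and the binary-search lookup is my assumption of a natural fit for a sorted array. The last function sketches the more general approach: any SQL query returning category values decides which features are kept for display.

```python
import bisect
import sqlite3

# Python/SQLite analogue of the cat/value array idea; the names below are
# hypothetical and are NOT the GRASS C API (db_select_CatValArray etc.).
# Table and column names are interpolated into SQL, so they must come from
# trusted module parameters, never from arbitrary user text.


class CatValArray:
    """Load all (cat, value) pairs with ONE query, sorted by cat."""

    def __init__(self, con, table, key, column, where=None):
        sql = f"SELECT {key}, {column} FROM {table}"
        if where:
            sql += f" WHERE {where}"
        sql += f" ORDER BY {key}"
        rows = con.execute(sql).fetchall()
        self.cats = [row[0] for row in rows]
        self.values = [row[1] for row in rows]

    def get_value(self, cat):
        """Binary search in memory: no database round trip per feature."""
        i = bisect.bisect_left(self.cats, cat)
        if i < len(self.cats) and self.cats[i] == cat:
            return self.values[i]
        return None  # category has no attribute row


def value_per_feature(con, table, key, column, cat):
    """The random-access pattern: one query to the database per feature."""
    row = con.execute(f"SELECT {column} FROM {table} WHERE {key} = ?",
                      (cat,)).fetchone()
    return row[0] if row else None


def select_features(con, feature_cats, sql):
    """Run any SQL query that returns category values and keep only the
    features whose category is in the result."""
    wanted = {row[0] for row in con.execute(sql)}
    return [cat for cat in feature_cats if cat in wanted]
```

In a display loop over the areas, calling arr.get_value(cat) keeps the number of queries at one regardless of the number of features, whereas value_per_feature issues one query per feature, which is the random access problem described above.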
So, it might be interesting to do an audit of database access in different vector modules to see where a different approach could allow speed gains.
Moritz
[2] http://freegis.org/cgi-bin/viewcvs.cgi/grass6/doc/vector/TODO?rev=HEAD&content-type=text/vnd.viewcvs-markup
[3] http://geog-pc40.ulb.ac.be/grass/chart/
[4] http://lists.osgeo.org/pipermail/grass-dev/2006-October/026624.html
[5] http://lists.osgeo.org/pipermail/grass-dev/2006-October/026625.html
Moritz Lennert wrote:
Moritz, could this information be placed somewhere where it does not get lost, e.g. in the wiki here [1] or here [2], or in the programmer's manual? Using the dbCatValArray model does not break compatibility; it would simply speed up modules, right? Maybe we can also have a document somewhere describing solutions for the Vector TODO list plus other enhancements, or is it better to discuss it on the dev ML?
Markus M
[1] http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Modules_2
[2] http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Modules_4
On 06/03/09 11:45, Markus Metz wrote:
> Moritz, could this information be placed somewhere where it does not get lost, e.g. in the wiki here [1] or here [2], or in the programmer's manual?
Yes, I'll try to integrate it into the wiki (or if someone else wants to start, go ahead). I think that it's more an issue for the vector (and possibly display) modules than for the db.* modules.
> Using the dbCatValArray model does not break compatibility, it would simply speed up modules, right?
Well, that's something we would have to check; I don't have a comprehensive enough overview to judge that.
> Maybe we can also have a document somewhere describing solutions for the Vector TODO list plus other enhancements, or is it better to discuss it in the dev ML?
No, a wiki page is probably a good place, which doesn't exclude discussion on the ML but leaves a more permanent trace.
Moritz