[GRASS-dev] Separate sqlite database files for vector maps

Dear devs,
i would like to implement sqlite database support as separated file for each vector map. Similar to the dbf approach but with sqlite. Other databases are no option in my case.
The reasons for this approach are the following:

  • The temporal vector algebra would hugely benefit from separated sqlite vector map files, since it can compute in parallel, no race condition or serial processing of database requests is required
  • Vector maps can easily be moved between locations with full database content, no data loss. Simply the vector map directory must be archived
  • Merging of mapsets is much easier, since the vector map sqlite databases don’t need to be merged, only copied

This approach would be implemented in addition to all other existing approaches.

However, i don’t know were to start with this, any hints or suggestions?

Best regards
Soeren

Hei Sören,

I ran into this recently too, with lots of “Busy SQLite” warnings, when I tried to modify two or more vector maps in parallel.

So the work you are planning to do would be very much appreciated.

However, in order to not break existing modules like v.db.join, queries across SQLite DBs would have to be supported.

Here the “attach” keyword might be a starting point for required additions to e.g. v.db.join: http://sqlite.org/lang_attach.html

It is probably also necessary to think about how (or more precise where) tables are supposed to be handled which are not linked to a vector map… Will all non-map tables be saved in their own SQLite DB file?

Do you plan to store tables linked to additional layers in the same SQLite file as the vector map?

How would modules like db.tables or db.copy behave? What happens if you connect the attribute table of map a (or just table a) to map b?

Just some thoughts …

Cheers

Stefan

···

Dear devs,

i would like to implement sqlite database support as separated file for each vector map. Similar to the dbf approach but with sqlite. Other databases are no option in my case.

The reasons for this approach are the following:

  • The temporal vector algebra would hugely benefit from separated sqlite vector map files, since it can compute in parallel, no race condition or serial processing of database requests is required

  • Vector maps can easily be moved between locations with full database content, no data loss. Simply the vector map directory must be archived

  • Merging of mapsets is much easier, since the vector map sqlite databases don’t need to be merged, only copied

This approach would be implemented in addition to all other existing approaches.

However, i don’t know were to start with this, any hints or suggestions?

Best regards

Soeren

On Fri, Sep 9, 2016 at 11:06 PM, Sören Gebbert
<soerengebbert@googlemail.com> wrote:

Dear devs,
i would like to implement sqlite database support as separated file for each
vector map. Similar to the dbf approach but with sqlite. Other databases are
no option in my case.
The reasons for this approach are the following:

* The temporal vector algebra would hugely benefit from separated sqlite
vector map files, since it can compute in parallel, no race condition or
serial processing of database requests is required
* Vector maps can easily be moved between locations with full database
content, no data loss. Simply the vector map directory must be archived
* Merging of mapsets is much easier, since the vector map sqlite databases
don't need to be merged, only copied

This approach would be implemented in addition to all other existing
approaches.

However, i don't know were to start with this, any hints or suggestions?

This is implemented in G6.3, maybe also early versions of G6.4, but
has been removed at some stage, no idea why.
Have a look at lib/vector/Vlib/field.c, there at the fns
Vect_subst_var()
and
Vect_get_dblink()

Essentially, all that is needed is a corresponding entry in $LOCATION_NAME/VAR:
instead of
DB_DRIVER: sqlite
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db

try
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/sqlite/sqlite.db

HTH,

Markus M

On Sat, Sep 10, 2016 at 5:33 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

On Fri, Sep 9, 2016 at 11:06 PM, Sören Gebbert
<soerengebbert@googlemail.com> wrote:

Dear devs,
i would like to implement sqlite database support as separated file for each
vector map. Similar to the dbf approach but with sqlite. Other databases are
no option in my case.
The reasons for this approach are the following:

* The temporal vector algebra would hugely benefit from separated sqlite
vector map files, since it can compute in parallel, no race condition or
serial processing of database requests is required
* Vector maps can easily be moved between locations with full database
content, no data loss. Simply the vector map directory must be archived
* Merging of mapsets is much easier, since the vector map sqlite databases
don't need to be merged, only copied

This approach would be implemented in addition to all other existing
approaches.

However, i don't know were to start with this, any hints or suggestions?

This is implemented in G6.3, maybe also early versions of G6.4, but
has been removed at some stage, no idea why.
Have a look at lib/vector/Vlib/field.c, there at the fns
Vect_subst_var()
and
Vect_get_dblink()

Essentially, all that is needed is a corresponding entry in $LOCATION_NAME/VAR:

in $LOCATION_NAME/$MAPSET/VAR:

instead of
DB_DRIVER: sqlite
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db

try
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/sqlite/sqlite.db

HTH,

Markus M

Mi Markus,

···

[snip]

Essentially, all that is needed is a corresponding entry in $LOCATION_NAME/VAR:
instead of
DB_DRIVER: sqlite
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db

try
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/sqlite/sqlite.db

HTH,

This helped a lot, thank you. Simply running:
db.connect driver=sqlite database=‘$GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/sqlite.db’

will do the job. This can be run right after mapset creation.

And it works nicely. It seems that most of the v.db.* commands work with vector map specific database files. The db.* commands seemed work as well, if the database path is provided at the command line.

I skipped the sqlite directory in the vector map path ($MAP/sqlite/sqlite.db → $MAP/sqlite.db), since this directory must be created when a new vector is opened, hence the vector-open-new functions must be modified to support a separate sqlite directory. With the simple solution the directory fsync() behavior of sqlite will force now all vector map specific files to be flushed to disc. This may be a performance problem.

This solution may be sufficient for my needs, no modification needed in the code … except, v.info should show the database path of a vector map if extended metadata is required.

Many thanks for your help
Best regards
Soeren

Markus M

Hi,

···

i found a simple solution with the help of Markus Metz:

db.connect driver=sqlite database=‘$GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/sqlite.db’

This will set the database connection for all vector maps in a mapset, hence it should be run directly after mapset creation.

2016-09-09 23:40 GMT+02:00 Blumentrath, Stefan <Stefan.Blumentrath@nina.no>:

Hei Sören,

I ran into this recently too, with lots of “Busy SQLite” warnings, when I tried to modify two or more vector maps in parallel.

So the work you are planning to do would be very much appreciated.

However, in order to not break existing modules like v.db.join, queries across SQLite DBs would have to be supported.

Here the “attach” keyword might be a starting point for required additions to e.g. v.db.join: http://sqlite.org/lang_attach.html

The simple solution that works for me will not work with modules that require a single database to store all vector tables. Hence, all the modules that do not work with dbf databases may also not work with vector map specific sqlite databases.

It is probably also necessary to think about how (or more precise where) tables are supposed to be handled which are not linked to a vector map… Will all non-map tables be saved in their own SQLite DB file?

I have no clue since i don’t have these requirements.

Do you plan to store tables linked to additional layers in the same SQLite file as the vector map?

Nope.

How would modules like db.tables or db.copy behave? What happens if you connect the attribute table of map a (or just table a) to map b?

I see no problem with db.tables and db.copy since all db.* commands have database options to specify the path to the required sqlite database. Connecting a table from map a to map b will not work with the simple solution.

IMHO the simple db.connect solution should provide all the functionality that i need at the moment, which are parallel vector map processing and mapset merging.

Best regards
Soeren

Just some thoughts …

Cheers

Stefan

From: grass-dev [mailto:grass-dev-bounces@lists.osgeo.org] On Behalf Of Sören Gebbert
Sent: 9. september 2016 23:07
To: GRASS developers list <grass-dev@lists.osgeo.org>
Subject: [GRASS-dev] Separate sqlite database files for vector maps

Dear devs,

i would like to implement sqlite database support as separated file for each vector map. Similar to the dbf approach but with sqlite. Other databases are no option in my case.

The reasons for this approach are the following:

  • The temporal vector algebra would hugely benefit from separated sqlite vector map files, since it can compute in parallel, no race condition or serial processing of database requests is required

  • Vector maps can easily be moved between locations with full database content, no data loss. Simply the vector map directory must be archived

  • Merging of mapsets is much easier, since the vector map sqlite databases don’t need to be merged, only copied

This approach would be implemented in addition to all other existing approaches.

However, i don’t know were to start with this, any hints or suggestions?

Best regards

Soeren

On Sat, Sep 10, 2016 at 11:24 PM, Sören Gebbert
<soerengebbert@googlemail.com> wrote:

Mi Markus,

[snip]

Essentially, all that is needed is a corresponding entry in
$LOCATION_NAME/VAR:
instead of
DB_DRIVER: sqlite
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db

try
DB_DATABASE: $GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/sqlite/sqlite.db

HTH,

This helped a lot, thank you. Simply running:
db.connect driver=sqlite
database='$GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/sqlite.db'

will do the job. This can be run right after mapset creation.

And it works nicely. It seems that most of the v.db.* commands work with
vector map specific database files. The db.* commands seemed work as well,
if the database path is provided at the command line.

I skipped the sqlite directory in the vector map path ($MAP/sqlite/sqlite.db
-> $MAP/sqlite.db), since this directory must be created when a new vector
is opened, hence the vector-open-new functions must be modified to support a
separate sqlite directory. With the simple solution the directory fsync()
behavior of sqlite will force now all vector map specific files to be
flushed to disc. This may be a performance problem.

sqlite might not open the (new) database at all if it can not fsync()
the directory. That will not result in performance penalty but in a
fatal error, therefore it is a good idea to keep the sqlite db in a
separate folder.

The GRASS sqlite driver attempts to create the sqlite directory
(db__driver_open_database() in trunk/db/driver/sqlite/db.c) if the
directory does not exist, therefore Vect_open_new*() should not need
any modification.

This solution may be sufficient for my needs, no modification needed in the
code ... except, v.info should show the database path of a vector map if
extended metadata is required.

Many thanks for your help
Best regards
Soeren

Markus M