[GRASS-dev] single vs. multiple sqlite db files [was v.db.calc]

On Monday 11 May 2009, C Michael Barton wrote:

Hi Dylan,

Thanks for your input on the types of joins you do. This is very useful.

On 5/11/09 11:03 AM, "Dylan Beaudette" <debeaudette@ucdavis.edu> wrote:
> Note that one database per vector file, as we now have with the more
> limited dbf format, does not preclude joins, views, schemas, etc.

That may be true, but what mechanism is 'doing' the joining? I realize that
with the DBF interface all 'SQL-like' operations are somewhat of an
abomination, but with something like SQLite at least we know that a robust
system can deal with most of the SQL thrown at it. This is especially
important when trying to work with subqueries or LEFT joins. I am not sure
how these operations would be possible between different (SQLite) database
files. This could be due to my limited experience with SQLite...

You can have multiple tables in an SQLite database. These are simply
imported or created. So joins would be possible between any table in the
database. The question is whether to have one SQLite database (with one or
more tables, depending on one's need) for each vector file OR to have a
single SQLite for ALL vector files in a mapset.

The latter has the greatest possibility of making any kind of join between
any table used with any vector file in a mapset. However, it breaks the
portability we now have with respect to vectors in which you can grab the
vector folder from the mapset and copy/move/send it to any other equivalent
location with all parts intact (vector GIS files, indexes, and attribute
tables).

So the issue is whether the ability to make any kind of join between any
table within a mapset (does not apply to between mapset joins as proposed
currently) is more important than maintaining easy vector portability OR is
portability more important than making joins between the tables of
different vectors within a mapset? I think that all feel that the ability
to switch back and forth between these organizational principles is
probably needed. So it comes down to which is the default. I'd argue that
the 2nd is probably more useful for more people, but this is purely
anecdotal. Perhaps the way I work with GIS is actually atypical. So it is
good to hear from people like you who work with attribute tables a lot.

Michael

Sure. As long as there is a documented approach to adjusting this behavior. I
think that apart from joins, the table / file locking issue may be more
important... If a long-running process is causing the mapset-wide SQLite db
to be locked, then vector-specific database files would make more sense.

Interesting stuff!

Cheers,
Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

On 11/05/09 22:59, Dylan Beaudette wrote:

On Monday 11 May 2009, C Michael Barton wrote:

You can have multiple tables in an SQLite database. These are simply
imported or created. So joins would be possible between any table in the
database. The question is whether to have one SQLite database (with one or
more tables, depending on one's need) for each vector file OR to have a
single SQLite for ALL vector files in a mapset.

The latter has the greatest possibility of making any kind of join between
any table used with any vector file in a mapset. However, it breaks the
portability we now have with respect to vectors in which you can grab the
vector folder from the mapset and copy/move/send it to any other equivalent
location with all parts intact (vector GIS files, indexes, and attribute
tables).

So the issue is whether the ability to make any kind of join between any
table within a mapset (does not apply to between mapset joins as proposed
currently) is more important than maintaining easy vector portability OR is
portability more important than making joins between the tables of
different vectors within a mapset?

I would plead for solid modules allowing to export and import GRASS data between GRASS installations (e.g. something like the r.pack/r.unpack modules). IMHO, moving around data should not depend on the location of the database or the choice of database backend (i.e. how do you handle above export with a PostgreSQL backend ?). We need to povide a mecanism which allows users to do that while respecting the unwritten law that they should not have to do anything at all inside their GISDBASE.

Sure. As long as there is a documented approach to adjusting this behavior. I think that apart from joins, the table / file locking issue may be more important... If a long-running process is causing the mapset-wide SQLite db to be locked, then vector-specific database files would make more sense.

I agree that locking can be a serious issue for sqlite. I didn't think of this before, but AFAIK, sqlite can only provide file-based locking as it is a single-file database [1]. With this in mind, I might actually change my mind and plead for one db per vector, although I agree with Dylan that sharing data across vector file attributs tables happens regularly for me.

I guess that sqlite is still more in the same line of solutions as dbf, only that if provides a much larger set of possibilities, but that if you want to do any serious use of db functionalities you will in any case have to turn to PostgreSQL of MySQL...

Moritz

[1] http://www.sqlite.org/lockingv3.html