[GRASS-dev] Some v.out.ogr shortcomings

Dear all,

After my recent experiments with v.out.ogr, I noticed a number
of weaknesses that show particularly when working with data
stored in a DBMS.

I would like to put some work into improving it.
So could I have your comments on the following thoughts,
please?

1. The GRASS primary key "cat" column always gets exported
(and upon re-import and export, you may get "_cat", ...).
This does not always seem to be useful and can even be
significant bloat for very large datasets. How about adding
a flag to drop (or preserve?) it in the output?
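Here is a minimal sketch of how such a flag could work, written in plain C so that it compiles on its own. The names `export_column()`, `count_exported()` and `drop_key` are hypothetical. In v.out.ogr the key column name would presumably come from the layer's `struct field_info` (e.g. `Fi->key`), which is an assumption here.

```c
#include <string.h>

/* Hypothetical helper: decide whether an attribute column should be
 * written to the OGR output. `keycol` is the table's key column
 * (normally "cat"); `drop_key` would be set by the proposed flag. */
int export_column(const char *colname, const char *keycol, int drop_key)
{
    if (drop_key && strcmp(colname, keycol) == 0)
        return 0; /* skip the GRASS key column */
    return 1;     /* export every other column unchanged */
}

/* Count how many of `ncols` columns would end up in the output layer. */
int count_exported(const char **colnames, int ncols,
                   const char *keycol, int drop_key)
{
    int n = 0;
    for (int i = 0; i < ncols; i++)
        n += export_column(colnames[i], keycol, drop_key);
    return n;
}
```

Matching here uses an exact, case-sensitive strcmp(). Depending on the DBMS, a real implementation might need case-insensitive matching.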

2. There is currently no implemented action for exporting
"Kernel" type geometries. Should they be exported simply
as 3D points?
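One way to express "kernels as 3D points" is a type-to-geometry mapping. This is an illustrative sketch only. The enums below are stand-ins: real code would use the GRASS GV_* constants and OGR's OGRwkbGeometryType (e.g. wkbPoint25D). `map_type()` is a hypothetical helper, not existing v.out.ogr code.

```c
/* Stand-in constants, NOT the real GRASS/OGR values. */
enum grass_type { T_POINT, T_CENTROID, T_LINE, T_KERNEL };
enum out_geom { G_NONE, G_POINT, G_POINT_3D, G_LINESTRING, G_LINESTRING_3D };

/* Hypothetical mapping of GRASS feature types to output geometry types,
 * exporting kernels as 3D points as proposed. */
enum out_geom map_type(enum grass_type t, int is_3d)
{
    switch (t) {
    case T_POINT:
    case T_CENTROID:
        return is_3d ? G_POINT_3D : G_POINT;
    case T_KERNEL:
        /* kernels are the 3D counterpart of centroids: always 3D points */
        return G_POINT_3D;
    case T_LINE:
        return is_3d ? G_LINESTRING_3D : G_LINESTRING;
    }
    return G_NONE; /* unknown type: nothing to export */
}
```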

3. Export in general is _very_ slow (I think this has been
discussed here before). On my machine v.out.ogr takes almost
8 mins to export 32,000 points to an SQLite database file.
Compare that to ogr2ogr which does the same job (plus importing
the input data) in 4 seconds! All that extra time seems to
be spent in mk_at() which generates the attribute table.
The DBMS management code looks like quite some overhead.
How about loading all the attribute tables into a simple
in-mem C structure in one go and dumping them straight into
each OGR feature definition on export? This would mean that
the whole attribute table needs to be in memory (but it
should also be possible to do this in segments) and could
be controlled via a module flag. Or is that unlikely to
gain much performance?
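Some rough arithmetic on the figures above:

- 8 minutes for 32,000 points is about 15 ms per point (480 s / 32,000).
- ogr2ogr takes about 0.125 ms per point (4 s / 32,000).
- That is roughly a factor of 120.

A fixed cost of that size per feature would be consistent with one database round trip per feature. That is an assumption, not a profiling result.

The sketch below shows the in-memory alternative: load all rows once, sort them by category, then look up each feature with a binary search. `struct att_row`, `att_cache_sort()` and `att_cache_find()` are hypothetical names, and only two attribute columns are modelled.

```c
#include <stdlib.h>

/* Hypothetical in-memory copy of one attribute row. In practice the rows
 * would be filled from a single "SELECT * FROM <table>" instead of one
 * query per feature (assumption); only two columns are modelled here. */
struct att_row {
    int cat;
    char name[64];
    double value;
};

/* Order rows by category without risking integer overflow. */
static int cmp_row(const void *a, const void *b)
{
    int ca = ((const struct att_row *)a)->cat;
    int cb = ((const struct att_row *)b)->cat;
    return (ca > cb) - (ca < cb);
}

/* Sort the cache once after loading: O(n log n). */
void att_cache_sort(struct att_row *rows, size_t n)
{
    qsort(rows, n, sizeof *rows, cmp_row);
}

/* Find the row for one feature's category in O(log n), with no database
 * round trip. Returns NULL if the category has no attributes. */
const struct att_row *att_cache_find(const struct att_row *rows, size_t n,
                                     int cat)
{
    struct att_row key = { .cat = cat };
    return bsearch(&key, rows, n, sizeof *rows, cmp_row);
}
```

Loading in segments would mean filling and sorting the array for one range of categories at a time. This assumes features can be visited in category order, which may need an extra pass over the vector map.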

4. SQLite happens to be an OGR driver that does not support
overwriting existing datasources. Thus,

  Ogr_ds = OGR_Dr_CreateDataSource(Ogr_driver, dsn_opt->answer,
           papszDSCO);

as used by v.out.ogr will always fail. This means that users
are confined to exporting data to a new table in a new SQLite
database only. In fact, SQLite is capable of updating an existing
database by adding a new table to it. How about a new flag "-u"
to open an existing OGR datasource in "update" mode, using OGR_Dr_Open()?
This could also benefit other OGR drivers with similar design
issues.
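The decision behind the proposed "-u" flag could look roughly like this. The sketch only models the choice and does not call OGR. Its comments name the OGR C API call each outcome would map to: OGR_Dr_CreateDataSource() for a new datasource, or OGR_Dr_Open() with bUpdate set to TRUE followed by OGR_DS_CreateLayer() to add the new table. `choose_ds_action()` and its `driver_can_overwrite` input are hypothetical.

```c
/* Hypothetical outcomes of the datasource-opening step. */
enum ds_action {
    DS_CREATE, /* OGR_Dr_CreateDataSource(driver, dsn, options) */
    DS_UPDATE, /* OGR_Dr_Open(driver, dsn, TRUE), then OGR_DS_CreateLayer() */
    DS_ERROR   /* refuse: the user must pass -u or choose a new dsn */
};

/* update_flag: the proposed -u flag was given
 * dsn_exists: the output datasource already exists
 * driver_can_overwrite: the driver can replace an existing datasource */
enum ds_action choose_ds_action(int update_flag, int dsn_exists,
                                int driver_can_overwrite)
{
    if (update_flag && dsn_exists)
        return DS_UPDATE; /* add a new layer/table to the existing file */
    if (!dsn_exists || driver_can_overwrite)
        return DS_CREATE; /* what v.out.ogr does today */
    return DS_ERROR;      /* e.g. an existing SQLite file without -u */
}
```

Creation remains the default when the flag is not given, so existing scripts would behave exactly as before.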

Thanks,

Ben


On Mon, Aug 31, 2009 at 2:55 PM, Benjamin Ducke
<benjamin.ducke@oxfordarch.co.uk> wrote:

> Dear all,
>
> After my recent experiments with v.out.ogr, I noticed a number
> of weaknesses that show particularly when working with data
> stored in a DBMS.
>
> I would like to put some work into improving it.
> So could I have your comments on the following thoughts,
> please?
>
> 1. The GRASS primary key "cat" column always gets exported
> (and upon re-import and export, you may get "_cat", ...).
> This does not always seem to be useful and can even be
> significant bloat for very large datasets. How about adding
> a flag to drop (or preserve?) it in the output?

A flag to deactivate it would be nice. That way the module keeps
its current default behavior, and the flag is an add-on.

> 2. There is currently no implemented action for exporting
> "Kernel" type geometries. Should they be exported simply
> as 3D points?

For the time being, perhaps yes (no idea if there are any OGC
specs for this or plans in OGR).

> 3. Export in general is _very_ slow (I think this has been
> discussed here before). On my machine v.out.ogr takes almost
> 8 mins to export 32,000 points to an SQLite database file.

I was hit by the same problem today: Placenames of Italy

# in LatLong location:
wget -c http://download.geonames.org/export/dump/IT.zip
unzip IT.zip IT.txt
v.in.geonames in=IT.txt out=italy_geonames
v.out.ogr italy_geonames dsn=italy_geonames_LL.shp type=point

... this takes forever. But:

> Compare that to ogr2ogr which does the same job (plus importing
> the input data) in 4 seconds!

Right:
ogr2ogr italy_geonames_LL.shp grassdata/latlong_wgs84/neteler/vector/italy_geonames/head

... perhaps 10 seconds for 43000 points.
Since ogr2ogr uses the GRASS plugin to read the data, the problem
must be in the GRASS part of v.out.ogr.

> All that extra time seems to
> be spent in mk_at() which generates the attribute table.
> The DBMS management code looks like quite some overhead.
> How about loading all the attribute tables into a simple
> in-mem C structure in one go and dumping them straight into
> each OGR feature definition on export? This would mean that
> the whole attribute table needs to be in memory (but it
> should also be possible to do this in segments) and could
> be controlled via a module flag. Or is that unlikely to
> gain much performance?

No idea, but certainly worth trying. How does ogr2ogr deal with
it?

> 4. SQLite happens to be an OGR driver that does not support
> overwriting existing datasources. Thus,
>
>   Ogr_ds = OGR_Dr_CreateDataSource(Ogr_driver, dsn_opt->answer,
>            papszDSCO);
>
> as used by v.out.ogr will always fail.

Worth an OGR bug ticket?

> This means that users
> are confined to exporting data to a new table in a new SQLite
> database only. In fact, SQLite is capable of updating an existing
> database by adding a new table to it. How about a new flag "-u"
> to open an existing OGR datasource in "update" mode, using OGR_Dr_Open()?
> This could also benefit other OGR drivers with similar design
> issues.

Sounds good!

Markus