[GRASS-dev] OGR write access

Hi,

I am thinking about how to expose direct write access to OGR datasources
from the user's point of view. One approach would be to implement a global
flag '--u' for updating an existing vector map (i.e. OGR datasource).
E.g.

v.out.ogr input=test dsn=. type=point -n
v.external dsn=. layer=test output=test
v.random out=test n=1000 --u

This could also work for the native format:

v.edit map=test tool=create
v.random out=test n=1000 --u

Or to add new parameters 'dsn/format' which would be used only for OGR
format, not for the native one.

v.random out=test n=1000 format=ESRI_Shapefile dsn=.

Any ideas?

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

Martin Landa wrote:

Hi,

I am thinking how to implement direct write access to OGR datasources
from the user point of view. One approach would be to implement global
flag '--u' for updating existing vector map (i.e. OGR datasource).
E.g.

v.out.ogr input=test dsn=. type=point -n
v.external dsn=. layer=test output=test
v.random out=test n=1000 --u
  

Not sure if I understand correctly: updating an existing vector map, be it OGR or native, works for some but not all modules. Some modules first copy all or selected primitives from input to output, then modify the output, then write the output support files. In update mode that could duplicate all primitives and give bogus results for modules such as v.generalize, v.clean or v.select.
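
To make the concern concrete, here is a minimal Python sketch (not GRASS code). It assumes a vector map can be modelled as a plain list of primitives and that a module follows the copy-then-modify pattern described above; run_module() and its 'update' argument are made up for illustration.

```python
# Toy model: a vector map is just a list of primitive identifiers.
# run_module() and its 'update' argument are hypothetical, for illustration only.

def run_module(input_map, existing_output, update):
    """Copy all primitives from input to output, then 'modify' the output."""
    # Overwrite mode starts from an empty output; update mode keeps what is there.
    output = list(existing_output) if update else []
    # Step 1: the module copies every input primitive to the output.
    output.extend(input_map)
    # Step 2: the module would now modify the output in place (omitted here).
    return output


input_map = ["line_1", "line_2", "boundary_3"]

# Normal run: the output contains each primitive once.
first = run_module(input_map, existing_output=[], update=False)

# Re-running on the earlier result in update mode duplicates every primitive.
second = run_module(input_map, existing_output=first, update=True)

print(len(first), len(second))  # 3 6
```

Whether a real module behaves like this depends on how it writes its output, so this only illustrates why update mode would have to be checked module by module.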

this could work also for native format

v.edit map=test tool=create
v.random out=test n=1000 --u

Or to add new parameters 'dsn/format' which would be used only for OGR
format, not for the native one.
  v.random out=test n=1000 format=ESRI_Shapefile dsn=.
  

How about a slight modification of Radim's suggestion:
v.random out=./shapefiles/@OGR,layer=test,format=ESRI_Shapefile

or something similar so that the out option can easily be parsed?
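
As a rough illustration of how easily such a value could be parsed, here is a minimal Python sketch. The '<dsn>@OGR,key=value,...' syntax is only the proposal from this thread, and parse_output() is a hypothetical helper, not part of the GRASS API.

```python
# Sketch of a parser for the proposed syntax
#   <dsn>@OGR,key=value,key=value
# parse_output() is a hypothetical helper, not an existing GRASS function.

def parse_output(value):
    """Split an 'out' option into dsn, pseudo-mapset and key=value extras."""
    target, *extras = value.split(",")
    # Everything after the last '@' is treated as the (virtual) mapset.
    dsn, sep, mapset = target.rpartition("@")
    if not sep:
        # No '@' present: plain map name, no mapset given.
        dsn, mapset = target, ""
    options = {}
    for item in extras:
        key, eq, val = item.partition("=")
        if not eq:
            raise ValueError("expected key=value, got %r" % item)
        options[key] = val
    return {"dsn": dsn, "mapset": mapset, "options": options}


parsed = parse_output("./shapefiles/@OGR,layer=test,format=ESRI_Shapefile")
print(parsed["dsn"])      # ./shapefiles/
print(parsed["mapset"])   # OGR
print(parsed["options"])  # {'layer': 'test', 'format': 'ESRI_Shapefile'}
```

A datasource name that itself contains a comma would break this simple split, so a real implementation would need some escaping rule or a different separator.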

Alternatively, modules that create output could be modified to have two separate output options: outmap (which can be an OGR dsn) and outlayer, analogous to specifying input with map and layer as already present in many modules. The format of outlayer could be <number>/"name"; make it optional with default 1/<mapname>. The OGR format (Shapefile, PostGIS, etc.) still needs to go somewhere. Modify the new global format option to reflect either native or which OGR format to use? OGR would be triggered by the virtual output mapset OGR?

As starting points, I think we need to (1) globally implement layer names, also for native vectors, and without the need to link an attribute table to that layer, (2) support updating attribute tables of v.external vectors. After that, direct writing of primitives/features could be implemented. Makes sense?

Markus M

Hi,

2009/10/16 Markus Metz <markus.metz.giswork@googlemail.com>:

I am thinking how to implement direct write access to OGR datasources
from the user point of view. One approach would be to implement global
flag '--u' for updating existing vector map (i.e. OGR datasource).
E.g.

v.out.ogr input=test dsn=. type=point -n
v.external dsn=. layer=test output=test
v.random out=test n=1000 --u

Not sure if I understand right: updating an existing vector map, be it OGR
or native, works for some but not all modules. Some modules first copy all
or selected primitives from input to output, then modify output, then write
output support files. That could cause duplication of all primitives and a
bogus result for some modules like e.g. v.generalize, v.clean, v.select.

I was speaking about empty vector maps. Anyway, '--u' could also be
useful for non-empty vector maps. Then you could use e.g. v.select
without the need to patch maps. E.g.

v.select ain=imap1 bin=imap2 out=omap12 ope=within
v.select ain=imap3 bin=imap4 out=omap34 ope=contains
v.patch in=omap12,omap34 out=map

becomes with the '--u' flag:

v.select ain=imap1 bin=imap2 out=omap12 ope=within
v.select ain=imap3 bin=imap4 out=omap12 ope=contains --u

Of course you can get bogus results (duplicate categories, etc.) with --u, it
depends on the data you are working with. Generally '--u' should
implement an 'append' mode instead of just overwriting files. Of course
there are some modules which basically cannot support update mode.
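
The difference between overwrite and append, and the duplicate-category problem, can be sketched in a few lines of Python. This is a toy model only: features are (type, category) pairs, and write_output() and find_duplicate_cats() are hypothetical helpers, not GRASS functions.

```python
# Toy model: a feature is a (type, category) pair; a map is a list of features.
# write_output() and find_duplicate_cats() are hypothetical helpers.
from collections import Counter

def write_output(existing, new_features, update=False):
    """Overwrite by default; with update=True append to the existing features."""
    return (list(existing) if update else []) + list(new_features)

def find_duplicate_cats(features):
    """Return the sorted categories that occur more than once."""
    counts = Counter(cat for _ftype, cat in features)
    return sorted(cat for cat, n in counts.items() if n > 1)

within = [("area", 1), ("area", 2)]      # e.g. result of the first v.select
contains = [("area", 2), ("area", 5)]    # e.g. result of the second v.select

omap12 = write_output([], within)
omap12 = write_output(omap12, contains, update=True)

print(len(omap12))                   # 4
print(find_duplicate_cats(omap12))   # [2]
```

Whether duplicate categories are actually a problem depends on the data and on what is done with the map afterwards.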

this could work also for native format

v.edit map=test tool=create
v.random out=test n=1000 --u

Or to add new parameters 'dsn/format' which would be used only for OGR
format, not for the native one.
v.random out=test n=1000 format=ESRI_Shapefile dsn=.

How about a slight modification of Radim's suggestion:
v.random out=./shapefiles/@OGR,layer=test,format=ESRI_Shapefile

or something similar so that the out option can easily be parsed?

yes, anyway it requires adding a 'format' parameter to all vector
modules which have an 'output' parameter defined, modifying
Vect_open_new() to pass the format parameter, etc.

[...]

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

Hi,

2009/10/16 Martin Landa <landa.martin@gmail.com>:

I am thinking how to implement direct write access to OGR datasources
from the user point of view. One approach would be to implement global
flag '--u' for updating existing vector map (i.e. OGR datasource).

As '--u' is a global flag, we should discuss for which GIS elements
it makes sense and, if so, how to implement the update mode for them, e.g.
raster maps. If we conclude that '--u' doesn't make sense at all, the
second approach could be implemented for vectors: manually adding a
'format' parameter to selected vector modules, e.g.

v.random out=test.shp@OGR n=1000 format=ESRI_Shapefile

or

v.random out=.@OGR n=1000 layer=test format=ESRI_Shapefile

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

Martin Landa wrote:

Hi,

2009/10/16 Markus Metz <markus.metz.giswork@googlemail.com>:
  

Not sure if I understand right: updating an existing vector map, be it OGR
or native, works for some but not all modules. Some modules first copy all
or selected primitives from input to output, then modify output, then write
output support files. That could cause duplication of all primitives and a
bogus result for some modules like e.g. v.generalize, v.clean, v.select.
    
I was speaking about empty vector maps. Anyway '--u' could be useful
also for non-empty vector maps. Then you could use e.g. v.select
without need to patch maps. E.g.

v.select ain=imap1 bin=imap2 out=omap12 ope=within
v.select ain=imap3 bin=imap4 out=omap34 ope=contains
v.patch in=omap12,omap34 out=map

with '--u' flag.

v.select ain=imap1 bin=imap2 out=omap12 ope=within
v.select ain=imap3 bin=imap4 out=omap12 ope=contains --u
  

Sounds good! Then the module does not need to check if output is indeed empty when already existing. Still, I'm not so sure if --u makes sense for all modules.

Of course you can get bogus (duplicate categories, etc.) with --u, it
depends on the data you are working with. Generally '--u' should
implement 'append' mode instead of just overwriting files. Of course
there are some modules which basically cannot support update mode.
  

Then rather implement --u as an option for some modules but not all and not make it global?

  

How about a slight modification of Radim's suggestion:
v.random out=./shapefiles/@OGR,layer=test,format=ESRI_Shapefile

or something similar so that the out option can easily be parsed?
    
yes, anyway it requires adding a 'format' parameter to all vector
modules which have an 'output' parameter defined

Hmm, for direct OGR write access, you need to specify the format anyway somewhere? There is already 'format' added to all vector modules which have input defined, can't be too difficult. And as I mentioned before, I think a new output option for output layer could make sense and AFAICT is required for direct OGR write access.

Maybe this is one step ahead, first fully implement direct OGR read access without the need for v.external?

and modifying
Vect_open_new() to pass format parameter, etc.
  

Yes, in principle similar to Vect_open_old(), needs new Vect_open_new_ogr().

Markus M

Hi,

2009/10/16 Markus Metz <markus.metz.giswork@googlemail.com>:

Not sure if I understand right: updating an existing vector map, be it OGR
or native, works for some but not all modules. Some modules first copy all
or selected primitives from input to output, then modify output, then write
output support files. That could cause duplication of all primitives and a
bogus result for some modules like e.g. v.generalize, v.clean, v.select.

I was speaking about empty vector maps. Anyway '--u' could be useful
also for non-empty vector maps. Then you could use e.g. v.select
without need to patch maps. E.g.

v.select ain=imap1 bin=imap2 out=omap12 ope=within
v.select ain=imap3 bin=imap4 out=omap34 ope=contains
v.patch in=omap12,omap34 out=map

with '--u' flag.

v.select ain=imap1 bin=imap2 out=omap12 ope=within
v.select ain=imap3 bin=imap4 out=omap12 ope=contains --u

Sounds good! Then the module does not need to check if output is indeed
empty when already existing. Still, I'm not so sure if --u makes sense for
all modules.

It's also not clear how to implement 'update' mode for other GIS
elements (raster map, ...).

Of course you can get bogus (duplicate categories, etc.) with --u, it
depends on the data you are working with. Generally '--u' should
implement 'append' mode instead of just overwriting files. Of course
there are some modules which basically cannot support update mode.

Then rather implement --u as an option for some modules but not all and not
make it global?

Then we end up with the need to manually update every module which can
support update mode. We could probably build a list of the modules having
an 'output' parameter, marking where update mode makes sense and where it
does not.

How about a slight modification of Radim's suggestion:
v.random out=./shapefiles/@OGR,layer=test,format=ESRI_Shapefile

or something similar so that the out option can easily be parsed?

yes, anyway it requires adding a 'format' parameter to all vector
modules which have an 'output' parameter defined

Hmm, for direct OGR write access, you need to specify the format anyway
somewhere? There is already 'format' added to all vector modules which have
input defined, can't be too difficult. And as I mentioned before, I think a

Probably I don't understand, I can see a 'format' parameter only in
v.out.ogr, v.in.ogr and similar modules. Currently this would be possible:

v.out.ogr input=test dsn=`pwd` type=point -n
v.external dsn=. layer=test output=test
v.random out=test n=1000 --u

but probably

v.random map=`pwd`@OGR n=1000 olayer=test format=ESRI_Shapefile

is a better approach.

new output option for output layer could make sense and AFAICT is required
for direct OGR write access.

yes, also olayer would be required. But it would make sense only for
OGR format, not the native format, am I right?

v.extract in=imap layer=1 where="..." out="PG:dbname=db" olayer=omap format=PostgreSQL
v.extract in=imap layer=1 where="..." out=omap format=native

Maybe this is one step ahead, first fully implement direct OGR read access
without the need for v.external?

yes, I am working on it.

and modifying
Vect_open_new() to pass format parameter, etc.

Yes, in principle similar to Vect_open_old(), needs new Vect_open_new_ogr().

Right.

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

Hi,

Martin:

Markus:
  

Then rather implement --u as an option for some modules but not all and not
make it global?
    
Then we end up with the need to update manually every module which can
support update mode. Probably we could build a list of modules where
make sense to have update mode and not (having 'output' parameter).
  

I think so, too dangerous to make it global. In any case, I don't see a way around going through each vector module and see if it could support --u.

  

How about a slight modification of Radim's suggestion:
v.random out=./shapefiles/@OGR,layer=test,format=ESRI_Shapefile

or something similar so that the out option can easily be parsed?

yes, anyway it requires adding a 'format' parameter to all vector
modules which have an 'output' parameter defined
      

Hmm, for direct OGR write access, you need to specify the format anyway
somewhere? There is already 'format' added to all vector modules which have
input defined, can't be too difficult. And as I mentioned before, I think a
    
Probably I don't understand, I can see a 'format' parameter only in
v.out.ogr, v.in.ogr and similar modules. Currently this would be possible:
  

I was referring to the grass7 wxGUI, where e.g. d.vect has a new option element called Format with the choices Native and OGR. I suggested doing that with e.g. ./shapefiles/@OGR and making that option element a format option in the sense of v.[in|out].ogr for output options.

v.out.ogr input=test dsn=`pwd` type=point -n
v.external dsn=. layer=test output=test
v.random out=test n=1000 --u

but probably

v.random map=`pwd`@OGR n=1000 olayer=test format=ESRI_Shapefile

is better approach.

new output option for output layer could make sense and AFAICT is required
for direct OGR write access.
    
yes, also olayer would be required. But it would make sense only for
OGR format, not the native format, am I right?
  

Thinking about it, there may be problems because some modules may produce several output layers in the same output vector map if feature cats refer to several layers. No idea how to accommodate that with direct OGR write access where each OGR layer needs its own name not number. Fetch layer name from input vector, append layer number to map name if no layer name available?

v.extract in=imap layer=1 where="..." out="PG:dbname=db" olayer=omap format=PostgreSQL
v.extract in=imap layer=1 where="..." out=omap format=native
  

Should work, but preferably use input layer name if available?

Markus M

Hello,

2009/10/16 Markus Metz <markus.metz.giswork@googlemail.com>:

[...]

Hmm, for direct OGR write access, you need to specify the format anyway
somewhere? There is already 'format' added to all vector modules which
have input defined, can't be too difficult. And as I mentioned before, I
think a

Probably I don't understand, I can see a 'format' parameter only in
v.out.ogr, v.in.ogr and similar modules. Currently this would be possible:

I was referring to grass7 wxGUI e.g. d.vect has a new option element called
Format with the choices Native and OGR and I suggested to (1) do that with
e.g. ./shapefiles/@OGR and make that option element a format option in the
sense of v.[in|out].ogr for output options.

Ah, right, in wxGUI you can choose between native and OGR format to be
read. For native format 'input/map' is enough, for OGR you need to
choose 'input/map' (i.e. OGR datasource) and 'layer' (i.e. OGR layer).
Anyway, the wxGUI dialogs need to be improved, it was just a quick and dirty
solution...

yes, also olayer would be required. But it would make sense only for
OGR format, not the native format, am I right?

Thinking about it, there may be problems because some modules may produce
several output layers in the same output vector map if feature cats refer to
several layers. No idea how to accommodate that with direct OGR write access
where each OGR layer needs its own name not number. Fetch layer name from
input vector, append layer number to map name if no layer name available?

I would suggest allowing multiple 'olayer' values. If the number of
produced layers differs from the number of 'olayer' values, the module
ends with an error and nothing is written to the OGR datasource (?)
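
A minimal Python sketch of that all-or-nothing behaviour, under the assumption that the layer count is known before anything is written. write_layers() is hypothetical, and the plain dict merely stands in for an OGR datasource.

```python
# Hypothetical check for the proposed multi-valued 'olayer' option.
# Nothing here is GRASS API; the 'datasource' dict stands in for an OGR datasource.

def write_layers(datasource, produced_layers, olayers):
    """Write produced layers under the given names, or fail without writing."""
    # Validate before touching the datasource so that a mismatch leaves it unchanged.
    if len(produced_layers) != len(olayers):
        raise ValueError(
            "module produced %d layer(s) but %d 'olayer' name(s) were given"
            % (len(produced_layers), len(olayers))
        )
    for name, features in zip(olayers, produced_layers):
        datasource[name] = list(features)


datasource = {}
produced = [["pt1", "pt2"], ["ln1"]]   # two layers produced by the module

try:
    write_layers(datasource, produced, ["points"])   # only one name given
except ValueError as err:
    print("error:", err)

print(datasource)   # {} - nothing was written

write_layers(datasource, produced, ["points", "lines"])
print(sorted(datasource))   # ['lines', 'points']
```

The point of checking first is that a mismatch is detected before the datasource is modified at all.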

v.extract in=imap layer=1 where="..." out="PG:dbname=db" olayer=omap format=PostgreSQL
v.extract in=imap layer=1 where="..." out=omap format=native

Should work, but preferably use input layer name if available?

Yes, if the vector layer is defined - currently only a few modules
write out the layer name. I would suggest modifying all vector
modules to write a layer name identical to the input vector map name (if
more layers are produced, then input_1, input_2). Do you agree with it?
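
The naming rule could look roughly like this in Python. It is only a sketch of the suggestion above; default_layer_names() is a hypothetical helper, not an existing GRASS function.

```python
# Hypothetical helper implementing the naming rule suggested above.
# default_layer_names() is not an existing GRASS function.

def default_layer_names(input_map, n_layers):
    """Return default output layer names derived from the input map name."""
    if n_layers < 1:
        raise ValueError("at least one layer is required")
    if n_layers == 1:
        # A single layer simply takes the input map name.
        return [input_map]
    # Several layers: append a 1-based index to keep the names unique.
    return ["%s_%d" % (input_map, i) for i in range(1, n_layers + 1)]


print(default_layer_names("roads", 1))   # ['roads']
print(default_layer_names("roads", 3))   # ['roads_1', 'roads_2', 'roads_3']
```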

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

Martin:

Markus:

[...]

yes, also olayer would be required. But it would make sense only for
OGR format, not the native format, am I right?

Thinking about it, there may be problems because some modules may produce
several output layers in the same output vector map if feature cats refer to
several layers. No idea how to accommodate that with direct OGR write access
where each OGR layer needs its own name not number. Fetch layer name from
input vector, append layer number to map name if no layer name available?
    
I would suggest to use multiple 'olayer' parameter. If number of
produced layers would differ from the number of 'olayers', the module
ends up with an error and nothing is written to OGR datasource (?)
  

But make olayer optional, if not given, come up with reasonable names based on input layer names/output map name?

  

v.extract in=imap layer=1 where="..." out="PG:dbname=db" olayer=omap format=PostgreSQL
v.extract in=imap layer=1 where="..." out=omap format=native

Should work, but preferably use input layer name if available?
    
Yes, if vector layer is defined - currently there are few modules
which write out layer name.

Can be changed.

I would suggest to modify all vector
modules to write layer name identical with input vector map name (if
more layers produced, then input_1, input_2). Do you agree with it?
  

Not feeling too strongly about it, could also be output_1 etc. But v.support should definitely allow specifying/changing layer names for native format if the user is not happy with the default names. Changing layer names for OGR layers can be more difficult or impossible, depending on the format (nothing to change for e.g. GPX I think).

Coming back to the original idea of direct OGR write access, I think this would be great!

I see, however, a number of issues that need to be solved before getting there:

(1) I can neither query nor update attributes of a shapefile linked in with v.external.
(2) It must be possible to specify layers by number or name for native vector maps, for vector map layers linked with v.external, and for direct OGR access.
(3) Direct OGR access can be triggered with the virtual mapset OGR or the new option Format (what are the (dis)advantages of this approach and your approach with option Format?), but for direct OGR write access an additional option 'format' or similar is needed, specifying one of the locally supported OGR formats.

I'm not an expert on the OGR API, but updating an existing OGR layer may not be that easy? IMO minor issues are how to specify the OGR dsn, OGR layer(s), and OGR supported format (for write access) in vector modules.

Markus M

Hi,

2009/10/16 Martin Landa <landa.martin@gmail.com>:

[...]

Maybe this is one step ahead, first fully implement direct OGR read access
without the need for v.external?

yes, I am working on it.

see

http://trac.osgeo.org/grass/wiki/Grass7/VectorLib#DirectOGRreadaccess

for info about current status.

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

It seems that GV_FORMAT_OGR refers now to both OGR layers linked with v.external and direct OGR access, but these two require different handling. Add new GV_FORMAT_OGR_DIRECT ? See temporary workaround in [1].

Markus M

[1] https://trac.osgeo.org/grass/changeset/39545

Hi,

2009/10/17 Markus Metz <markus.metz.giswork@googlemail.com>:

It seems that GV_FORMAT_OGR refers now to both OGR layers linked with
v.external and direct OGR access, but these two require different handling.
Add new GV_FORMAT_OGR_DIRECT ? See temporary workaround in [1].

I am not sure if it's really needed, direct read/write access is valid
only for vector maps from the 'OGR' mapset.

Map->format == GV_FORMAT_OGR && strcasecmp(Map->mapset, "ogr") == 0

But GV_FORMAT_OGR_DIRECT would probably be a better solution.
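
For illustration, the same test rendered as a small Python sketch. Note that C's strcasecmp() returns 0 when the strings match, so the mapset comparison has to test for equality with zero. The constant values below are placeholders for this sketch, and is_direct_ogr() is a hypothetical name.

```python
# Python rendering of the check; the constant values are placeholders for
# this sketch and is_direct_ogr() is a hypothetical helper.
GV_FORMAT_NATIVE = 0
GV_FORMAT_OGR = 1

def is_direct_ogr(map_format, mapset):
    """True for maps accessed directly through the virtual 'OGR' mapset."""
    # mapset.lower() == "ogr" plays the role of strcasecmp(...) == 0 in C:
    # strcasecmp() returns 0 when the strings match, ignoring case.
    return map_format == GV_FORMAT_OGR and mapset.lower() == "ogr"


print(is_direct_ogr(GV_FORMAT_OGR, "OGR"))        # True  (direct access)
print(is_direct_ogr(GV_FORMAT_OGR, "PERMANENT"))  # False (linked via v.external)
print(is_direct_ogr(GV_FORMAT_NATIVE, "user1"))   # False (native map)
```

A dedicated GV_FORMAT_OGR_DIRECT value would make this second comparison unnecessary.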

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa

Martin Landa wrote:

Hi,

2009/10/17 Markus Metz <markus.metz.giswork@googlemail.com>:

It seems that GV_FORMAT_OGR refers now to both OGR layers linked with
v.external and direct OGR access, but these two require different handling.
Add new GV_FORMAT_OGR_DIRECT ? See temporary workaround in [1].
    
I am not sure if it's really needed, direct read/write access is valid
only for vector maps from 'OGR' mapset.

Map->format == GV_FORMAT_OGR && strcasecmp(Map->mapset, "ogr") == 0

But probably GV_FORMAT_OGR_DIRECT would be better solution.
  

Also to avoid rewriting existing code that uses GV_FORMAT_OGR for v.external linked vectors, see e.g. [1,2].

[1] https://trac.osgeo.org/grass/browser/grass/trunk/lib/vector/Vlib/open.c?rev=39545#L327
[2] https://trac.osgeo.org/grass/browser/grass/trunk/lib/vector/Vlib/open.c?rev=39545#L437

Markus M

2009/10/17 Markus Metz <markus.metz.giswork@googlemail.com>:

[...]

But probably GV_FORMAT_OGR_DIRECT would be better solution.

Also to avoid rewriting existing code that uses GV_FORMAT_OGR for v.external
linked vectors, see e.g. [1,2].

[1] https://trac.osgeo.org/grass/browser/grass/trunk/lib/vector/Vlib/open.c?rev=39545#L327
[2] https://trac.osgeo.org/grass/browser/grass/trunk/lib/vector/Vlib/open.c?rev=39545#L437

right - done in r39546.

Martin

--
Martin Landa <landa.martin gmail.com> * http://gama.fsv.cvut.cz/~landa