[GRASS-user] sql join in v.extract

Hi,

I have a large vector shapefile, 'UTM_1KM_GRID', and a separate, large
sql table, "EOs". The 'ATLAS_83' column in 'UTM_1KM_GRID' contains a
unique name for each polygon in the shapefile. The 'SQUARE1KM' column in
"EOs" contains the same names, such that the tables can be joined "1 to
many": UTM_1KM_GRID ATLAS_83 to EOs SQUARE1KM.

I can explicitly join the two tables with db.select:

db.select sql="SELECT cat from UTM_1KM_GRID as U join EOs as E on
U.ATLAS_83 = E.SQUARE1KM where SCI_NAME = 'Carex folliculata'"

Result:
cat
391598
77652
...

i.e., select all the 'cat' values from UTM_1KM_GRID corresponding to the
indicated SCI_NAME value in EOs, based on the join I described above.

Now I would like to do the same thing in the context of v.extract, so
that I can create a new shapefile that is a subset of the original
shapefile, defined by the same sql statement. However, v.extract only
accepts a 'where' argument, not a 'sql' statement, so I'm not sure how
to indicate this.

I have tried v.db.join, which appears to allow me to combine the columns
from EOs with the table for the shapefile. That may work, but both
tables are quite large, and it hasn't finished after running overnight.
I'm not sure if the resulting joined table will be too big to work with
on my laptop. In contrast, the db.select takes only a few seconds.

My questions:

1. Is there a way to use a complex sql statement (i.e., that includes a
join directive) in v.extract?

2. If not, is the most sensible approach here to collect the output of
db.select in a python script and use that to generate the corresponding
where directive? i.e., where='(cat = 391598) OR (cat=77652) OR ...'?

Best,

Tyler

--
plantarum.ca

To answer my own questions:

1. Is there a way to use a complex sql statement (i.e., that includes a
join directive) in v.extract?

Maybe not?

2. If not, is the most sensible approach here to collect the output of
db.select in a python script and use that to generate the corresponding
where directive? i.e., where='(cat = 391598) OR (cat=77652) OR ...'?

Not sure if this is the best approach, but as a Python newbie it turned
out to be pretty simple, especially after I realized sqlite has an 'in'
operator:

  import grass.script as gscript
  gscript.run_command('g.region', vector='UTM_1KM_GRID')

  ## collect the keys from the sql table:
  grids = gscript.parse_command('db.select',
                              sql=
                              "SELECT SQUARE1KM FROM EOs WHERE SCI_NAME
                              = 'Carex folliculata'").keys()

  ## insert the keys into the query string for the shapefile:
  gscript.run_command('v.extract', input='UTM_1KM_GRID',
                    output='c_folliculata',
                    where='ATLAS_83 in (\'%s\')' % '\', \''.join(grids))

Best,

Tyler