[GRASS-user] Varying results - v.select,v.distance,v.what

Good day!

I am executing IDENTITY_ANALYSIS (point-in-polygon analysis) for a series
of points and layers, and I am troubled by the variation in the results I am
seeing. I am inclined to think I am doing something incorrectly, but I
haven't found my error. v.select yields nothing, v.distance yields a non-null
answer, and v.what yields a non-null value that differs from the value
generated by v.distance.

--Workflow--
A LOCATION was created tailored to the specifics of the layer
I am processing. The layer itself was imported using:
v.external, followed by v.category, followed by v.db.connect.

1. v.select workflow:
g.region vect=EXT_CENSUS_CAT
v.proj single_point location=lat_lon mapset=PERMANENT --overwrite
v.db.addtable single_point --q
v.select ainput=EXT_CENSUS_CAT binput=single_point output=single_select --overwrite --q
v.db.addtable single_select --q
v.db.select -c single_select --q
/* nothing returned */

2. v.distance workflow:
g.region vect=EXT_CENSUS_CAT
v.proj single_point location=lat_lon mapset=PERMANENT --overwrite
v.db.addtable single_point --q
v.distance -p from=single_point to=EXT_CENSUS_CAT upload=cat column=match
  100%
from_cat|match
1|114255
v.distance complete.

3. v.what workflow:
g.region vect=EXT_CENSUS_CAT
v.proj single_point location=lat_lon mapset=PERMANENT --overwrite
v.db.addtable single_point --q
v.out.ascii single_point
/* Take output values from v.out.ascii as east_north parameter for v.what */
v.what EXT_CENSUS_CAT east_north=-88.2551479,41.9068764

East: 88:15:18.53244W
North: 41:54:24.75504N

Map: EXT_CENSUS_CAT
Mapset: PERMANENT
Type: Centroid
Id: 231449
Layer: 1
Category: 114263
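Comparing the three methods over many points means automating the manual step in workflow 3 and pulling the category out of each tool's printed output. Below is a minimal sketch. It assumes the output formats shown above and, for v.out.ascii, that its point format prints one `x|y|cat` line per point. The function names are hypothetical helpers, not GRASS modules, and the parsing may need adjusting for other GRASS versions.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the three workflows from a script.
# They only parse text, so they can be tried without a GRASS session.

# v.out.ascii (point format, assumed "x|y|cat") -> "x,y" for v.what east_north=
to_east_north() {
    awk -F'|' 'NF >= 2 { print $1 "," $2 }'
}

# v.what output -> value of its "Category:" line(s)
what_category() {
    awk -F': *' '$1 == "Category" { print $2 }'
}

# v.distance -p output (header "from_cat|match") -> match for from_cat=$1
distance_match() {
    awk -F'|' -v id="$1" 'NF >= 2 && $1 == id { print $2 }'
}

# Possible wiring inside a GRASS session (untested sketch):
# coords=$(v.out.ascii single_point | to_east_north | head -n 1)
# w=$(v.what EXT_CENSUS_CAT east_north="$coords" | what_category)
# d=$(v.distance -p from=single_point to=EXT_CENSUS_CAT upload=cat column=match | distance_match 1)
# [ "$w" = "$d" ] || echo "v.what ($w) and v.distance ($d) disagree"
```

Fed the outputs quoted above, `what_category` would print 114263 and `distance_match 1` would print 114255, which is exactly the disagreement being asked about.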

Am I doing this incorrectly?
Should I expect the three methods to yield the same result?
Which method is suggested to be the most reliable?

Thank you.

KFW

At 12:15 PM 11/19/2008, Kevin Webb wrote:

...

I decided to run v.select, v.distance, and v.what comparisons on another
layer that I imported using v.external. All three query techniques
returned the same value, which led me to believe that I had done something
improperly in the import sequence for the layer that was giving
me erroneous results; the investigation, however, raised more questions.

--The layer for which v.select, v.distance and v.what results match--

GRASS 6.4.svn (PREC0101):~ > v.db.connect -p PREC0102_EXT
Vector map <PREC0102_EXT@PERMANENT> is connected by:
layer <1> table <PREC0102> in database </home/kfw4/dev/grass_db/climate/prec/PREC0102.SHP> through driver <ogr> with key <>
GRASS 6.4.svn (PREC0101):~ > v.info -c PREC0102_EXT
Displaying column types/names for database connection of layer 1:
INTEGER|ID
INTEGER|GRIDCODE
CHARACTER|INCHES

Queries are supposedly NOT going to work on this layer because v.category
has not been run (there is no category (cat) column), yet v.select, v.distance,
and v.what all work and return the same values.

Why do queries on this layer work?

--The layer for which v.select, v.distance, and v.what results do NOT match--

GRASS 6.4.svn (CENSUS_ESRI_04):~ > v.db.connect -p EXT_CENSUS_CAT
Vector map <EXT_CENSUS_CAT@PERMANENT> is connected by:
layer <1> table <EXT_CENSUS_CAT> in database </home/kfw4/dev/grass_db/CENSUS_ESRI_04/PERMANENT/dbf/> through driver <dbf> with key <cat>
layer <2> table <blkgrps01> in database </home/kfw4/dev/grass_db/esri_census/blkgrps01.dbf> through driver <ogr> with key <cat>
GRASS 6.4.svn (CENSUS_ESRI_04):~ > v.info -c EXT_CENSUS_CAT layer=1
Displaying column types/names for database connection of layer 1:
INTEGER|cat
GRASS 6.4.svn (CENSUS_ESRI_04):~ > v.info -c EXT_CENSUS_CAT layer=2
Displaying column types/names for database connection of layer 2:
DOUBLE PRECISION|SQMI
CHARACTER|STATE_FIPS
CHARACTER|CNTY_FIPS
CHARACTER|STCOFIPS
CHARACTER|TRACT
CHARACTER|BLKGRP
CHARACTER|FIPS
INTEGER|POP2000
DOUBLE PRECISION|POP00_SQMI
INTEGER|POP2003
INTEGER|WHITE
INTEGER|BLACK
INTEGER|AMERI_ES
INTEGER|ASIAN
INTEGER|HAWN_PI
INTEGER|OTHER
INTEGER|MULT_RACE
INTEGER|HISPANIC
INTEGER|MALES
INTEGER|FEMALES
INTEGER|AGE_UNDER5
INTEGER|AGE_5_17
INTEGER|AGE_18_21
INTEGER|AGE_22_29
INTEGER|AGE_30_39
INTEGER|AGE_40_49
INTEGER|AGE_50_64
INTEGER|AGE_65_UP
DOUBLE PRECISION|MED_AGE
DOUBLE PRECISION|MED_AGE_M
DOUBLE PRECISION|MED_AGE_F
INTEGER|HOUSEHOLDS
DOUBLE PRECISION|AVE_HH_SZ
INTEGER|HSEHLD_1_M
INTEGER|HSEHLD_1_F
INTEGER|MARHH_CHD
INTEGER|MARHH_NO_C
INTEGER|MHH_CHILD
INTEGER|FHH_CHILD
INTEGER|FAMILIES
DOUBLE PRECISION|AVE_FAM_SZ
INTEGER|HSE_UNITS
INTEGER|VACANT
INTEGER|OWNER_OCC
INTEGER|RENTER_OCC

For this layer, point-in-polygon queries are executed on layer#1, and attributes are queried from layer#2;
v.select, v.distance, and v.what all return different cat values.

v.db.connect -p indicates layer#2 is keyed by the <cat> field yet I don't see a 'cat' field in the v.info output,
shouldn't there be a 'cat' column in layer#2 or am I misinterpreting how multiple layers are properly added to
a vector?
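One way to check each layer for the key column reported by v.db.connect -p is to scan the column list printed by v.info -c. Below is a minimal sketch. It assumes the `TYPE|name` line format shown above. `has_column` is a hypothetical helper, not a GRASS module.

```shell
#!/bin/sh
# Hypothetical check: does a layer's attribute table contain a given
# column? Reads "v.info -c" output ("TYPE|name" lines) on stdin.
# The match is case-sensitive; adjust if your driver upper-cases names.
has_column() {
    awk -F'|' -v col="$1" 'NF == 2 && $2 == col { found = 1 } END { exit !found }'
}

# Possible use inside a GRASS session (untested sketch):
# for layer in 1 2; do
#     if ! v.info -c EXT_CENSUS_CAT layer=$layer | has_column cat; then
#         echo "layer $layer: no 'cat' column"
#     fi
# done
```

On the listings above, layer 1 would pass and layer 2 would be reported as missing `cat`.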

Corrections/suggestions?

Thank you.

KFW

On Thu, Nov 20, 2008 at 9:03 PM, Kevin Webb <kfw4@cornell.edu> wrote:
...

v.db.connect -p indicates layer#2 is keyed by the <cat> field yet I don't
see a 'cat' field in the v.info output,
shouldn't there be a 'cat' column in layer#2 or am I misinterpreting how
multiple layers are properly added to a vector?

Yes, a cat column should always be there AFAIK.

Can you put the packaged location somewhere, with copy-and-paste commands
to try? Off-list, if needed for privacy.

Markus

At 03:56 PM 11/26/2008, Markus Neteler wrote:

...

Thank you for your reply Markus.

I can copy and paste the vector from inside GRASS, or I can create a tarball of the
original source data for testing. Let me know your preference.

I suspect the phenomenon I am experiencing has something to do with using
v.external for the import and the subsequent GRASS interaction with the OGR driver,
although it is quite possible that, being new to GRASS and to GIS in general,
I have simply done something improperly. As time allows, I am stepping through
the v.external code with gdb to see if I can make any discoveries.

--Background Information--
My primary interest in GRASS is to use its command-line/scripting interface for
bulk data extraction - data will be extracted from approximately 1,400 layers to be
used for multivariate statistical modelling of bird abundance and environmental
issues.

You might wonder why I am using v.external as opposed to v.in.ogr to import data
into GRASS. I have executed comparison tests using 65,000 points on layers imported
using v.in.ogr and v.external, and the results from the v.external vectors more closely
match results from the ESRI arcgisscripting interface than do the results from the v.in.ogr
vectors. The ESRI results are NOT any kind of benchmark, but they give me values I can
use for comparison. v.external point-in-polygon comparisons give me a .005 variation
when compared to the same run from ESRI, and v.in.ogr point-in-polygon comparisons
yield a variation > .01. I am biased towards using v.external as the import mechanism
based on the smaller delta.
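Figures like these are easiest to reproduce if each run is reduced to a point-to-polygon table and the runs are compared point by point. Below is a minimal sketch. It assumes each run has been exported to a hypothetical two-column `point_id|polygon_id` text file. It computes one possible definition of "variation" (the share of points assigned to a different polygon), which is not necessarily the definition behind the numbers quoted above.

```shell
#!/bin/sh
# Hypothetical helper: fraction of points whose polygon id differs between
# two runs, each stored as "point_id|polygon_id" lines. A point present in
# only one file counts as a mismatch. (The first file must be non-empty,
# because of the NR == FNR idiom.)
mismatch_fraction() {
    awk -F'|' '
        NR == FNR { a[$1] = $2; next }   # first file
        { b[$1] = $2 }                   # second file
        END {
            for (k in a) all[k] = 1
            for (k in b) all[k] = 1
            for (k in all) {
                total++
                if (!(k in a) || !(k in b) || a[k] != b[k]) diff++
            }
            printf "%.4f\n", (total ? diff / total : 0)
        }' "$1" "$2"
}
```

For example, if one of four points falls in a different polygon in the second file, `mismatch_fraction run1.txt run2.txt` prints 0.2500.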

Your assistance is greatly appreciated.

KFW

-----------------------------------------
Kevin Webb
Programmer/Analyst
Cornell Lab of Ornithology
159 Sapsucker Woods Rd.
Ithaca, NY 14850
Tel: 607.254.2103
Fax: 607.254.2415
kfw4@cornell.edu
-------------------------------------------

At 03:03 PM 11/20/2008, Kevin Webb wrote:

...

Why do queries on this layer work?

Classes in the OGR library have member objects and methods for accessing a
feature's FID (Feature ID). For shapefiles the FID is the record number rather
than a column stored in the attribute table, and it plays the same role as the
cat value in GRASS. GRASS code that references the FID works correctly in
v.db.select operations without having to run v.category on a vector
that was imported using v.external.

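g.region vect=EXT_CENSUS
v.proj single_point location=lat_lon mapset=PERMANENT --overwrite
v.db.addtable single_point --q
# select the EXT_CENSUS feature(s) that overlap the point
v.select ainput=EXT_CENSUS binput=single_point output=single_select --overwrite --q
v.db.addtable single_select --q
# category of the selected feature, assumed here to equal the OGR FID
fid=$(v.db.select -c single_select --q)

if [ -n "$fid" ]; then
    # quote the where clause so the shell passes it as a single argument
    result=$(v.db.select -c EXT_CENSUS column=POP00_SQMI where="fid=$fid" --q)
fi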
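If v.select returns more than one category for a point (reported below for some points before GEOS was installed), `$fid` holds several lines and the `where=` clause is no longer a valid single comparison. Below is a small guard, sketched under that assumption. `single_fid` is a hypothetical helper, not part of GRASS.

```shell
#!/bin/bash
# Hypothetical guard (not part of GRASS): accept the captured output of
# "v.db.select -c ..." only if it contains exactly one non-empty line.
single_fid() {
    local out="$1" n
    # grep -c . counts lines with at least one character.
    n=$(printf '%s\n' "$out" | grep -c .)
    if [ "$n" -eq 1 ]; then
        # print just the non-empty line
        printf '%s\n' "$out" | grep .
        return 0
    fi
    echo "single_fid: expected 1 FID, got $n" >&2
    return 1
}

# Possible use in the script above (untested sketch):
# fid=$(single_fid "$(v.db.select -c single_select --q)") || exit 1
# result=$(v.db.select -c EXT_CENSUS column=POP00_SQMI where="fid=$fid" --q)
```

Failing loudly here makes multiple-match points visible instead of letting them silently produce an empty or wrong attribute lookup.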

--The layer for which v.select, v.distance, and v.what results do NOT match--

I think the varying results I have seen from v.select, v.distance, and v.what
are due to OGR's 'relationship with' the GEOS library. Tests I have run suggest
the GEOS library should be an installation requirement/prerequisite for any user
intending to run operations on vectors imported using v.external.
(The GEOS library may be an undocumented requirement for all OGR operations.)

Note: I used the phrase "relationship with" instead of "dependency". OGR will certainly
compile without GEOS, but OGR operations without GEOS may yield suspect results.

This discovery came about when I was testing code that calls the OGR library directly.
Calls to OGRGeometry::Distance(OGRGeometry *) errored out because GEOS was not
installed. After installing GEOS and recompiling GDAL, OGRGeometry::Distance() works
and I noticed changes in values for my test points previously run with OGRGeometry::Intersects().

Thinking that GEOS may have an impact on GRASS, I ran similar before-and-after-GEOS
tests in GRASS and observed the same improvement: points for which v.select returned
multiple FIDs before GEOS was installed yielded a single value afterwards. v.select and
v.distance now return the same value.
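The GDAL API reference for OGRGeometry::Intersects() describes a plausible mechanism at the OGR level. Without GEOS, it only checks whether the envelopes (bounding boxes) of the two geometries overlap. OGRGeometry::Distance() is documented to fail without GEOS, which matches the error above. An envelope test can place one point in several polygons whose boxes overlap, which would be consistent with the multiple FIDs seen before GEOS. Whether GRASS modules go through this code path for v.external maps is not established here. The sketch below is an illustration only, not OGR's code. It uses made-up polygons to contrast an envelope test with an exact even-odd ray-casting test, written in awk.

```shell
#!/bin/bash
# Illustration only (not OGR's code): an envelope (bounding-box) test
# versus an exact even-odd ray-casting point-in-polygon test.
# Polygons are made-up rings written as "x1 y1 x2 y2 ..." strings.

# Succeeds if (px,py) lies inside the polygon's bounding box.
in_envelope() {
    awk -v px="$1" -v py="$2" -v poly="$3" 'BEGIN {
        px += 0; py += 0
        n = split(poly, c, " ") / 2
        for (i = 1; i <= n; i++) {
            x = c[2*i-1] + 0; y = c[2*i] + 0
            if (i == 1 || x < minx) minx = x
            if (i == 1 || x > maxx) maxx = x
            if (i == 1 || y < miny) miny = y
            if (i == 1 || y > maxy) maxy = y
        }
        exit !(px >= minx && px <= maxx && py >= miny && py <= maxy)
    }'
}

# Succeeds if (px,py) lies inside the polygon (even-odd rule).
in_polygon() {
    awk -v px="$1" -v py="$2" -v poly="$3" 'BEGIN {
        px += 0; py += 0
        n = split(poly, c, " ") / 2
        for (i = 1; i <= n; i++) { x[i] = c[2*i-1] + 0; y[i] = c[2*i] + 0 }
        inside = 0
        j = n
        for (i = 1; i <= n; i++) {
            # Toggle each time a horizontal ray from the point crosses edge j->i.
            if (((y[i] > py) != (y[j] > py)) &&
                (px < (x[j] - x[i]) * (py - y[i]) / (y[j] - y[i]) + x[i]))
                inside = !inside
            j = i
        }
        exit !inside
    }'
}

A="4 0 10 0 10 10 0 10 0 4 4 4"   # L-shaped polygon; its box covers B's area
B="0 0 4 0 4 4 0 4"               # square filling A's notch
for name in A B; do
    if [ "$name" = A ]; then poly=$A; else poly=$B; fi
    if in_envelope 2 2 "$poly"; then echo "envelope: point (2,2) in $name"; fi
    if in_polygon 2 2 "$poly"; then echo "exact:    point (2,2) in $name"; fi
done
```

On this data the envelope test places the point in both A and B, while the exact test places it only in B. That is the kind of one-versus-many difference described above.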

KFW

On Mon, Dec 15, 2008 at 9:07 PM, Kevin Webb <kfw4@cornell.edu> wrote:

At 03:03 PM 11/20/2008, Kevin Webb wrote:

...

Isn't it then an OGR problem rather than a GRASS problem?
We use OGR and expect that it always delivers the same
result (or issues a warning/error if that is not possible in the absence of GEOS).

...

May I ask you to bring this up on the gdal mailing list?

Markus

At 12:44 PM 12/18/2008, Markus Neteler wrote:

...

Isn't it then an OGR problem rather than a GRASS problem?
We use OGR and expect that it always delivers the same
result (or issues a warning/error if that is not possible in the absence of GEOS).

If the assumptions and the requirements of the OGR API are as you
have stated here, then this is an issue with OGR and not Grass.

...

May I ask you to bring this up on the gdal mailing list?

Markus

I am willing to bring this up on the gdal mailing list. Can anyone else
confirm or refute my observation from their own testing (a simple 'yea' or 'nay' will suffice)?

KFW