[GRASS-dev] [GRASS GIS] #4009: 'where' option in v.to.rast can select wrong feature for raster attribute, in areas where features overlap

#4009: 'where' option in v.to.rast can select wrong feature for raster attribute,
in areas where features overlap
---------------------------------+---------------------------------
Reporter: florisvdh | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone:
Component: Vector | Version: svn-releasebranch76
Keywords: v.to.rast sql where | CPU: x86-64
Platform: Linux |
---------------------------------+---------------------------------
I use GRASS 7.6.1 on Linux Mint 18.1, which is based on Ubuntu Xenial (16.04).
''(Note: this is my first post here. I'm a beginning GRASS user, mostly
working with R.)''

I applied something like:

{{{
v.to.rast input=polygon_layer output=output_x1 where="field1 LIKE 'x1'" \
          use=attr attribute_column=field2 memory=800 --overwrite
}}}

Importantly, {{{field1}}} in {{{polygon_layer}}} has several possible
values such as {{{x1}}}, {{{x2}}}, {{{x3}}} and so on (81 different values
in my use case).

Moreover, in my use case several features (polygons) have identical
geometry: there are many sets of two or more polygons that overlap 100%,
while each polygon in a set has its own attributes (different values of
{{{field1}}} and so on). The problem I ran into occurs for those features;
possibly the same problem occurs for any overlapping areas between polygons.

While the {{{where}}} option in {{{v.to.rast}}} is effective in
''localizing the correct areas'', i.e. where {{{field1}}} is {{{x1}}}, the
**problem** is that the **attribute value** ({{{field2}}}) may come from
one of the other (overlapping) features at that place, e.g. where
{{{field1}}} is {{{x2}}}. Which feature of an overlapping set is used for
the attribute probably depends on the order of those overlapping features.

From what I've seen, it appears that {{{v.to.rast}}} will select the
{{{field2}}} value **from the same feature regardless** of the value for
{{{field1}}} in the above {{{where}}} option, as long as ''one'' of those
overlapping features meets the {{{where}}} condition.

If this really ''is'' a bug in the program, it may also occur in other
modules that offer the {{{where}}} option.

''Note: I should also mention that I used this in a simple parallelization
approach in a loop (setting {{{field1}}} to different values), by starting
the commands as background processes until a certain number of them is
running (following an example I
[https://grasswiki.osgeo.org/wiki/Parallelizing_Scripts found] in the
GRASS wiki). That's why the memory option was used explicitly.''
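
The background-job approach described in this note can be sketched roughly as follows. This is a minimal sketch in plain POSIX sh, not the reporter's actual script: {{{MAX_JOBS}}}, {{{rasterize_value}}} and {{{rasterize_all}}} are hypothetical names, and the {{{v.to.rast}}} call is copied from the command above. It starts jobs in batches and waits for each batch to finish, which is simpler than keeping a fixed number of jobs running at all times.

```shell
#!/bin/sh
# Sketch only: batch-style parallelization of v.to.rast calls.
# MAX_JOBS, rasterize_value and rasterize_all are hypothetical names.
MAX_JOBS=${MAX_JOBS:-4}

# Rasterize the features matching one field1 value
# (same v.to.rast options as in the report).
rasterize_value() {
    v.to.rast input=polygon_layer output="output_$1" \
        where="field1 LIKE '$1'" use=attr attribute_column=field2 \
        memory=800 --overwrite
}

# Start one background job per value; after every MAX_JOBS jobs,
# wait for the whole batch before starting the next one.
rasterize_all() {
    n=0
    for value in "$@"; do
        rasterize_value "$value" &
        n=$((n + 1))
        if [ $((n % MAX_JOBS)) -eq 0 ]; then
            wait
        fi
    done
    wait  # collect the last, possibly partial, batch
}
```

Inside a GRASS session this would be called as, for example, {{{rasterize_all x1 x2 x3}}}. Because several jobs run at once, each one gets an explicit {{{memory}}} limit.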

For now I have worked around this problem by splitting {{{polygon_layer}}}
according to the value of {{{field1}}} with {{{v.extract}}} (in a plain
sequential loop, as the simple parallelization doesn't work with this
one).
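
A minimal sketch of this workaround, under stated assumptions: the helper name {{{extract_and_rasterize}}}, the intermediate map names {{{polygon_layer_<value>}}} and the optional {{{RUN}}} prefix (set {{{RUN=echo}}} for a dry run that only prints the commands) are all hypothetical. Each {{{field1}}} value is first extracted into its own vector map, so that {{{v.to.rast}}} no longer needs the {{{where}}} option and cannot pick up attributes from another overlapping feature.

```shell
#!/bin/sh
# Sketch only: sequential split-then-rasterize workaround.
# extract_and_rasterize and RUN are hypothetical names.
# With RUN=echo the commands are printed instead of executed.
extract_and_rasterize() {
    for value in "$@"; do
        # Copy only the features with this field1 value into a new map.
        ${RUN:-} v.extract input=polygon_layer \
            output="polygon_layer_$value" \
            where="field1 LIKE '$value'" --overwrite || return 1
        # Rasterize the extracted map; no 'where' filter is needed here.
        ${RUN:-} v.to.rast input="polygon_layer_$value" \
            output="output_$value" \
            use=attr attribute_column=field2 --overwrite || return 1
    done
}
```

Inside a GRASS session, {{{extract_and_rasterize x1 x2 x3}}} would process the values one after another. The loop is kept sequential because the report notes that the simple parallelization did not work for this step.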

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009>
GRASS GIS <https://grass.osgeo.org>

Comment (by mlennert):

Replying to [ticket:4009 florisvdh]:

This definitely should not happen. Could you provide the output of:

{{{
v.db.select map=polygon_layer where="field1 LIKE 'x1'" \
           column=cat,field1,field2
}}}

?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009#comment:1>
GRASS GIS <https://grass.osgeo.org>

Comment (by florisvdh):

Below are a few characteristics of the map. Note that the actual values of
{{{field1}}} are not 'x1', 'x2' etc. but specific codes. Given the size
of the data, I truncated the output.

{{{
$ v.info polygon_layer -c
Displaying column types/names for database connection of layer <1>:
INTEGER|cat
TEXT|field1
DOUBLE PRECISION|field2

$ v.db.select map=polygon_layer | wc -l
85051

$ v.db.select map=polygon_layer where="field1 LIKE 'rbbhc'" | wc -l
4720

$ v.db.select map=polygon_layer where="field1 LIKE 'rbbhc'" | head
cat|field1|field2
132|rbbhc|20
146|rbbhc|70
151|rbbhc|30
152|rbbhc|30
153|rbbhc|30
154|rbbhc|30
155|rbbhc|30
156|rbbhc|30
157|rbbhc|30
}}}

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009#comment:2>
GRASS GIS <https://grass.osgeo.org>

#4009: 'where' option in v.to.rast can select wrong feature for raster attribute,
in areas where features overlap
------------------------+---------------------------------
  Reporter: florisvdh | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.8.3
Component: Vector | Version: svn-releasebranch76
Resolution: | Keywords: v.to.rast sql where
       CPU: x86-64 | Platform: Linux
------------------------+---------------------------------

Comment (by mmetz):

I think I have a fix for this bug, but I would like to test the fix first
with your data. Can you provide a small spatial extract of your data that
can be used to reproduce the problem? Thanks!

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009#comment:4>
GRASS GIS <https://grass.osgeo.org>

Changes (by florisvdh):

* Attachment "polygon_layer_debug.gpkg" added.

Clipped the original polygon_layer to a small region, for debugging.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009>
GRASS GIS <https://grass.osgeo.org>

Comment (by florisvdh):

**@mmetz**: see the attachment.

A few examples of sets of polygons that share the same geometry but have
different attributes:

- cat 21936 and 21937
- cat 21829 and 21830
- cat 22114 and 22115

The original data set from which it was derived (with more
[https://github.com/inbo/n2khab-mne-design/tree/bd4819099f149ab71346deca1727f5942a88e6c3/030_preparations/010_explore_targetpop steps]
than just clipping) is at https://doi.org/10.5281/zenodo.3540740 .

Some additional output to show region and a few steps:

{{{
$ g.region -p
projection: 99 (Belge 1972 / Belgian Lambert 72)
zone: 0
datum: bel72
ellipsoid: international
north: 244030.1
south: 153054.1
west: 22029.6
east: 258861.6
nsres: 32
ewres: 32
rows: 2843
cols: 7401
cells: 21041043

$ g.region w=196065 e=199009 s=188877 n=190989 save=debug

$ g.region -p
projection: 99 (Belge 1972 / Belgian Lambert 72)
zone: 0
datum: bel72
ellipsoid: international
north: 190989
south: 188877
west: 196065
east: 199009
nsres: 32
ewres: 32
rows: 66
cols: 92
cells: 6072

$ v.clip -r input=polygon_layer output=polygon_layer_debug

$ v.out.ogr input=polygon_layer_debug output=polygon_layer_debug.gpkg

}}}

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009#comment:5>
GRASS GIS <https://grass.osgeo.org>

Comment (by mmetz):

Fixed in master
[https://github.com/OSGeo/grass/commit/b4f79f2f8225ec5baaa5187a87e1ac0cb868e549 b4f79f2]
and relbr78
[https://github.com/OSGeo/grass/commit/48e807ea5d763faf4377630c990c32b6c013ac08 48e807e].

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009#comment:6>
GRASS GIS <https://grass.osgeo.org>

Comment (by florisvdh):

Interesting, thanks!

I wonder how this relates to [https://trac.osgeo.org/grass/ticket/1798
ticket 1798], which initiated the 'where' option in quite a number of
GRASS modules. Hence, are other modules affected as well?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009#comment:7>
GRASS GIS <https://grass.osgeo.org>

Comment (by mmetz):

Replying to [comment:7 florisvdh]:
> Interesting, thanks!
>
> I wonder how this relates to [https://trac.osgeo.org/grass/ticket/1798
> ticket 1798], which initiated the 'where' option in quite a number of
> GRASS modules. Hence, are other modules affected as well?

All modules using Vect_cats_in_constraint() need to be checked, I am on
it.
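
For a first list of candidate modules, one could search a source checkout for callers of that function. This is a sketch assuming a local copy of the GRASS source tree; {{{list_constraint_users}}} is a hypothetical helper name, and the result also includes the directory where the function itself is defined.

```shell
#!/bin/sh
# Sketch only: list directories whose C sources mention
# Vect_cats_in_constraint. list_constraint_users is a hypothetical name.
# Argument: root of a source tree (defaults to the current directory).
list_constraint_users() {
    find "${1:-.}" -type f -name '*.c' \
        -exec grep -l 'Vect_cats_in_constraint' {} + |
    while IFS= read -r file; do
        dirname "$file"
    done | sort -u
}
```

Run from the top of the checkout, {{{list_constraint_users .}}} prints one directory per line. Each hit still needs a manual look to see whether it combines the category constraint with attribute lookups the way {{{v.to.rast}}} does.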

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/4009#comment:8>
GRASS GIS <https://grass.osgeo.org>