[GRASS-dev] [GRASS GIS] #3184: v.ect.stats: errors in counts and statistics

#3184: v.ect.stats: errors in counts and statistics
--------------------------+-------------------------
Reporter: veroandreo | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Keywords: v.vect.stats | CPU: x86-64
Platform: Linux |
--------------------------+-------------------------
I am working with cases of a disease in Greece. So, I have points
(coordinates) and I need to get a count of those cases per municipality
and higher administrative units, i.e: NUTS. However, v.vect.stats gives
wrong counts. In, my case, I should have 615 cases and if I sum the cases
counted by v.vect.stats (using v.db.select mymap column=cases), I get 767.
In some polygons, it counts ok, in some it misses points, in others it
"invents" extra points.

I guess the whole problem is maybe related to the fact that Greece has so
many islands split in a much lower number of categories (i.e.: belonging
to a reduced number of administrative units), but I don't know.

Anyway, I attach a vector of polygons and a vector of points as an
example. They were created in NC demo location. Just unpack and run:

{{{
v.vect.stats -p points=points_test areas=areas_test method=average
points_column=value count_column=count stats_column=average
}}}

This is the output:

{{{
area_cat|count|average
1|3|5.5
2|0|null
3|6|11.3333333333333
3|0|null
6|2|10.5
}}}

Printing to stout gives two lines for cat 3, one with the right count, one
with 0. However, G7:v.db.select gives a different output:

{{{
v.db.select areas_test
cat|count|average
1|3|5.5
2|0|
3|0|
6|2|10.5
}}}

If you query the map from the GUI or display the attribute table, polygons
with cat 3 have 0 count (see attached screenshot).

Not only the count is wrong, but the average seems also weird. The three
points in the polygons with cat 1 have a value of 5, why would average be
5.5?

I'm running GRASS trunk r69699 under Fedora 24.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.ect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------
Changes (by veroandreo):

* Attachment "areas_test.pack" added.

vector of areas to perform count

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.ect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------
Changes (by veroandreo):

* Attachment "points_test.pack" added.

vector of points to perform counts and statistics

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.ect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------
Changes (by veroandreo):

* Attachment "v_vect_stats_errors.png" added.

screenshot showing areas and points

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.ect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by mlennert):

Here's another example with internally generated data:

{{{
g.region res=1 n=3 s=0 w=0 e=3 -p
v.mkgrid grid=3,3 map=grid
v.extract grid type=centroid out=gc
v.category grid type=centroid out=grid_no_cats cat=-1 op=del
g.copy vect=grid_no_cats,grid_one_cat
v.edit grid_one_cat type=centroid tool=catadd cat=1 bbox=0,0,3,3

v.vect.stats -p points=gc type=centroid areas=grid points_col=col
countcol=count statscol=sum method=sum
Selecting points for each area...
  100%
area_cat|count|sum
1|1|1
2|1|2
3|1|3
4|1|1
5|1|2
6|1|3
7|1|1
8|1|2
9|1|3

v.vect.stats -p points=gc type=centroid areas=grid_one_cat points_col=col
countcol=count statscol=sum method=sum
Selecting points for each area...
  100%
area_cat|count|sum
1|0|null
1|0|null
1|9|18
1|0|null
1|0|null
}}}

IOW, with all grid cells having the same cat value, results become
nonsense except for one case. However, this one case is not the one
uploaded to the vector attribute table when one doesn't use the '-p' flag.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:1&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:2&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by mlennert):

I think I found the issue.

When running with g.gisenv set=DEBUG=1, I get:

{{{
v.vect.stats -p points=gc type=centroid areas=grid_one_cat points_col=col
countcol=count statscol=sum method=sum
[...]
D1/1: 9 cats loaded from vector (including duplicates)
D1/1: 5 cats loaded from vector (unique)
[...]
}}}

There should be only 1 unique cat.

The attached patch seems to solve this for me, but it needs testing and
review.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:3&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------
Changes (by mlennert):

* Attachment "v_vect_stats_unique_cats.diff" added.

patch (against trunk) to correctly uniquify list of cats

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by veroandreo):

I applied the patch and tested.
- For my data (615 points, 3600 areas split in 325 categories), I got
Segmentation fault.
- For the data generated with the commands you just posted, the result
looks fine when running v.vect.stats with -p flag.

{{{
GRASS 7.3.svn (nc_spm_08_grass7):~ > v.vect.stats -p points=gc
type=centroid areas=grid_one_cat points_col=col countcol=count
statscol=sum method=sum
Selecting points for each area...
  100%
area_cat|count|sum
1|9|18
}}}

However, if you remove that flag, to actually update the attribute table
(which for this case it is supposed to have only one category), I get:

{{{
GRASS 7.3.svn (nc_spm_08_grass7):~ > v.db.select grid_one_cat
cat|row|col|rown|coln|count|sum
1|3|1|C|A|9|18
2|3|2|C|B||
3|3|3|C|C||
4|2|1|B|A||
5|2|2|B|B||
6|2|3|B|C||
7|1|1|A|A||
8|1|2|A|B||
9|1|3|A|C||
}}}

Is there something wrong with v.db.select or with v.edit? I followed all
your steps. But even if I do {{{v.db.select grid_one_cat}}} before
v.vect.stats (and after the v.edit step), I get all the old categories:

{{{
GRASS 7.3.svn (nc_spm_08_grass7):~ > v.db.select grid_one_cat
cat|row|col|rown|coln
1|3|1|C|A
2|3|2|C|B
3|3|3|C|C
4|2|1|B|A
5|2|2|B|B
6|2|3|B|C
7|1|1|A|A
8|1|2|A|B
9|1|3|A|C
}}}

I'm lost...

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:4&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by mlennert):

Replying to [comment:4 veroandreo]:
> I applied the patch and tested.
> - For my data (615 points, 3600 areas split in 325 categories), I got
Segmentation fault.

Yes, i just noticed this for the original test data (test_areas,
test_points) as well...

> - For the data generated with the commands you just posted, the result
looks fine when running v.vect.stats with -p flag.
>
> {{{
> GRASS 7.3.svn (nc_spm_08_grass7):~ > v.vect.stats -p points=gc
type=centroid areas=grid_one_cat points_col=col countcol=count
statscol=sum method=sum
> Selecting points for each area...
> 100%
> area_cat|count|sum
> 1|9|18
> }}}
>
> However, if you remove that flag, to actually update the attribute table
(which for this case it is supposed to have only one category), I get:
>
> {{{
> GRASS 7.3.svn (nc_spm_08_grass7):~ > v.db.select grid_one_cat
> cat|row|col|rown|coln|count|sum
> 1|3|1|C|A|9|18
> 2|3|2|C|B||
> 3|3|3|C|C||
> 4|2|1|B|A||
> 5|2|2|B|B||
> 6|2|3|B|C||
> 7|1|1|A|A||
> 8|1|2|A|B||
> 9|1|3|A|C||
> }}}
>
> Is there something wrong with v.db.select or with v.edit? I followed all
your steps. But even if I do {{{v.db.select grid_one_cat}}} before
v.vect.stats (and after the v.edit step), I get all the old categories:
>
> {{{
> GRASS 7.3.svn (nc_spm_08_grass7):~ > v.db.select grid_one_cat
> cat|row|col|rown|coln
> 1|3|1|C|A
> 2|3|2|C|B
> 3|3|3|C|C
> 4|2|1|B|A
> 5|2|2|B|B
> 6|2|3|B|C
> 7|1|1|A|A
> 8|1|2|A|B
> 9|1|3|A|C
> }}}
>
> I'm lost...

This is normal: the table is not changed by the v.category commands, only
the cat values. So all the old entries are still there. To make it really
clean, one would have to run v.db.connect -d, db.droptable, v.db.addtable
after the v.category command.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:5&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by mlennert):

Replying to [comment:5 mlennert]:
> Replying to [comment:4 veroandreo]:
> > I applied the patch and tested.
> > - For my data (615 points, 3600 areas split in 325 categories), I got
Segmentation fault.
>
> Yes, i just noticed this for the original test data (test_areas,
test_points) as well...

I can get rid of the segfault, but results are still not good. I have the
feeling the issue is more profound than what I initially thought, so it
would be better if MarkusM had a look at it.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:6&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------
Changes (by mlennert):

* Attachment "v_vect_stats_unique_cats.2.diff" added.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by mlennert):

Ah, when those itches itch...

Please try the second patch I just uploaded. It works for my two test
cases.

I also modified the behavior concerning calculating averages. IIUC, the
idea is that if original values are integers, the resulting average should
also be integer. However, again IIUC, there was a floor() missing to
actually make it integer after the 0.5. So I added this floor().

Please test if the results seem ok. We can then discuss the +0.5 issue...

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:7&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by veroandreo):

When I apply this second patch (having the first already applied), I get:
{{{
[veroandreo@192 grass-7.3.svn]$ patch -p0 <
v_vect_stats_unique_cats.2.diff
patching file vector/v.vect.stats/main.c
Hunk #2 FAILED at 100.
Hunk #3 FAILED at 461.
Hunk #4 succeeded at 660 (offset 2 lines).
2 out of 4 hunks FAILED — saving rejects to file
vector/v.vect.stats/main.c.rej
}}}

If anyway I try to use v.vect.stats with the first test data (areas_test
and points_test), I get Segmentation fault. But maybe that's because the
patch is not fully/correctly applied? I'm using latest trunk revision
(r69713M).

ps: thanks for clarification regarding v.category, v.db.update and so on

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:8&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by neteler):

Replying to [comment:8 veroandreo]:
> When I apply this second patch (having the first already applied),

You likely need to revert the first patch if the second was generated
against SVN.

{{{
svn revert file(s)
# verify
svn status
}}}

Then apply the second alias new patch (I didn't try myself but this would
be the common procedure).

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:9&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by veroandreo):

Ahhh! Thanks for the explanation :slight_smile:

So, I applied the second patch "instead" of the first one, and it works as
expected, at least for the counts of points in polygons. I tested both
with the test data previously attached and with data of my own. Thanks so
much! I will test more and report back :slight_smile:

Regarding the statistics... I'm not sure how to, but even if the attribute
data in the points is integer, wouldn't that be better to have the "real"
average, standard deviation, etc., and not a truncated version of them??

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:10&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by mmetz):

Replying to [comment:10 veroandreo]:
> Ahhh! Thanks for the explanation :slight_smile:
>
> So, I applied the second patch "instead" of the first one, and it works
as expected, at least for the counts of points in polygons. I tested both
with the test data previously attached and with data of my own. Thanks so
much! I will test more and report back :slight_smile:

The patch to remove duplicate area categories seems correct to me, but
does not account for the case when no area categories are found. This test
is missing in the code anyway and should be done before removing duplicate
area categories.
>
> Regarding the statistics... I'm not sure how to, but even if the
attribute data in the points is integer, wouldn't that be better to have
the "real" average, standard deviation, etc., and not a truncated version
of them??

Right. The stats that can be integer are "category number of lowest
value", "category number of highest value", and "number of different
values", all else should be floating point. Count is treated differently
(not a method option but triggered with the count_column option). This
"half" addition of 0.5 comes from r.neighbors in G6 which is ignored by
r.neighbors in G7.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:11&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------
Changes (by mmetz):

* Attachment "v_vect_stats_unique_cats_no_half.diff" added.

patch against trunk to remove duplicate cats and to eliminate half

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by mmetz):

I have uploaded a patch (v_vect_stats_unique_cats_no_half.diff​) that
correctly removes duplicate cats (similar to
v_vect_stats_unique_cats.2.diff) and additionally eliminates the usage of
half (adding 0.5 to the result). This half makes only sense if the result
is to be converted to integer and if the result is a positive number. The
result is always floating point, therefore adding 0.5 causes wrong
results.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:12&gt;
GRASS GIS <https://grass.osgeo.org>

#3184: v.vect.stats: errors in counts and statistics
-------------------------+--------------------------
  Reporter: veroandreo | Owner: grass-dev@…
      Type: defect | Status: new
  Priority: normal | Milestone: 7.2.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords: v.vect.stats
       CPU: x86-64 | Platform: Linux
-------------------------+--------------------------

Comment (by veroandreo):

Replying to [comment:12 mmetz]:
> I have uploaded a patch (v_vect_stats_unique_cats_no_half.diff​) that
correctly removes duplicate cats (similar to
v_vect_stats_unique_cats.2.diff) and additionally eliminates the usage of
half (adding 0.5 to the result). This half makes only sense if the result
is to be converted to integer and if the result is a positive number. The
result is always floating point, therefore adding 0.5 causes wrong
results.

Thanks Markus! I applied the patch and tested with Moritz's test data and
examples, and also with my own data. Counts are fine and also stats :slight_smile:

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/3184#comment:13&gt;
GRASS GIS <https://grass.osgeo.org>