[GRASS-dev] [GRASS GIS] #475: r.stats: last bin always has a single cell

#475: r.stats: last bin always has a single cell
----------------------+-----------------------------------------------------
Reporter: hamish | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: critical | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Keywords: r.stats | Platform: All
      Cpu: All |
----------------------+-----------------------------------------------------
Hi, r.stats seems to be doing something funny: the last bin always
contains only one cell. Same result with GRASS 5.4.1 - 6.5svn on
Linux, OSX, Solaris.

speafish:
{{{
G65> g.region rast=elevation.10m
G65> r.stats elevation.10m nsteps=10 -c | nl
  100%
      1 1061.064087-1139.632019 243490
      2 1139.632019-1218.199951 732066
      3 1218.199951-1296.767883 399213
      4 1296.767883-1375.335815 351601
      5 1375.335815-1453.903748 315246
      6 1453.903748-1532.47168 265487
      7 1532.47168-1611.039612 216796
      8 1611.039612-1689.607544 115175
      9 1689.607544-1768.175476 15727
     10 1768.175476-1846.743408 1

G65> r.mapcalc 'bin10 = if(elevation.10m > 1768.175476, 1, null() )'
G65> r.univar bin10 | grep '^n'
n: 12344
}}}

other bin counts don't line up either:
{{{
G65> r.mapcalc 'bin2 = if(elevation.10m > 1139.632019 && elevation.10m <
1218.199951, 1, null() )'
G65> r.univar bin2 | grep '^n'
n: 645823

G65> r.mapcalc 'bin9 = if(elevation.10m > 1689.607544 && elevation.10m <
1768.175476, 1, null() )'
G65> r.univar bin9 | grep '^n'
n: 87599
}}}

all-in-one bin gives correct result:
{{{
r.stats elevation.10m nsteps=1 -c
  100%
1061.064087-1846.743408 2654802

r.univar elevation.10m | grep '^n'
n: 2654802

# but,
r.stats elevation.10m nsteps=2 -c
  100%
1061.064087-1453.903748 2654801
1453.903748-1846.743408 1
}}}

sum total seems to be ok, just distribution does not match reported range:
{{{
r.stats elevation.10m nsteps=5 -c
  100%
1061.064087-1218.199951 1090464
1218.199951-1375.335815 817890
1375.335815-1532.47168 573165
1532.47168-1689.607544 173282
1689.607544-1846.743408 1

echo $((1090464 + 817890 + 573165 + 173282 + 1))
2654802
}}}

thanks to Doc Robinson for reporting this.

(I have a vague memory that this was also reported years ago in the
mailing list archives??)

thanks,
Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/475&gt;
GRASS GIS <http://grass.osgeo.org>

#475: r.stats: last bin always has a single cell
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.stats
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Comment (by hamish):

Replying to [ticket:475 hamish]:
> Hi, r.stats seems to be doing something funny: the last bin
> always contains only one cell. Same result with GRASS 5.4.1 -
> 6.5svn on Linux, OSX, Solaris.

apparently in raster/r.stats/stats.c update_cell_stats(), but then I get
lost as I'm not really trained to deal with hash tables.

Interestingly if I change the resolution to misalign I only get the first
output step:
{{{
G65> g.region rast=elevation.dem
G65> r.stats elevation.10m nsteps=2 -c
  100%
1061.064087-1453.903748 294978
}}}

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/475#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#475: r.stats: last bin always has a single cell
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.stats
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Comment (by hamish):

I see the same bug in 5.0.0, so it probably dates back to the initial FP
conversion.

H

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/475#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#475: r.stats: last bin always has a single cell
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.stats
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Comment (by hamish):

request for help solving this critical bug: we are reporting bad
statistical data from our core stats module. It is used by r.report,
d.histogram, etc and on to users' work output.

I'm horrified by this bug but unfamiliar with hash tables and don't know
how to proceed with debugging.

thanks,
Hamish

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/475#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#475: r.stats: last bin always has a single cell
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.stats
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Comment (by glynn):

Replying to [comment:3 hamish]:
> request for help solving this critical bug: we are reporting bad
statistical data from our core stats module. It is used by r.report,
d.histogram, etc and on to users' work output.
>
> I'm horrified by this bug but unfamiliar with hash tables and don't know
how to proceed with debugging.

I think that you're looking in the wrong place.

This appears to be a problem with the quantisation semantics.

In your test case, r.stats calls
{{{
G_quant_add_rule(&q, 1061.064087, 1846.743408, 1, 10)
}}}

It then assigns the quantisation rules to the input map.

Cell values are obtained using G_get_c_raster_row(), which converts the
map's FP values to integer CELL values via G_quant_get_cell_value(). Given
the above quantisation rule, it converts FP values as follows:
{{{
> print G_quant_get_cell_value(&q, 1846.743408)
$15 = 10
> print G_quant_get_cell_value(&q, 1846.743407)
$17 = 9
> print G_quant_get_cell_value(&q, 1846.743409)
$18 = -2147483648
}}}

Essentially, G_quant_add_rule() treats the ranges as half-open, i.e. the
values range from low (inclusive) to high (exclusive). While half-open
ranges are a common concept (e.g. floor() behaves the same way), the range
of a GRASS raster is closed, i.e. both the low and high values are
inclusive.

I suggest changing r.stats to use cHigh=nsteps+1, and truncate the
quantised values into range (fudging dHigh may seem simpler, but the
maximum value may be positive, negative or zero, and the range may be
1e+300 or 1e-300, so choosing a suitable fudge factor isn't
straightforward).

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/475#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>

#475: r.stats: last bin always has a single cell
-----------------------+----------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: critical | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.stats
  Platform: All | Cpu: All
-----------------------+----------------------------------------------------
Comment (by hamish):

I've applied a fix for this bug in devbr6 r37699, and trunk r37700.

please review/test.

One thing I notice is that for multiple input maps of mixed types the
numbers for the * second map's ranges are a bit weird + change with the
patch.

e.g.:

{{{
#spearfish
# elevation.dem is CELL, elevation.10m is DCELL
g.region rast=elevation.10m
g.region n=n+10

# same before and after patch
G65> r.stats elevation.dem nsteps=5 -c | tail -n 11
  100%
1831 108
1832 117
1833 108
1834 90
1835 72
1836 63
1837 36
1838 9
1839 27
1840 36
* 25848
}}}

{{{
# before patch
G65> r.stats elevation.10m nsteps=5 -c
  100%
1061.064087-1218.199951 1090464
1218.199951-1375.335815 817890
1375.335815-1532.47168 573165
1532.47168-1689.607544 173282
1689.607544-1846.743408 1

G65> r.stats elevation.dem,elevation.10m nsteps=5 -c | tail -n 11
  100%
1836 1532.47168-1689.607544 63
1837 1532.47168-1689.607544 35
1837 1689.607544-1846.743408 1
1838 1532.47168-1689.607544 9
1839 1532.47168-1689.607544 27
1840 1532.47168-1689.607544 36
* 1061.064087-1218.199951 10968
* 1218.199951-1375.335815 2496
* 1375.335815-1532.47168 10395
* 1532.47168-1689.607544 90
* * 1899
}}}

note 2x cat 1837 in above..(?) [they add up ok]

{{{
# after patch
G65> r.stats elevation.10m nsteps=5 -c
  100%
1061.064087-1218.199951 847076
1218.199951-1375.335815 730296
1375.335815-1532.47168 565685
1532.47168-1689.607544 411802
1689.607544-1846.743408 99943
* 1899

G65> r.stats elevation.dem,elevation.10m nsteps=5 -c | tail -n 11
  100%
1835 1689.607544-1846.743408 72
1836 1689.607544-1846.743408 63
1837 1689.607544-1846.743408 36
1838 1689.607544-1846.743408 9
1839 1689.607544-1846.743408 27
1840 1689.607544-1846.743408 36
* 1061.064087-1218.199951 10090
* 1218.199951-1375.335815 881
* 1375.335815-1532.47168 8634
* 1532.47168-1689.607544 4344
* * 1899
}}}

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/475#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>

#475: r.stats: last bin always has a single cell
---------------------+------------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.stats
  Platform: All | Cpu: All
---------------------+------------------------------------------------------
Changes (by hamish):

  * priority: critical => major

Comment:

downgrading severity because the main reported bug is fixed.

the weird output for multiple input maps of mixed types still remains so
not closing it.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/475#comment:7&gt;
GRASS GIS <http://grass.osgeo.org>