[GRASS-dev] [bug #5202] (grass) Bug with floating point maps in r.stats

this bug's URL: http://intevation.de/rt/webrt?serial_num=5202
-------------------------------------------------------------------------

Subject: Bug with floating point maps in r.stats

Platform: GNU/Linux/x86
grass obtained from: CVS
grass binary for platform: Compiled from Sources

Hi,
there is a bug in r.stats.
The cell counting seems to be wrong.

Using the test suite mapset I get the following result:

Mapset <testmapset> in Location <TestLocation>
GRASS 6.3.cvs > g.region -p res=200

GRASS 6.3.cvs > r.mapcalc "stat_test=col()*1.0"

GRASS 6.3.cvs > r.stats -npc input=stat_test nsteps=10
100%
1-1.9 10 12.50%
1.9-2.8 10 12.50%
2.8-3.7 10 12.50%
3.7-4.6 10 12.50%
4.6-5.5 10 12.50%
5.5-6.4 10 12.50%
6.4-7.3 10 12.50%
7.3-8.2 10 12.50%
8.2-9.1 10 12.50%
9.1-10 10 12.50%

GRASS 6.3.cvs > r.stats -npc input=stat_test nsteps=2
100%
1-5.5 90 102.27%
5.5-10 10 11.36% <------------ ???

GRASS 6.3.cvs > r.stats -npc input=stat_test nsteps=3
100%
1-4 50 57.47%
4-7 40 45.98%
7-10 10 11.49% <------------ ???

Best regards
Soeren

-------------------------------------------- Managed by Request Tracker

There appear to be several errors here, probably related to the fact
that the documentation for G_quant_* is at best vague, and at worst
gibberish.

E.g.:

int G_quant_add_rule(struct Quant *q, DCELL dmin, DCELL dmax, CELL cmin, CELL cmax)

Add the rule that the floating-point range [dmin,dmax] produces an
integer in the range [cmin,cmax] by linear interpolation.

Saying "by linear interpolation" doesn't really tell you very much
when the result is an integer. Examining the code for
G_quant_get_cell_value() reveals that it uses C truncation (round
towards zero). So if you have the rule:

  G_quant_add_rule(q, 1.0, 10.0, 1, 2)

then inputs in the range 1.0 <= x < 10.0 return 1, while an input of
exactly 10.0 returns 2. This explains why nsteps=2 produces a 90/10
split.
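
To make the arithmetic concrete, here is a minimal sketch of linear
interpolation followed by C truncation. It is not copied from libgis;
quant_trunc() and count_cells() are hypothetical names, and the 10x10
grid with cell value equal to the column number is an assumption
inferred from the nsteps=10 output above.

```c
#include <assert.h>

/* Hypothetical stand-in for the behaviour described above:
 * interpolate linearly from [dmin,dmax] to [cmin,cmax], then let the
 * cast truncate towards zero. Not the libgis implementation. */
int quant_trunc(double x, double dmin, double dmax, int cmin, int cmax)
{
    assert(dmax > dmin);
    double t = (x - dmin) / (dmax - dmin);   /* 0.0 at dmin, 1.0 at dmax */
    return (int)(cmin + t * (cmax - cmin));  /* truncation, not rounding */
}

/* Count the cells of an assumed rows x cols map, where each cell holds
 * its column number (1..cols), that land in category cat under the
 * rule (1.0, cols, 1, 2). */
int count_cells(int cat, int rows, int cols)
{
    int n = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 1; c <= cols; c++)
            if (quant_trunc((double)c, 1.0, (double)cols, 1, 2) == cat)
                n++;
    return n;
}
```

Under those assumptions, count_cells(1, 10, 10) gives 90 and
count_cells(2, 10, 10) gives 10, matching the nsteps=2 counts: only
the cells holding exactly 10.0 reach category 2.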

The obvious question is whether the problem is with libgis'
quantisation code or with the way that r.stats is using it. The
programmers' manual doesn't help, and there is no reliable way to tell
whether any modules rely upon the current behaviour (searching for
explicit references to "G_quant" doesn't help, because
G_get_raster_row() etc use it, and just about everything uses those
functions).

Realistically, I think that r.stats needs to avoid using G_quant_*
altogether, and implement its own quantisation. Relying upon libgis'
quantisation code is asking for trouble.
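
As a rough illustration of what "its own quantisation" could look
like, the sketch below assigns a value to one of nsteps equal-width
bins and folds the maximum into the last bin. This is only one
possible approach, not what r.stats does or will do; bin_index() is a
hypothetical name.

```c
#include <assert.h>

/* Hypothetical equal-width binning: returns a bin index in
 * 0..nsteps-1 for x in [dmin,dmax]. Bins are half-open [lo,hi),
 * except the last one, which also includes dmax. */
int bin_index(double x, double dmin, double dmax, int nsteps)
{
    assert(dmax > dmin && nsteps > 0);
    assert(x >= dmin && x <= dmax);
    /* Multiply before dividing so that exact bin boundaries such as
     * (4 - 1) * 3 / 9 evaluate to exactly 1.0. */
    int i = (int)((x - dmin) * nsteps / (dmax - dmin));
    if (i >= nsteps)        /* x == dmax would otherwise overflow */
        i = nsteps - 1;
    return i;
}
```

With the test map's range of 1 to 10 and nsteps=2, values 1 to 5 fall
in bin 0 and values 6 to 10 in bin 1, so the maximum no longer ends up
in a bin of its own.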

The bogus ranges and the totals exceeding 100% are separate bugs.

--
Glynn Clements <glynn@gclements.plus.com>