[GRASS-dev] [GRASS GIS] #474: r.quantile: segfaults with percentile=100

#474: r.quantile: segfaults with percentile=100
------------------------+---------------------------------------------------
Reporter: hamish | Owner: grass-dev@lists.osgeo.org
     Type: defect | Status: new
Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Keywords: r.quantile | Platform: Linux
      Cpu: x86-32 |
------------------------+---------------------------------------------------
spearfish:
{{{
g.region rast=elevation.10m
r.quantile elevation.10m percentile=0,10,20,30,40,50,60,70,80,90,100
Computing histogram
  100%
Computing bins
Binning data
  100%
Sorting bins
  100%
Computing quantiles
0:0.000000:1061.064087
1:10.000000:1159.293774
2:20.000000:1184.320190
3:30.000000:1210.821362
4:40.000000:1251.937646
5:50.000000:1309.368164
6:60.000000:1378.708764
7:70.000000:1440.232398
8:80.000000:1525.849707
9:90.000000:1613.602832
Segmentation fault
}}}

gdb:
{{{

Program received signal SIGSEGV, Segmentation fault.
0x08049205 in compute_quantiles (recode=0) at main.c:209
209 v = (i0 == i1)
(gdb) bt
#0 0x08049205 in compute_quantiles (recode=0) at main.c:209
#1 0x0804975c in main (argc=0, argv=0x3ff00000) at main.c:326
(gdb) list
204
205 k = next - b->origin;
206 i0 = (int)floor(k);
207 i1 = (int)ceil(k);
208
209 v = (i0 == i1)
210 ? values[b->base + i0]
211 : values[b->base + i0] * (i1 - k) + values[b->base + i1] * (k -
212 i0);
213
(gdb) bt full
#0 0x08049205 in compute_quantiles (recode=0) at main.c:209
         k = 2654489
         i1 = 2654489
         next = 2654802
         v = 1613.6028318000001
         i0 = 2654489
         b = (struct bin *) 0x805e9e4
         prev_v = 1613.6028318000001
         quant = 10
#1 0x0804975c in main (argc=0, argv=0x3ff00000) at main.c:326
         module = (struct GModule *) 0xb7f3f170
         opt = {input = 0xb7f3f120, quant = 0x804b028, perc = 0x804b088, slots = 0x804b0e8}
         flag = {r = 0xb7f3f0fc}
         recode = 0
         infile = 6
         range = {min = 1061.064087, max = 1846.743408, first_time = 0}
}}}

presumably values[b->base + i0] is just outside the array.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/474>
GRASS GIS <http://grass.osgeo.org>

#474: r.quantile: segfaults with percentile=100
---------------------+------------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.quantile
  Platform: Linux | Cpu: x86-32
---------------------+------------------------------------------------------
Comment (by glynn):

Replying to [ticket:474 hamish]:

> presumably values[b->base + i0] is just outside the array.

Actually, it's way outside the array, as the last bin never gets
initialised.

Hopefully fixed in r35846.

This also fixes a bug where the quantile corresponds to the last value in
the bin (e.g. if there is only one value in the bin). The interpolation
will use the first value from the bin for the next quantile, which could
be way off.

Ideally, it should retain the values from the following bin in case one is
needed for interpolation. In practice, this won't make much difference.
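The diagnosis above is that the last bin never gets initialised. Percentile=100 hits it specifically because of how ranks map to bins. The maximum sits at the last rank, n - 1. Walking the cumulative bin counts always places that rank in the last non-empty bin. The following is a minimal sketch of that lookup, assuming a plain array of per-bin counts. `bin_for_rank` is a hypothetical helper; r.quantile's real `struct bin` (with the `origin` and `base` fields seen in the gdb output) carries more state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from r.quantile): given per-bin value counts,
 * return the index of the bin holding the 0-based rank `rank`,
 * or -1 if the rank lies past the last value. */
static int bin_for_rank(const size_t *counts, int nbins, size_t rank)
{
    size_t origin = 0; /* global rank of the current bin's first value */

    assert(counts != NULL && nbins >= 0);
    for (int i = 0; i < nbins; i++) {
        if (rank < origin + counts[i])
            return i;
        origin += counts[i]; /* empty bins are skipped automatically */
    }
    return -1;
}
```

Consider counts `{3, 0, 4, 2}`, which hold nine values. Rank 0 falls in bin 0. Rank 3 skips the empty bin 1 and falls in bin 2. Rank 8, the maximum, falls in bin 3, which is the last bin, so leaving that bin unfilled breaks percentile=100 first.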

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/474#comment:1>

#474: r.quantile: segfaults with percentile=100
---------------------+------------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: new
  Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.quantile
  Platform: Linux | Cpu: x86-32
---------------------+------------------------------------------------------
Comment (by neteler):

Backported to 6.5.svn (r35847) and 6.4.0svn (r35848).

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/474#comment:2>

#474: r.quantile: segfaults with percentile=100
---------------------+------------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: closed
  Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: fixed | Keywords: r.quantile
  Platform: Linux | Cpu: x86-32
---------------------+------------------------------------------------------
Changes (by hamish):

  * status: new => closed
  * resolution: => fixed

Comment:

thanks, works.

I notice it does not exactly match r.univar,

{{{
G65> r.quantile in=elevation.10m percentile=`seq -s, 0 10 100`
0:0.000000:1061.064087
1:10.000000:1153.037231
2:20.000000:1184.320190
3:30.000000:1210.821362
4:40.000000:1251.937646
5:50.000000:1309.368164
6:60.000000:1378.708764
7:70.000000:1440.232398
8:80.000000:1525.849707
9:90.000000:1613.602832
10:100.000000:1846.743408
}}}

{{{
G65> r.univar -ge elevation.10m percentile=`seq -s, 0 10 100`
percentile_0=1061.06
percentile_10=1153.04
percentile_20=1184.32
percentile_30=1210.82
percentile_40=1251.94
percentile_50=1309.37
percentile_60=1378.71
percentile_70=1440.23
percentile_80=1525.85
percentile_90=1613.6
percentile_100=1846.74
}}}

because r.univar's output has been rounded by %g to its default of six
significant digits (see trac #335). Otherwise it matches fine.

closing bug.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/474#comment:3>

#474: r.quantile: segfaults with percentile=100
---------------------+------------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: closed
  Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: fixed | Keywords: r.quantile
  Platform: Linux | Cpu: x86-32
---------------------+------------------------------------------------------
Comment (by hamish):

Replying to [comment:1 glynn]:
> This also fixes a bug where the quantile corresponds to the
> last value in the bin (e.g. if there is only one value in the
> bin). The interpolation will use the first value from the bin
> for the next quantile, which could be way off.

I notice that if you do 'quant=3' you only get 2 results, and 'quant=1'
gives no output. (all others do the same, report n-1)

Is this intended because 3 bins will have two separators (at 33% and 66%),
or is it a bug?

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/474#comment:4>

#474: r.quantile: segfaults with percentile=100
---------------------+------------------------------------------------------
  Reporter: hamish | Owner: grass-dev@lists.osgeo.org
      Type: defect | Status: closed
  Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: fixed | Keywords: r.quantile
  Platform: Linux | Cpu: x86-32
---------------------+------------------------------------------------------
Comment (by glynn):

Replying to [comment:4 hamish]:

> I notice that if you do 'quant=3' you only get 2 results, and 'quant=1'
> gives no output. (all others do the same, report n-1)
>
> Is this intended because 3 bins will have two separators (at 33% and
> 66%), or is it a bug?

It's intended so that quant=N gives "N-tiles", e.g. quant=4 gives
quartiles, quant=10 gives deciles, etc. AIUI, the convention is not to
include the endpoints, e.g. "quartiles" are given as 25%, 50%, and 75%.
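Under the N-tile convention described here, quant=N asks for the N - 1 interior cut points at 100*i/N percent, for i = 1..N-1. So quant=1 yields none and quant=3 yields two, at 33.3% and 66.7%. A sketch under that assumption follows. `ntile_percentiles` is a hypothetical name, not an r.quantile function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write the interior percentiles of an N-tile split
 * (100*i/N for i = 1..N-1) into out, up to cap entries, and return the count.
 * quant=1 therefore yields nothing; quant=4 yields 25, 50, 75. */
static size_t ntile_percentiles(int quant, double *out, size_t cap)
{
    size_t count = 0;

    assert(quant >= 1 && out != NULL);
    for (int i = 1; i < quant && count < cap; i++)
        out[count++] = 100.0 * i / quant;
    return count;
}
```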

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/474#comment:5>

On Wednesday 11 February 2009, GRASS GIS wrote:

> Comment (by glynn):
>
> Replying to [comment:4 hamish]:
> > I notice that if you do 'quant=3' you only get 2 results, and 'quant=1'
> > gives no output. (all others do the same, report n-1)
> >
> > Is this intended because 3 bins will have two separators (at 33% and
> > 66%), or is it a bug?
>
> It's intended so that quant=N gives "N-tiles", e.g. quant=4 gives
> quartiles, quant=10 gives deciles, etc. AIUI, the convention is not to
> include the endpoints, e.g. "quartiles" are given as 25%, 50%, and 75%.

Is this a convention? I am not a math/stats expert, but in R I see that the
convention is to report it like this:

# generate some random data
x <- rnorm(100)
# compute quartiles:
quantile(x)
        0%        25%        50%        75%       100%
-2.1691897 -0.3627331  0.1307290  0.6652009  2.4798260

# we can see that it includes the min/max:
summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.16900 -0.36270  0.13070  0.07639  0.66520  2.48000

Is this just a display/semantics thing?

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

On 11/02/09 18:46, Dylan Beaudette wrote:


> Is this a convention? I am not a math/stats expert, but in R I see that the
> convention is to report it like this:
>
> # generate some random data
> x <- rnorm(100)
> # compute quartiles:
> quantile(x)
>         0%        25%        50%        75%       100%
> -2.1691897 -0.3627331  0.1307290  0.6652009  2.4798260
>
> # we can see that it includes the min/max:
> summary(x)
>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> -2.16900 -0.36270  0.13070  0.07639  0.66520  2.48000
>
> Is this just a display/semantics thing?

Personally, I would prefer that min and max are also shown (like the output with the -r flag). Otherwise you have to look for those values somewhere else, and I do find it useful to have them.

Moritz

Dylan Beaudette wrote:

> > It's intended so that quant=N gives "N-tiles", e.g. quant=4 gives
> > quartiles, quant=10 gives deciles, etc. AIUI, the convention is not to
> > include the endpoints, e.g. "quartiles" are given as 25%, 50%, and 75%.
>
> Is this a convention? I am not a math/stats expert, but in R I see that the
> convention is to report it like this:
>
> # generate some random data
> x <- rnorm(100)
> # compute quartiles:
> quantile(x)
>         0%        25%        50%        75%       100%
> -2.1691897 -0.3627331  0.1307290  0.6652009  2.4798260
>
> # we can see that it includes the min/max:
> summary(x)
>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
> -2.16900 -0.36270  0.13070  0.07639  0.66520  2.48000
>
> Is this just a display/semantics thing?

I don't have a statistics background, but I'm more familiar with
seeing e.g. 1st, 2nd, and 3rd quartiles, without the 0th and 4th
quartiles.

I can add the 0th and Nth quantiles if desired (i.e. quant=N gives N+1
values).
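The alternative offered here adds the 0th and Nth quantiles, so quant=N would give N + 1 values. The 0% and 100% entries are the minimum and maximum, as in the R `quantile()` output quoted above. Below is a sketch of that variant. `ntile_percentiles_with_ends` is a hypothetical name, not an existing r.quantile option.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: percentiles 100*i/N for i = 0..N (N + 1 values),
 * so the first entry selects the minimum and the last the maximum.
 * Writes up to cap entries into out and returns the count. */
static size_t ntile_percentiles_with_ends(int quant, double *out, size_t cap)
{
    size_t count = 0;

    assert(quant >= 1 && out != NULL);
    for (int i = 0; i <= quant && count < cap; i++)
        out[count++] = 100.0 * i / quant;
    return count;
}
```

With this variant, quant=4 yields 0, 25, 50, 75 and 100, and quant=1 yields just 0 and 100.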

--
Glynn Clements <glynn@gclements.plus.com>