#474: r.quantile: segfaults with percentile=100
---------------------+------------------------------------------------------
Reporter: hamish | Owner: grass-dev@lists.osgeo.org
Type: defect | Status: new
Priority: major | Milestone: 6.4.0
Component: Raster | Version: svn-develbranch6
Resolution: | Keywords: r.quantile
Platform: Linux | Cpu: x86-32
---------------------+------------------------------------------------------
Comment (by glynn):
Replying to [ticket:474 hamish]:
> presumably values[b->base + i0] is just outside the array.
Actually, it's way outside the array, as the last bin never gets
initialised.
Hopefully fixed in r35846.
This also fixes a bug where the quantile corresponds to the last value in
the bin (e.g. if there is only one value in the bin). The interpolation
will use the first value from the bin for the next quantile, which could
be way off.
Ideally, it should retain the values from the following bin in case one is
needed for interpolation. In practice, this won't make much difference.
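A minimal sketch of the interpolation step described above, assuming a hypothetical per-bin layout (`struct bin_view` and `interp_in_bin` are illustrative names, not r.quantile's actual data structures). Interpolating between global ranks i0 and i0 + 1 only needs one value from outside the bin when i0 is the last value in it, and that value is the smallest one in the next non-empty bin, so that is all the sketch retains. At the global maximum (the percentile=100 case) there is no upper neighbour, so it returns the value itself instead of reading past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, assumed for illustration only: the sorted values of
 * one bin, the global rank of its first value, and the smallest value of the
 * next non-empty bin (kept so interpolation can cross the bin boundary). */
struct bin_view {
    const double *vals; /* sorted values in this bin */
    size_t base;        /* global rank of vals[0] */
    size_t count;       /* number of values in this bin */
    double next_min;    /* smallest value in the next non-empty bin */
    int has_next;       /* 0 if this is the last non-empty bin */
};

/* Interpolate between global ranks i0 and i0 + 1, where i0 lies in bin b
 * and frac (0..1) is the fractional part of the target rank. */
static double interp_in_bin(const struct bin_view *b, size_t i0, double frac)
{
    assert(i0 >= b->base && i0 < b->base + b->count);

    size_t k = i0 - b->base; /* position of i0 inside this bin */
    double lo = b->vals[k];
    double hi;

    if (k + 1 < b->count)
        hi = b->vals[k + 1]; /* neighbour is in the same bin */
    else if (b->has_next)
        hi = b->next_min;    /* neighbour is the first value of the next bin */
    else
        hi = lo;             /* global maximum: nothing above it */

    return lo + frac * (hi - lo);
}
```

With a single value per bin, interpolation halfway between 10 (rank 3) and the next bin's 20 gives 15, rather than borrowing a value from whichever bin holds the next quantile.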
Replying to [comment:1 glynn]:
> This also fixes a bug where the quantile corresponds to the
> last value in the bin (e.g. if there is only one value in the
> bin). The interpolation will use the first value from the bin
> for the next quantile, which could be way off.
I notice that if you do 'quant=3' you only get 2 results, and 'quant=1'
gives no output (all others do the same, reporting n-1).
Is this intended because 3 bins will have two separators (at 33% and 66%),
or is it a bug?
> I notice that if you do 'quant=3' you only get 2 results, and 'quant=1'
> gives no output (all others do the same, reporting n-1).
>
> Is this intended because 3 bins will have two separators (at 33% and
> 66%), or is it a bug?
It's intended so that quant=N gives "N-tiles", e.g. quant=4 gives
quartiles, quant=10 gives deciles, etc. AIUI, the convention is not to
include the endpoints, e.g. "quartiles" are given as 25%, 50%, and 75%.
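Under the convention described here, quant=N reports only the N-1 interior cut points at k/N for k = 1 .. N-1, which is why quant=3 gives two values (33.3% and 66.7%) and quant=1 gives none. A short sketch of that enumeration; `ntile_percents` is a hypothetical helper, not part of r.quantile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write the N-1 interior cut points of "N-tiles"
 * (100 * k / N for k = 1 .. N-1) into pcts and return how many there are.
 * pcts must have room for at least n - 1 values. */
static size_t ntile_percents(unsigned n, double *pcts)
{
    assert(n > 0); /* assumed precondition: at least one N-tile */

    size_t count = 0;
    for (unsigned k = 1; k < n; k++)
        pcts[count++] = 100.0 * k / n;
    return count;
}
```

For example, N = 4 yields 25, 50 and 75 (the quartiles) and N = 1 yields nothing.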
> Replying to [comment:4 hamish]:
> > I notice that if you do 'quant=3' you only get 2 results, and 'quant=1'
> > gives no output (all others do the same, reporting n-1).
> > Is this intended because 3 bins will have two separators (at 33% and
> > 66%), or is it a bug?
> It's intended so that quant=N gives "N-tiles", e.g. quant=4 gives
> quartiles, quant=10 gives deciles, etc. AIUI, the convention is not to
> include the endpoints, e.g. "quartiles" are given as 25%, 50%, and 75%.
Is this a convention? I am not a math/stats expert, but in R I see that the
convention is to report it like this:
# generate some random data
x <- rnorm(100)
# compute quartiles:
quantile(x)
0% 25% 50% 75% 100%
-2.1691897 -0.3627331 0.1307290 0.6652009 2.4798260
# we can see that it includes the min/max:
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.16900 -0.36270 0.13070 0.07639 0.66520 2.48000
Is this just a display/semantics thing?
Personally, I would prefer that min and max are also shown (like the output with the -r flag). Otherwise you have to look for those values somewhere else, and I do find it useful to have them.
> > It's intended so that quant=N gives "N-tiles", e.g. quant=4 gives
> > quartiles, quant=10 gives deciles, etc. AIUI, the convention is not to
> > include the endpoints, e.g. "quartiles" are given as 25%, 50%, and 75%.
>
> Is this a convention? I am not a math/stats expert, but in R I see that the
> convention is to report it like this:
> # generate some random data
> x <- rnorm(100)
> # compute quartiles:
> quantile(x)
> 0% 25% 50% 75% 100%
> -2.1691897 -0.3627331 0.1307290 0.6652009 2.4798260
> # we can see that it includes the min/max:
> summary(x)
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -2.16900 -0.36270 0.13070 0.07639 0.66520 2.48000
> Is this just a display/semantics thing?
I don't have a statistics background, but I'm more familiar with
seeing e.g. 1st, 2nd, and 3rd quartiles, without the 0th and 4th
quartiles.
I can add the 0th and Nth quantiles if desired (i.e. quant=N gives N+1
values).
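If the endpoints were added as proposed, quant=N would loop over k = 0 .. N and return N+1 values. A minimal sketch, assuming the values are already sorted in memory (unlike r.quantile, which bins them) and using linear interpolation at rank (len - 1) * k / N, the same rule as R's default `quantile()` definition (type 7). Under that rule the 0th and Nth quantiles are exactly the minimum and maximum, matching the R output quoted above. `ntiles_with_ends` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: evaluate quantiles at k/N for k = 0 .. N (N+1 values,
 * endpoints included) on a sorted array, interpolating linearly between
 * neighbouring ranks. out must have room for n + 1 values. */
static size_t ntiles_with_ends(const double *sorted, size_t len,
                               unsigned n, double *out)
{
    assert(len > 0 && n > 0);

    for (unsigned k = 0; k <= n; k++) {
        double h = (double) (len - 1) * k / n; /* fractional rank, 0..len-1 */
        size_t i0 = (size_t) h;

        if (i0 >= len - 1) {
            out[k] = sorted[len - 1]; /* k == n: the maximum, no neighbour */
        } else {
            double frac = h - (double) i0;
            out[k] = sorted[i0] + frac * (sorted[i0 + 1] - sorted[i0]);
        }
    }
    return (size_t) n + 1;
}
```

With {2, 4, 6, 8, 10} and N = 4 this returns all five values, so the first and last entries are the min and max without a separate lookup.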