[GRASS-dev] Using r.quantile result with r.recode

Hi,

I have a question about using the r.quantile output with r.recode.

r.quantile returns the upper limit value corresponding to each class, right?

But r.recode treats each range from r.quantile as closed at the lower value and open at the upper value. In other words, assuming that r.quantile returns the following result:

2.000000:6.000000:1
6.000000:8.000000:2
8.000000:12.000000:3
12.000000:20.000000:4
20.000000:872.727295:5

r.recode will use these data as follows:

[2.000000:6.000000[ -> 1
[6.000000:8.000000[ -> 2
[8.000000:12.000000[ -> 3
[12.000000:20.000000[ -> 4
[20.000000:872.727295[ -> 5

But this is not correct. According to the quantile method, the reclassification should be as follows:

]2.000000:6.000000] -> 1
]6.000000:8.000000] -> 2
]8.000000:12.000000] -> 3
]12.000000:20.000000] -> 4
]20.000000:872.727295] -> 5

or am I wrong?

Thanks!

Best regards,
Pedro Venâncio

Pedro Venâncio wrote:

I have a question about using the r.quantile output with r.recode.

r.quantile returns the upper limit value corresponding to each
class, right?

r.quantile calculates quantile values. If you use the -r flag, each
value is used as the upper limit for one range and the lower limit of
the next. The map's total range sets the lower limit of the first
range and the upper limit of the last range.

If the number of values is an exact multiple of the number of
divisions (e.g. 4 for quartiles, 10 for deciles, etc), the quantile
will be one of the input values, otherwise it will be linearly
interpolated between two adjacent values.

It's possible that there should be a -1 in the calculation; currently,
if you calculate the deciles of the 100 distinct values 0-99
inclusive, the results will be 10,20,...,90, so the ranges will be
0-10,10-20,...,90-99.
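
A rough sketch of the calculation described above, in Python. It assumes the cut point for division k sits at position k*n/divisions in the sorted, 0-based list of values and is linearly interpolated between its neighbours. That assumption reproduces the 10,20,...,90 result quoted above, but it is not taken from the r.quantile source. The name quantile_cuts and the minus_one switch (the "-1" variant) are hypothetical.

```python
import math

def quantile_cuts(values, divisions, minus_one=False):
    """Return the divisions-1 interior cut points of `values` (illustrative only)."""
    data = sorted(values)
    n = len(data)
    # Assumption: positions are measured over n values, or n-1 for the "-1" variant.
    span = n - 1 if minus_one else n
    cuts = []
    for k in range(1, divisions):
        pos = k * span / divisions      # fractional index into the sorted data
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)         # neighbour used for interpolation
        frac = pos - lo
        cuts.append(data[lo] + (data[hi] - data[lo]) * frac)
    return cuts

deciles = quantile_cuts(range(100), 10)
print(deciles)  # cut points for the values 0..99
```

With minus_one=True the first decile of the same data moves from 10 to roughly 9.9.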

But r.recode treats each range from r.quantile as closed at the lower
value and open at the upper value.

r.recode does whatever the Rast_fpreclass_* functions do, which is to
scan the rules from highest to lowest and stop on the first match. So
for a value which is on the boundary between ranges, the upper range
will be chosen.

But this is not correct. According to the quantile method, the reclassification should be as follows:

]2.000000:6.000000] -> 1
]6.000000:8.000000] -> 2
]8.000000:12.000000] -> 3
]12.000000:20.000000] -> 4
]20.000000:872.727295] -> 5

or am I wrong?

I think that if the number of values is an exact multiple of the
number of divisions, r.quantile+r.recode should result in each range
having exactly the same number of values.

But I'm not sure whether it's r.quantile, r.recode[1] or both which
need to be fixed.

[1] More precisely, the Rast_fpreclass_* functions.
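
To see what this means for the 0-99 decile example, here is a small Python sketch that counts how many values land in each class under both boundary conventions. It does not call GRASS; class_of and boundary_goes_up are made-up names, and the cut points are simply the 10,20,...,90 values quoted earlier.

```python
from collections import Counter

# Values 0..99 and the decile cut points from the example (10, 20, ..., 90).
values = list(range(100))
cuts = list(range(10, 100, 10))

def class_of(value, cuts, boundary_goes_up):
    """Return the 1-based class of `value` for ascending cut points."""
    if boundary_goes_up:
        # A value equal to a cut point joins the range above it.
        return 1 + sum(1 for c in cuts if value >= c)
    # A value equal to a cut point stays in the range below it.
    return 1 + sum(1 for c in cuts if value > c)

counts_up = Counter(class_of(v, cuts, True) for v in values)
counts_down = Counter(class_of(v, cuts, False) for v in values)

print(sorted(counts_up.items()))
print(sorted(counts_down.items()))
```

For this one example, sending boundary values to the upper range gives ten values per class, while sending them to the lower range puts 11 values in the first class and 9 in the last. Other inputs may behave differently.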

--
Glynn Clements <glynn@gclements.plus.com>

[CC to grass-dev for discussion]

Pedro Venâncio wrote:

Thank you very much for your answer!

My question is precisely whether a quantile value that is both the
upper limit of one range and the lower limit of the next should
belong to the lower class or the upper one.

For example, assuming that r.quantile (with -r flag) gives this result:

2:6:1
6:8:2
8:12:3
12:20:4
20:873:5

should the value 6 belong to the first class or the second?

r.recode will treat boundary values as belonging to the upper range,
e.g. in the above example, 6.0 will get recoded to 2.

This behaviour stems from Rast_fpreclass_get_cell_value() in
lib/raster/fpreclass.c, and isn't configurable (i.e. there's no way
that r.recode's behaviour could be modified without modifying the
fpreclass functions).

--
Glynn Clements <glynn@gclements.plus.com>

Hi Glynn,

----- Original Message -----
From: Glynn Clements

r.recode will treat boundary values as belonging to the upper range,
e.g. in the above example, 6.0 will get recoded to 2.

So we cannot use the result of r.quantile with the -r flag directly in r.recode, right?

In the example, 6 should be recoded to 1, right?

Thank you very much!

Best regards,
Pedro

Pedro Venâncio wrote:

> r.recode will treat boundary values as belonging to the upper range,
> e.g. in the above example, 6.0 will get recoded to 2.

So we cannot use the result of r.quantile with the -r flag directly in r.recode, right?

In the example, 6 should be recoded to 1, right?

For real data, most of the boundary values won't be actual values from
the input data, but values derived from interpolating two adjacent
values. In that situation, it doesn't matter how r.recode handles
boundary values.

--
Glynn Clements <glynn@gclements.plus.com>

Hi Glynn,

Unless I missed it, this does not seem to be mentioned explicitly in the r.recode help file. Would it be an idea to add this?

Paulo


On Tue, Apr 16, 2013 at 9:50 PM, Glynn Clements <glynn@gclements.plus.com> wrote:

r.recode will treat boundary values as belonging to the upper range,
e.g. in the above example, 6.0 will get recoded to 2.

This behaviour stems from Rast_fpreclass_get_cell_value() in
lib/raster/fpreclass.c, and isn’t configurable (i.e. there’s no way
that r.recode’s behaviour could be modified without modifying the
fpreclass functions).


Paulo van Breugel wrote:

> r.recode will treat boundary values as belonging to the upper range,
> e.g. in the above example, 6.0 will get recoded to 2.
>
> This behaviour stems from Rast_fpreclass_get_cell_value() in
> lib/raster/fpreclass.c, and isn't configurable (i.e. there's no way
> that r.recode's behaviour could be modified without modifying the
> fpreclass functions).

Unless I missed it, this does not seem to be mentioned explicitly in the
r.recode help file. Would it be an idea to add this?

Actually, it's more accurate to say that r.recode uses the last range
specified (which isn't necessarily the upper range).

More generally, r.recode doesn't care if ranges overlap. It just
stores the rules in the order they are given, and performs a lookup by
iterating over the rules from last to first until it finds a match.
The fpreclass code has a function to reverse the order of rules, but
nothing uses it.
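
The lookup described above can be sketched in a few lines of Python. This is only a model of the behaviour as described in this thread, not the fpreclass code: the recode function is hypothetical, ranges are assumed to be closed at both ends, and returning None for an unmatched value is a placeholder.

```python
def recode(value, rules):
    """Return the new value of the last rule whose closed range contains `value`."""
    # Rules are kept in the order given and searched from last to first.
    for low, high, new in reversed(rules):
        if low <= value <= high:
            return new
    return None  # placeholder: no rule matched

# The low:high:new rules from the example earlier in the thread.
rules = [(2, 6, 1), (6, 8, 2), (8, 12, 3), (12, 20, 4), (20, 873, 5)]

ascending = recode(6.0, rules)          # last matching rule is 6:8:2
descending = recode(6.0, rules[::-1])   # last matching rule is 2:6:1
print(ascending, descending)
```

In this model the class of a boundary value depends only on rule order, so listing the same rules in descending order sends 6.0 to class 1 instead of class 2. Whether that holds for r.recode itself would need to be checked against the real module.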

Apart from the lack of clarity regarding boundary values, the existing
method is inefficient if the number of rules is large; a binary search
tree (or similar) would be better.
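
As a sketch of the "or similar" option, a binary search over a sorted list of cut points (rather than a tree) also makes the boundary convention explicit. It assumes the rules are contiguous and non-overlapping, which r.recode does not require, and it does not check that a value lies inside the overall range. build_index, lookup and boundary_goes_up are hypothetical names.

```python
from bisect import bisect_left, bisect_right

def build_index(rules):
    """Split contiguous (low, high, new) rules into cut points and outputs."""
    ordered = sorted(rules)
    # Interior cut points are the lower limits of every range except the first.
    cuts = [low for low, _high, _new in ordered[1:]]
    outputs = [new for _low, _high, new in ordered]
    return cuts, outputs

def lookup(value, cuts, outputs, boundary_goes_up=True):
    """Find the output for `value` in O(log n) comparisons."""
    # bisect_right sends a boundary value to the upper range,
    # bisect_left sends it to the lower range.
    search = bisect_right if boundary_goes_up else bisect_left
    return outputs[search(cuts, value)]

rules = [(2, 6, 1), (6, 8, 2), (8, 12, 3), (12, 20, 4), (20, 873, 5)]
cuts, outputs = build_index(rules)

print(lookup(6.0, cuts, outputs))                          # upper range
print(lookup(6.0, cuts, outputs, boundary_goes_up=False))  # lower range
```

Choosing between bisect_right and bisect_left is the whole difference between the two interval conventions discussed earlier in the thread.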

--
Glynn Clements <glynn@gclements.plus.com>