[GRASS-dev] wierd results with r.series meth=perc90,q3

Hi,

I'm getting some strange results with r.series method=perc90 and quart3.

8 input maps, partially overlapping, each map has nodata areas of NULL.
(CELL maps, 0->254 + NULL)

the perc90 results in a blank map (r.info: min=nan, max=nan) while the
quart3 map only seems to have data in slivers where some of the layers
overlap.
?

also, it would be nice if for stats like min,max, if only CELL maps are
given for input, a CELL map will be written as output. Probably the
sanest way to do this is to add an outtype=CELL option to the module?
(e.g. r.in.xyz) I guess we should use rounding in the case of a CELL
map?

thanks,
Hamish

Hamish wrote:

I'm getting some strange results with r.series method=perc90 and quart3.

8 input maps, partially overlapping, each map has nodata areas of NULL.
(CELL maps, 0->254 + NULL)

the perc90 results in a blank map (r.info: min=nan, max=nan) while the
quart3 map only seems to have data in slivers where some of the layers
overlap.
?

This can happen if the quantile falls in the final non-null cell, i.e.
if there are less than 10 non-null cells for perc90 or less than 4 for
quart3.

The calculation of the quantile is:

  k = n * percent / 100;
  i0 = (int) floor(k);
  i1 = (int) ceil(k);

  *result = (i0 == i1)
    ? values[i0]
    : values[i0] * (i1 - k) + values[i1] * (k - i0);

where n is the number of non-null cells. The result is a linear
interpolation between the two values either side of k. If k falls
between n - 1 and n, the value of the nth cell, which is null, will be
used in the interpolation, resulting in a null result.

Furthermore, if the quantile falls in the final cell (i.e. there are
no non-null cells, but there are too few maps), the interpolation will
access the value beyond the end of the array, which will be garbage.

This doesn't apply to the median method, which uses:

  *result = (values[(n-1)/2] + values[n/2]) / 2;

I don't know if there's a standard way of performing this calculation,
but (to me) the most logical approach is to compute k as:

  k = (n - 1) * percent / 100;

E.g. for 10 values, the values of k for the percentiles would be:

    % k
    0 0.00
    5 0.45
   10 0.90
   15 1.35
   20 1.80
   25 2.25
   30 2.70
   35 3.15
   40 3.60
   45 4.05
   50 4.50
   55 4.95
   60 5.40
   65 5.85
   70 6.30
   75 6.75
   80 7.20
   85 7.65
   90 8.10
   95 8.55
  100 9.00

This gives 0% = the lowest value, 100% = the highest (non-null) value,
50% = the mean of the two middle values (#4 and #5).

It might be necessary to take steps to ensure that you don't end up
with i1 == n due to rounding (i.e. k == n - 1 + epsilon => i1 == n).

I'll modify the code if someone with reasonable knowledge of
statistics can confirm that this is the correct solution.

also, it would be nice if for stats like min,max, if only CELL maps are
given for input, a CELL map will be written as output. Probably the
sanest way to do this is to add an outtype=CELL option to the module?
(e.g. r.in.xyz) I guess we should use rounding in the case of a CELL
map?

It would be straightforward to add a flag to each method which
inidicates that the output should use the lowest common format of the
inputs (i.e. CELL inputs produce CELL results, etc). As this would
only be used for methods which produce integer results from integer
inputs (count, mode, min, max, {min,max}_raster, sum), rounding
shouldn't be an issue.

--
Glynn Clements <glynn@gclements.plus.com>