Hamish wrote:

I'm getting some strange results with r.series method=perc90 and quart3.

8 input maps, partially overlapping, each map has nodata areas of NULL.

(CELL maps, 0->254 + NULL)

the perc90 results in a blank map (r.info: min=nan, max=nan) while the

quart3 map only seems to have data in slivers where some of the layers

overlap.

?

This can happen if the quantile falls in the final non-null cell, i.e.

if there are less than 10 non-null cells for perc90 or less than 4 for

quart3.

The calculation of the quantile is:

k = n * percent / 100;

i0 = (int) floor(k);

i1 = (int) ceil(k);

*result = (i0 == i1)

? values[i0]

: values[i0] * (i1 - k) + values[i1] * (k - i0);

where n is the number of non-null cells. The result is a linear

interpolation between the two values either side of k. If k falls

between n - 1 and n, the value of the nth cell, which is null, will be

used in the interpolation, resulting in a null result.

Furthermore, if the quantile falls in the final cell (i.e. there are

no non-null cells, but there are too few maps), the interpolation will

access the value beyond the end of the array, which will be garbage.

This doesn't apply to the median method, which uses:

*result = (values[(n-1)/2] + values[n/2]) / 2;

I don't know if there's a standard way of performing this calculation,

but (to me) the most logical approach is to compute k as:

k = (n - 1) * percent / 100;

E.g. for 10 values, the values of k for the percentiles would be:

% k

0 0.00

5 0.45

10 0.90

15 1.35

20 1.80

25 2.25

30 2.70

35 3.15

40 3.60

45 4.05

50 4.50

55 4.95

60 5.40

65 5.85

70 6.30

75 6.75

80 7.20

85 7.65

90 8.10

95 8.55

100 9.00

This gives 0% = the lowest value, 100% = the highest (non-null) value,

50% = the mean of the two middle values (#4 and #5).

It might be necessary to take steps to ensure that you don't end up

with i1 == n due to rounding (i.e. k == n - 1 + epsilon => i1 == n).

I'll modify the code if someone with reasonable knowledge of

statistics can confirm that this is the correct solution.

also, it would be nice if for stats like min,max, if only CELL maps are

given for input, a CELL map will be written as output. Probably the

sanest way to do this is to add an outtype=CELL option to the module?

(e.g. r.in.xyz) I guess we should use rounding in the case of a CELL

map?

It would be straightforward to add a flag to each method which

inidicates that the output should use the lowest common format of the

inputs (i.e. CELL inputs produce CELL results, etc). As this would

only be used for methods which produce integer results from integer

inputs (count, mode, min, max, {min,max}_raster, sum), rounding

shouldn't be an issue.

--

Glynn Clements <glynn@gclements.plus.com>