[GRASS-dev] r.series counting maps over certain threshold?

Hi

I need to calculate a threshold value for growing degree days:
http://en.wikipedia.org/wiki/Growing-degree_day

(we used that in our paper:

Neteler, M., Roiz, D., Rocchini, D., Castellani, C. and Rizzoli, A.
(2011). Terra and Aqua satellites track tiger mosquito invasion:
modeling the potential distribution of Aedes albopictus in
north-eastern Italy. International Journal of Health Geographics,
10:49
http://www.ij-healthgeographics.com/content/10/1/49
)

The existing implementation was reverted last year, so how can I do
this now? I need to find out the DOY at which a certain value is
reached in each pixel. The input is all the single daily GDD maps of
the year. So far I have used r.series with threshold and count.

Another usage scenario: counting all days above a certain temperature, pixelwise. Is that possible?

Markus

Markus Neteler wrote:

> The existing implementation was reverted last year, so how can I do
> this now? I need to find out the DOY at which a certain value is
> reached in each pixel. The input is all the single daily GDD maps of
> the year. So far I have used r.series with threshold and count.
>
> Another usage scenario: counting all days above a certain temperature, pixelwise. Is that possible?

At present, the only solution is to pre-process the maps with
r.mapcalc then run r.series on the processed maps.
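For illustration, that pre-processing step could be scripted along these lines; the map name pattern, the `.bin` suffix, and the threshold are made up, and the resulting 0/1 maps would then be fed to r.series with method=sum:

```python
# Sketch: generate one r.mapcalc pre-processing command per daily map,
# binarizing each map against a threshold so that r.series method=sum
# can count the matching days per pixel.  Names are hypothetical.

def preprocess_commands(map_names, threshold, suffix=".bin"):
    """Return one r.mapcalc command string per input map."""
    cmds = []
    for name in map_names:
        # 1 where the map exceeds the threshold, 0 elsewhere
        expr = f"{name}{suffix} = if({name} > {threshold}, 1, 0)"
        cmds.append(f'r.mapcalc "{expr}"')
    return cmds

if __name__ == "__main__":
    maps = [f"gdd_doy_{d:03d}" for d in (1, 2, 3)]
    for cmd in preprocess_commands(maps, 100):
        print(cmd)
```

With >10k input maps this is exactly the overhead (extra maps on disk, extra passes) complained about below.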

Alternatively, you could re-implement the threshold= option, but
without breaking the quantile= option (which is the reason why the
original version was reverted).

Also, you might consider changing the c_thresh() method to be more
generally useful; I can't imagine that the current implementation
would be of use for anything other than the exact case for which it
was originally written; e.g. it won't help with your second example
because it's triggered by an in-range test (with a hard-coded range of
threshold +/- 10) rather than a greater-than test.

Another option would be to just use r.mapcalc (using a script to
generate the expression). AFAIK, r.mapcalc doesn't have any hard-coded
limits on the number of input maps or the complexity of the
expression; the memory consumption will be worse than r.series (each
intermediate result will get its own row buffer), but only by a
constant factor.

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, Feb 23, 2012 at 6:50 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

> Markus Neteler wrote:
>
> > The existing implementation was reverted last year, so how can I do
> > this now? I need to find out the DOY at which a certain value is
> > reached in each pixel. The input is all the single daily GDD maps of
> > the year. So far I have used r.series with threshold and count.
> >
> > Another usage scenario: counting all days above a certain temperature, pixelwise. Is that possible?
>
> At present, the only solution is to pre-process the maps with
> r.mapcalc then run r.series on the processed maps.

Since I commonly work with > 10k maps, this would generate
a lot of overhead (time, disk space, effort).

> Alternatively, you could re-implement the threshold= option, but
> without breaking the quantile= option (which is the reason why the
> original version was reverted).

If I knew how...

> Also, you might consider changing the c_thresh() method to be more
> generally useful; I can't imagine that the current implementation
> would be of use for anything other than the exact case for which it
> was originally written; e.g. it won't help with your second example
> because it's triggered by an in-range test (with a hard-coded range of
> threshold +/- 10) rather than a greater-than test.

I would be happy to see an improved version there.

> Another option would be to just use r.mapcalc (using a script to
> generate the expression). AFAIK, r.mapcalc doesn't have any hard-coded
> limits on the number of input maps or the complexity of the
> expression; the memory consumption will be worse than r.series (each
> intermediate result will get its own row buffer), but only by a
> constant factor.

I already have to tune /etc/security/limits.conf in order to open more than
1024 maps in one step, so any overhead may become critical.

Markus

Markus Neteler wrote:

> > Also, you might consider changing the c_thresh() method to be more
> > generally useful; I can't imagine that the current implementation
> > would be of use for anything other than the exact case for which it
> > was originally written; e.g. it won't help with your second example
> > because it's triggered by an in-range test (with a hard-coded range of
> > threshold +/- 10) rather than a greater-than test.

> I would be happy to see an improved version there.

For a start, you need to decide exactly what that aggregate is
supposed to compute. It was called "threshold", but it returns the
index of the first map whose value falls within a particular range,
which isn't what most people would understand by the term (I would
have expected it to be either a less-than or greater-than test). Also,
once you've defined the condition, the first (or last) index at which
the condition holds and the total (or average) number of maps for
which the condition holds are entirely distinct aggregates.
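The distinction can be sketched as two pure functions over one pixel's
time series (a minimal illustration; the function names are made up,
not GRASS API):

```python
# Sketch of two distinct aggregates over one pixel's time series.
# first_index: 1-based index of the first map where the condition
# holds (None if it never holds); count: number of maps where it holds.

def first_index(values, cond):
    for i, v in enumerate(values, start=1):
        if cond(v):
            return i
    return None  # would become NULL in the output raster

def count(values, cond):
    return sum(1 for v in values if cond(v))

# e.g. a greater-than test, as in the "days above temperature" scenario:
series = [12.0, 15.5, 21.0, 19.0, 23.5]
above_20 = lambda v: v > 20.0
# first_index(series, above_20) -> 3; count(series, above_20) -> 2
```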

For a range, both the centre and width (or lower and upper bounds)
should be parameters, i.e. the closure argument should point to a pair
of values. Add a third parameter for the sense of the test (i.e.
whether you're testing for values inside the range or outside).

For a threshold, you only need one value, plus whether the test is
above or below.

These could either be distinct aggregates or modifiers which
pre-process the data, which would then use the average, count or
max_raster aggregates.
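The modifier idea can be pictured like this (a sketch, not the GRASS C
API; all names are illustrative): a parameterised condition maps the
series to 0/1 values, after which count-style aggregates apply
unchanged:

```python
# Sketch: parameterised condition builders used as pre-processing
# modifiers, ahead of the ordinary aggregates.

def in_range(centre, width, inside=True):
    """Range test: centre and width (equivalently, lower/upper bounds)
    plus a third parameter for the sense of the test."""
    def cond(v):
        hit = abs(v - centre) < width
        return hit if inside else not hit
    return cond

def threshold(value, above=True):
    """Threshold test: one value plus the direction of the comparison."""
    return (lambda v: v > value) if above else (lambda v: v < value)

def modify(values, cond):
    """Map the series to 0/1; 'count' then reduces to a plain sum."""
    return [1 if cond(v) else 0 for v in values]

series = [95.0, 102.0, 110.5, 99.9]
# modify(series, threshold(100)) -> [0, 1, 1, 0]
```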

Ultimately, r.series is a specialisation of r.mapcalc which replaces
generalised expressions with a fixed set of aggregates. There isn't
anything which can be done with r.series which can't be done with
r.mapcalc, but there will always be things which can't be done with
r.series without turning it into r.mapcalc.

> > Alternatively, you could re-implement the threshold= option, but
> > without breaking the quantile= option (which is the reason why the
> > original version was reverted).

> If I knew how...

One option is to just rename quantile= to e.g. parameter=. Another is
to have separate options but which set the same variable. Both of
these have the limitation that the parameter is limited to a single
numeric value.

Another option is to extend the aggregate data ("struct menu") with
information about any parameters specific to that aggregate. This is
more work but allows different aggregates to take different types
and/or numbers of parameters.

Another option is to just "hack" r.series for private use (i.e. not
commit the changes); then it doesn't matter if you break
method=quantile in the process.

Another option is to write something similar to r.series in Python
using the libraster bindings (grass.lib.raster) so that you can use
arbitrary Python expressions both to pre-process the data and
calculate aggregates.
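Stripped of the raster I/O (which would come from grass.lib.raster and
is deliberately omitted here), the core of such a Python replacement is
just a row-wise fold over the open maps:

```python
# Sketch of the aggregation core for a Python r.series replacement.
# Row reading/writing via grass.lib.raster is omitted; each "row" is a
# plain list holding one map's values for the current raster row.

def aggregate_row(rows, aggregate):
    """rows: one row buffer per input map; returns one output row.
    aggregate: any Python callable over the per-cell series, which is
    what makes arbitrary pre-processing and aggregates possible."""
    return [aggregate([r[col] for r in rows]) for col in range(len(rows[0]))]

# e.g. counting values above 20, cell by cell:
rows = [[18.0, 25.0], [22.0, 19.0], [21.0, 30.0]]  # 3 maps, 2 columns
# aggregate_row(rows, lambda s: sum(v > 20 for v in s)) -> [2, 2]
```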

> > Another option would be to just use r.mapcalc (using a script to
> > generate the expression). AFAIK, r.mapcalc doesn't have any hard-coded
> > limits on the number of input maps or the complexity of the
> > expression; the memory consumption will be worse than r.series (each
> > intermediate result will get its own row buffer), but only by a
> > constant factor.

> I already have to tune /etc/security/limits.conf in order to open more than
> 1024 maps in one step, so any overhead may become critical.

Assuming that input maps are CELL or FCELL, the extra overhead should
be 20 bytes per column per map. The overall structure of the
expression is:

  eval(thresh=float($thresh),range=float($range),
  if(abs(map1-thresh)<range,1,
  if(abs(map2-thresh)<range,2,
  ...
  if(abs(mapN-thresh)<range,N,
  null()
  )...)))

In addition to the row buffer for the map itself (which you need for
any approach), there's one for the subtraction, one for the abs(), one
for the comparison, one for the map number, and one for the if().

So e.g. 10,000 maps with 1000 columns would require an extra 200MB;
10,000 maps with 10,000 columns would require an extra 2GB.
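A script to emit that expression, with the matching overhead estimate,
could look like this (the map names are placeholders):

```python
# Sketch: generate the nested if() r.mapcalc expression shown above,
# plus the extra row-buffer memory estimate.  Map names are hypothetical.

def build_expression(map_names, thresh, rng):
    lines = [f"eval(thresh=float({thresh}),range=float({rng}),"]
    for i, name in enumerate(map_names, start=1):
        lines.append(f"if(abs({name}-thresh)<range,{i},")
    lines.append("null()")
    # one closing paren per if(), plus one for the eval()
    lines.append(")" * (len(map_names) + 1))
    return "\n".join(lines)

def extra_overhead_bytes(n_maps, n_cols, per_cell=20):
    # five intermediate 4-byte row buffers per map: subtraction, abs(),
    # comparison, map number, if()
    return n_maps * n_cols * per_cell

# extra_overhead_bytes(10000, 1000) -> 200_000_000, i.e. the 200MB above
```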

--
Glynn Clements <glynn@gclements.plus.com>

On Fri, Feb 24, 2012 at 11:54 AM, Glynn Clements
<glynn@gclements.plus.com> wrote:
> ...

Thanks for the suggestions, will study them.

> So e.g. 10,000 maps with 1000 columns would require an extra 200MB;
> 10,000 maps with 10,000 columns would require an extra 2GB.

We have 19,000 x 22,000 pixels here... 13,000 maps.

Markus