Markus Neteler wrote:
> Also, you might consider changing the c_thresh() method to be more
> generally useful; I can't imagine that the current implementation
> would be of use for anything other than the exact case for which it
> was originally written; e.g. it won't help with your second example
> because it's triggered by an in-range test (with a hard-coded range of
> threshold +/- 10) rather than a greater-than test.
I would be happy to see an improved version there.
For a start, you need to decide exactly what that aggregate is
supposed to compute. It was called "threshold", but it returns the
index of the first map whose value falls within a particular range,
which isn't what most people would understand by the term (I would
have expected it to be either a less-than or greater-than test). Also,
once you've defined the condition, the first (or last) index at which
the condition holds and the total (or average) number of maps for
which the condition holds are entirely distinct aggregates.
For a range, both the centre and width (or lower and upper bounds)
should be parameters, i.e. the closure argument should point to a pair
of values. Add a third parameter for the sense of the test (i.e.
whether you're testing for values inside the range or outside).
For a threshold, you only need one value, plus whether the test is
above or below.
These could either be distinct aggregates or modifiers which
pre-process the data, which would then use the average, count or
max_raster aggregates.
Ultimately, r.series is a specialisation of r.mapcalc which replaces
generalised expressions with a fixed set of aggregates. There isn't
anything which can be done with r.series which can't be done with
r.mapcalc, but there will always be things which can't be done with
r.series without turning it into r.mapcalc.
> Alternatively, you could re-implement the threshold= option, but
> without breaking the quantile= option (which is the reason why the
> original version was reverted).
If I knew how...
One option is to just rename quantile= to e.g. parameter=. Another is
to have separate options but which set the same variable. Both of
these have the limitation that the parameter is limited to a single
numeric value.
Another option is to extend the aggregate data ("struct menu") with
information about any parameters specific to that aggregate. This is
more work but allows different aggregates to take different types
and/or numbers of parameters.
Another option is to just "hack" r.series for private use (i.e. not
commit the changes); then it doesn't matter if you break
method=quantile in the process.
Another option is to write something similar to r.series in Python
using the libraster bindings (grass.lib.raster) so that you can use
arbitrary Python expressions both to pre-process the data and
calculate aggregates.
> Another option would be to just use r.mapcalc (using a script to
> generate the expression). AFAIK, r.mapcalc doesn't have any hard-coded
> limits on the number of input maps or the complexity of the
> expression; the memory consumption will be worse than r.series (each
> intermediate result will get its own row buffer), but only by a
> constant factor.
I have already to tune /etc/security/limits.conf in order to open more than
1024 maps in one step. So any overhead may become critical.
Assuming that input maps are CELL or FCELL, the extra overhead should
be 20 bytes per column per map. The overall structure of the
expression is:
eval(thresh=float($thresh),range=float($range),
if(abs(map1-thresh)<range,1,
if(abs(map2-thresh)<range,2,
...
if(abs(mapN-thresh)<range,N,
null()
)...)))
In addition to the row buffer for the map itself (which you need for
any approach), there's one for the subtraction, one for the abs(), one
for the comparison, one for the map number, and one for the if().
So e.g. 10,000 maps with 1000 columns would require an extra 200MB;
10,000 maps with 10,000 columns would require an extra 2GB.
--
Glynn Clements <glynn@gclements.plus.com>