[GRASS-dev] adding lib_arraystats

voice: 480-965-6262; fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

On Feb 13, 2008, at 12:50 PM, grass-dev-request@lists.osgeo.org wrote:

What would be particularly useful is if it's possible for GRASS to use
R functions on small amounts of data. E.g. r.series and r.resamp.stats
both compute aggregates over relatively small amounts of data.

This is essentially what I had in mind when I first posted the suggestion--
using R code on simple arrays of data.

This is essentially what classInterval() needs:

initiate the R backend and workspace;
put the vector of doubles into the workspace;
load the classInt package into the workspace;
run the classInterval function with the argument values;
collect the break values vector from the output object;
optionally continue to collect a vector of strings giving the RGB values;
optionally generate an empirical cumulative distribution function plot
of the data showing the class intervals and chosen colours as PNG;
terminate the R backend;

This could be done as a shell script using temporary files (could be binary),
from Python on supported platforms, on Windows from Python via (D)COM, or using
the R shared object. None of them will be as fast as an in-GRASS solution,
although R backend startup times and package loading are very much faster now
than previously.

If it was practical to have e.g. "r.series method=R expression=...",
that would be much more useful than having to start R and load
potentially hundreds of rasters into memory.

Right-- the "not-in-memory" features of GRASS are extremely valuble.

Since R operations are vectorised, it might be possible to pass a block of
values (say k rows, p columns, where p is the number of input rasters) and do
apply() on it to get k rows back, but the blocking would be on the GRASS side.
But this would be for specialist things, I expect.

r.resamp.stats only works on a single map, but as it's essentially a
down-sampling tool, it's not unreasonable to use it on very large
input maps.

I like this idea. Use of a trimmed mean or other such R specialty would be
very nice here.

Some of the specialites are rather sophisticated, but others are more
sophisticated than needed in GIS on a regular basis. It may be that the R
functionality is more use here in supporting the implementation of code within
GRASS by permitting checking that both end up with largely the same output for
the same input.

Roger

FWIW, we are going to have to require the Python numpy (or numeric) package along with standard Python to get all the GUI features we now have. A number of these functions are available in numpy. Hence, they would be scriptable in Python without R. Might be worth considering as an alternative for some things.

Michael