[GRASS-dev] adding lib_arraystats

Hello,

I've finally gotten around to continuing work on the d.thematic.* modules, and more specifically on the classification code. As mentioned earlier, I think it makes sense to turn the classification code into a library, so I decided to create lib_arraystats, which contains functions for collecting basic statistics and for finding class breaks in arrays of doubles. In the future this could be filled out with more statistical functions on such arrays.
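[Editorial illustration: the library itself is C, but the kind of functionality described here (basic statistics plus class-break finders over arrays of doubles) can be sketched in a few lines of Python. All names below are illustrative, not the library's actual API.]

```python
# Hypothetical sketch of an arraystats-style interface: basic statistics
# over an array of doubles, and two simple class-break methods for
# thematic classification. Names and signatures are invented.

import math

def basic_stats(values):
    """Return count, min, max, mean and population st. dev. of the values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {"n": n, "min": min(values), "max": max(values),
            "mean": mean, "stdev": math.sqrt(var)}

def equal_interval_breaks(values, nclasses):
    """Breaks dividing the data range into equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / nclasses
    return [lo + step * i for i in range(1, nclasses)]

def quantile_breaks(values, nclasses):
    """Breaks putting (roughly) equal numbers of values in each class."""
    s = sorted(values)
    n = len(s)
    return [s[(n * i) // nclasses] for i in range(1, nclasses)]
```

A module would then compute breaks once and hand them to the rendering code, e.g. `equal_interval_breaks(data, 5)` for five classes.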

Could the gurus please have a look and tell me if the attached files are decent enough (except for the still-lacking documentation) to be committed to svn for further development? Once that's done, I can also commit the d.thematic.area and v.class modules.

Moritz

(attachments)

Grass.make.in.diff.gz (432 Bytes)
arraystats_lib.tgz (3.75 KB)

On Monday 11 February 2008, Moritz Lennert wrote:

Hello,

I've finally gotten around to continue working on the d.thematic.*
modules, and more specifically on the classification code. As mentioned
earlier, I think it makes sense to make the latter into a library, so I
decided to create lib_arraystats which contains functions for collecting
basic statistics and for finding class breaks in arrays of doubles. In
the future this could be filled with more statistical functions on such
arrays.

Could the gurus please have a look and tell me if the attached files are
decent enough (except for the lacking documentation) to be committed to
svn for further development ? Once that's done, I can also commit the
d.thematic.area and v.class modules.

Moritz

Nice work Moritz.

On this thread-- it would be neat if we could use (alongside Moritz's and
existing stat libs) the shared R library, when available. I think that it
would have to be a compile-time option, and the user would have to have R
installed and compiled with the '--enable-R-shlib' flag. This would allow us
to switch between the default, basic set of algorithms and the entire suite
of R code. Some of these functions might not work with huge datasets (R
works with data in-memory)-- but it would allow us to program more
complicated algorithms without re-inventing them.

This thread reminded me of doing this, as sometimes it is useful to look
for "natural classes" within data. Instead of re-implementing a variation of
K-means, we could just pass an array to the appropriate (set of) R functions.

However, an hour spent tinkering around with the R shared lib examples left me
mostly frustrated....

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

Dylan wrote:

However, an hour spent tinkering around with the R shared lib
examples left me mostly frustrated....

From the everything looks like a nail dept:

Does R offer SWIG bindings into the lib that we could use with python?

Hamish


On Feb 11, 2008 7:30 PM, Hamish <hamish_b@yahoo.com> wrote:

Dylan wrote:
> However, an hour spent tinkering around with the R shared lib
> examples left me mostly frustrated....

From the everything looks like a nail dept:
Does R offer SWIG bindings into the lib that we could use with python?

Hamish

This project (http://rpy.sourceforge.net/) comes to mind. Not sure if
it is SWIG-based... It looks like straight-up Python:

import Rpy
...

I suppose that the only problem then is getting Rpy installed. On a
debian-like system you can install R and Rpy via aptitude. If you
build R from source, then I suppose you would have to build / install
Rpy in a similar manner.

Before giving up on the C-API to R, I think that there are a couple
more approaches, which might be simpler.

1. call R in batch mode. This would involve the use of temp files,
passing data and commands back and forth from GRASS to R-- with the
shell as mediator. This would only require a vanilla install of R.

2. build R as a shared lib, and use the "embedded" interface to R
commands. There are a couple examples in the source tree for this.
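[Editorial illustration: option 1 above (batch mode via temp files) could look roughly like the sketch below. The helper names, and the use of Rscript rather than R CMD BATCH, are assumptions for illustration only.]

```python
# Rough sketch of option 1: GRASS writes data and an R command file to
# temporary files, runs R in batch mode, and reads the result back.
# Everything here is a hypothetical illustration, not an existing interface.

import subprocess
import tempfile

def make_r_batch_script(data_file, result_file, r_expression):
    """Compose the R command file: read the data into `x`, evaluate an
    expression over it, and write the result back for GRASS to pick up."""
    return "\n".join([
        "x <- scan('%s')" % data_file,             # data pushed from GRASS
        "result <- %s" % r_expression,             # e.g. "quantile(x, 0.5)"
        "write(result, file='%s')" % result_file,  # pulled back by GRASS
    ])

def run_through_r(values, r_expression):
    """Full round trip (requires R's Rscript on the PATH)."""
    with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as df:
        df.write(" ".join("%g" % v for v in values))
    out_path = df.name + ".out"
    with tempfile.NamedTemporaryFile("w", suffix=".R", delete=False) as sf:
        sf.write(make_r_batch_script(df.name, out_path, r_expression))
    subprocess.run(["Rscript", "--vanilla", sf.name], check=True)
    with open(out_path) as f:
        return [float(tok) for tok in f.read().split()]
```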

However-- this is starting to get into the realm of feature bloat...
As neat as it would be to have R routines which could work directly on
GRASS data structures, it might be overkill considering the possible
amount of work. I'll keep checking around and post back other ideas.

On the other hand-- R has so many great plotting / analysis routines,
it would be a shame to re-implement these in GRASS if they could be
easily "plugged-into" via some kind of shared library / SWIG
interface.

Dylan

On Tuesday 12 February 2008, Dylan Beaudette wrote:

On Feb 11, 2008 7:30 PM, Hamish <hamish_b@yahoo.com> wrote:
> Dylan wrote:
> > However, an hour spent tinkering around with the R shared lib
> > examples left me mostly frustrated....
>
> From the everything looks like a nail dept:
> Does R offer SWIG bindings into the lib that we could use with python?
>
>
> Hamish

This project (http://rpy.sourceforge.net/) comes to mind. Not sure if
it is SWIG based... It looks like straight-up python:

import Rpy
...

I suppose that the only problem then is getting Rpy installed. On a
debian-like system you can install R and Rpy via aptitude. If you
build R from source, then I suppose you would have to build / install
Rpy in a similar manner.

Before giving up on the C-API to R, I think that there are a couple
more approaches, which might be simpler.

1. call R in batch mode. this would involve the use of temp files,
passing data and commands back and forth from GRASS to R-- with the
shell as mediator. This would only require a vanilla install of R.

2. build R as a shared lib, and use the "embedded" interface to R
commands. There are a couple examples in the source tree for this.

I almost forgot the 3rd option:

use the spGRASS6 library in R, and read-in GRASS vector/raster/dbase files
into an R session.

Dylan

However-- this is starting to get into the realm of feature bloat...
As neat as it would be to have R routines which could work directly on
GRASS data structures, it might be overkill considering the possible
amount of work. I'll keep checking around and post back other ideas.

On the other hand-- R has so many great plotting / analysis routines,
it would be a shame to re-implement these in GRASS if they could be
easily "plugged-into" via some kind of share library / SWIG
interface.

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

Dylan Beaudette wrote:

However-- this is starting to get into the realm of feature bloat...
As neat as it would be to have R routines which could work directly on
GRASS data structures, it might be overkill considering the possible
amount of work. I'll keep checking around and post back other ideas.

On the other hand-- R has so many great plotting / analysis routines,
it would be a shame to re-implement these in GRASS if they could be
easily "plugged-into" via some kind of share library / SWIG
interface.

Another concern is that it could discourage development of new
functionality in GRASS itself. In the worst case, GRASS could end up
of little use unless you are familiar with R, and your map fits into
memory.

This is similar to my concerns regarding the use of high-level Python
libraries in the GUI suppressing development of more widely-applicable
display modules.

External tools should be used to augment GRASS functionality, not to
replace it.

That isn't to say that we shouldn't facilitate integration with tools
such as R. Certainly, I would prefer R integration to adding our own
half-baked statistics package. But we also need to avoid delegating
"core" functionality to external packages.

What would be particularly useful is if it's possible for GRASS to use
R functions on small amounts of data. E.g. r.series and r.resamp.stats
both compute aggregates over relatively small amounts of data.

If it was practical to have e.g. "r.series method=R expression=...",
that would be much more useful than having to start R and load
potentially hundreds of rasters into memory.
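[Editorial illustration: the aggregation pattern behind a hypothetical "r.series method=R expression=..." can be sketched with the R callback stood in for by an ordinary function. All names are illustrative.]

```python
# For each cell, collect one value per input raster and reduce the small
# vector with a user-supplied aggregate -- the loop r.series already runs,
# with the aggregate conceptually delegated to R.

def series_aggregate(rasters, func):
    """rasters: list of equally-sized row-major 2-D lists (one per map).
    Returns one output map where each cell is func([cell per input])."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = func([m[r][c] for m in rasters])  # tiny vector per cell
    return out
```

Only a handful of doubles crosses the boundary per call, so no map ever has to fit in R's memory.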

r.resamp.stats only works on a single map, but as it's essentially a
down-sampling tool, it's not unreasonable to use it on very large
input maps.

--
Glynn Clements <glynn@gclements.plus.com>

On Tuesday 12 February 2008, Glynn Clements wrote:

Dylan Beaudette wrote:
> However-- this is starting to get into the realm of feature bloat...
> As neat as it would be to have R routines which could work directly on
> GRASS data structures, it might be overkill considering the possible
> amount of work. I'll keep checking around and post back other ideas.
>
> On the other hand-- R has so many great plotting / analysis routines,
> it would be a shame to re-implement these in GRASS if they could be
> easily "plugged-into" via some kind of share library / SWIG
> interface.

Thanks for the follow-up, Glynn.

Another concern is that it could discourage development of new
functionality in GRASS itself. In the worst case, GRASS could end up
of little use unless you are familiar with R, and your map fits into
memory.

This is similar to my concerns regarding the use of high-level Python
libraries in the GUI suppressing development of more widely-applicable
display modules.

External tools should be used to augment GRASS functionality, not to
replace it.

I agree completely-- just as long as we don't go overboard re-inventing the
wheel.

That isn't to say that we shouldn't facilitate integration with tools
such as R. Certainly, I would prefer R integration to adding our own
half-baked statistics package. But we also need to avoid delegating
"core" functionality to external packages.

What would be particularly useful is if it's possible for GRASS to use
R functions on small amounts of data. E.g. r.series and r.resamp.stats
both compute aggregates over relatively small amounts of data.

This is essentially what I had in mind when I first posted the suggestion--
using R code on simple arrays of data.

If it was practical to have e.g. "r.series method=R expression=...",
that would be much more useful than having to start R and load
potentially hundreds of rasters into memory.

Right-- the "not-in-memory" features of GRASS are extremely valuble.

r.resamp.stats only works on a single map, but as it's essentially a
down-sampling tool, it's not unreasonable to use it on very large
input maps.

I like this idea. Use of a trimmed mean or other such R specialty would be
very nice here.
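[Editorial illustration: a trimmed mean of the kind R's mean(x, trim=) computes, roughly sketched in Python. The exact tail-trimming rule is an approximation of R's.]

```python
# Trimmed mean: drop the lowest and highest fraction of values before
# averaging, which makes the aggregate robust to outlier cells.

def trimmed_mean(values, trim=0.1):
    """Mean after removing floor(n * trim) values from each tail."""
    s = sorted(values)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k] if k else s
    return sum(kept) / len(kept)
```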

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

Dylan Beaudette wrote:

> However-- this is starting to get into the realm of feature
> bloat...
> As neat as it would be to have R routines which could work directly
> on GRASS data structures, it might be overkill considering the
> possible amount of work. I'll keep checking around and post back
> other ideas.

Well, people always seem to ask about kriging because they've heard of
it.
http://lists.osgeo.org/pipermail/grass-user/2004-May/025709.html

(see s.surf.krig in GRASS 5 and v.surf.krige on the wiki addons page)

> On the other hand-- R has so many great plotting / analysis
> routines, it would be a shame to re-implement these in GRASS if
> they could be easily "plugged-into" via some kind of share library
> / SWIG interface.

Yup, it would be nice to be able to provide modules with R as the
backend versus writing/maintaining something new ourselves.

Hamish


On 12/02/08 03:43, Dylan Beaudette wrote:

On Monday 11 February 2008, Moritz Lennert wrote:

Hello,

I've finally gotten around to continue working on the d.thematic.*
modules, and more specifically on the classification code. As mentioned
earlier, I think it makes sense to make the latter into a library, so I
decided to create lib_arraystats which contains functions for collecting
basic statistics and for finding class breaks in arrays of doubles. In
the future this could be filled with more statistical functions on such
arrays.

Could the gurus please have a look and tell me if the attached files are
decent enough (except for the lacking documentation) to be committed to
svn for further development ? Once that's done, I can also commit the
d.thematic.area and v.class modules.

Moritz

Nice work Moritz.

On this thread-- it would be neat if we could use (alongside Moritz's and existing stat libs) the shared R library, when available. I think that it would have to be a compile-time option, and the user would have to have R installed and compiled with the '--enable-R-shlib' flag. This would allow us to switch between the default, basic set of algorithms and the entire suite of R code. Some of these functions might not work with huge datasets (R works with data in-memory)-- but it would allow us to program more complicated algorithms without re-inventing them.

I agree that having a more direct link to R would be nice. At the same time, I'm not sure if we should do this across all modules using stats. Maybe a g.rstats or v.rstats could be an option?

This thread reminded me of doing this, as sometimes it is useful to look for "natural classes" within data. Instead of re-implementing a variation of K-means, we could just pass an array to the appropriate (set of) R functions.

I've done this, but via the plr interface between R and PostgreSQL. See attached script for an example.

Moritz

(attachments)

indics.sh (1.92 KB)

On 11/02/08 23:29, Moritz Lennert wrote:

Hello,

I've finally gotten around to continue working on the d.thematic.* modules, and more specifically on the classification code. As mentioned earlier, I think it makes sense to make the latter into a library, so I decided to create lib_arraystats which contains functions for collecting basic statistics and for finding class breaks in arrays of doubles. In the future this could be filled with more statistical functions on such arrays.

Could the gurus please have a look and tell me if the attached files are decent enough (except for the lacking documentation) to be committed to svn for further development ? Once that's done, I can also commit the d.thematic.area and v.class modules.

I'm happy that I've been able to launch a discussion about linking R into GRASS, but could someone tell me if I can check the code into svn?

;-)

Moritz

Hi Moritz,

On Wed, 13 Feb 2008, Moritz Lennert wrote:

On 11/02/08 23:29, Moritz Lennert wrote:

Hello,

I've finally gotten around to continue working on the d.thematic.* modules, and more specifically on the classification code. As mentioned earlier, I think it makes sense to make the latter into a library, so I decided to create lib_arraystats which contains functions for collecting basic statistics and for finding class breaks in arrays of doubles. In the future this could be filled with more statistical functions on such arrays.

Could the gurus please have a look and tell me if the attached files are decent enough (except for the lacking documentation) to be committed to svn for further development ? Once that's done, I can also commit the d.thematic.area and v.class modules.

I'm happy that I've been able to launch a discussion about linking R into GRASS, but could someone tell me if I can check the code into svn ?

I can't presume to say anything other than if you feel happy the quality and functionality is up to inclusion in GRASS, go ahead - but if you'd like a couple of comments on it here goes:

the typedef STATS looks a little weird to me, (a) because STATS on its own is a very generic sounding name and (b) because in general typedefs are used very little in GRASS (for one thing they confuse me...) I'd suggest something along the lines of "struct GASTATS", avoiding the typedef.

EXIT_SUCCESS used as a function exit status looks a bit strange to me too- IIUC this is defined in stdlib.h and intended for programs to return a status code to the environment when they exit, not really for internal use by functions.

But I guess they're minor issues - just trying to find something useful to say...

Paul

On Feb 13, 2008 12:10 AM, Hamish <hamish_b@yahoo.com> wrote:

Dylan Beaudette wrote:
> > However-- this is starting to get into the realm of feature
> > bloat...
> > As neat as it would be to have R routines which could work directly
> > on GRASS data structures, it might be overkill considering the
> > possible amount of work. I'll keep checking around and post back
> > other ideas.

Well, people always seem to ask about kriging because they've heard of
it.
http://lists.osgeo.org/pipermail/grass-user/2004-May/025709.html

(see s.surf.krig in GRASS 5 and v.surf.krige on the wiki addons page)

This does seem to come up a lot. However...

<rant>
This is one piece of functionality I would rather see in a specialized
system-- i.e. implemented through gstat etc. Kriging is one of those
semi-mystical things that gets lumped into many GIS courses, and
through the magic of automated wizards and such-- out pops a map.
Unlike RST / IDW, kriging requires several important steps and
decisions (variogram estimation, assessment of stationarity, etc.)
which the act of using something like gstat (partially) ensures.
</rant>

That said-- this is one of the cases where a strong tie to R is a
great thing. The use of an embedded R (per my previous comments) might
facilitate these more complex operations (kriging) in a way that is
not abstracted away from the wealth of good documentation (thinking
gstat here).

Without much trouble it would be possible to document a standard
kriging exercise with the spearfish data (or the new NC data)-- such
that the concern expressed in the linked thread
(http://lists.osgeo.org/pipermail/grass-user/2004-May/025709.html) can
be addressed. Here is one such (not necessarily the best) example:
http://casoilresource.lawr.ucdavis.edu/drupal/node/438

> > On the other hand-- R has so many great plotting / analysis
> > routines, it would be a shame to re-implement these in GRASS if
> > they could be easily "plugged-into" via some kind of share library
> > / SWIG interface.

Yup, it would be nice to be able to provide modules with R as the
backend versus writing/maintaining something new ourselves.

I suppose that these things really belong in a discussion on GRASS7.
I'll try and keep on topic!

Dylan


Dylan Beaudette <dylan.beaudette <at> gmail.com> writes:

On Tuesday 12 February 2008, Glynn Clements wrote:
> Dylan Beaudette wrote:
> > On the other hand-- R has so many great plotting / analysis routines,
> > it would be a shame to re-implement these in GRASS if they could be
> > easily "plugged-into" via some kind of share library / SWIG
> > interface.

The Rpy route is feasible but in my experience not very portable; essentially
you push R scripts across the Python-R interface together with as little data as
possible, and get as little data as possible back.

Thanks for the follow-up, Glynn.

> Another concern is that it could discourage development of new
> functionality in GRASS itself. In the worst case, GRASS could end up
> of little use unless you are familiar with R, and your map fits into
> memory.
>
> This is similar to my concerns regarding the use of high-level Python
> libraries in the GUI suppressing development of more widely-applicable
> display modules.
>
> External tools should be used to augment GRASS functionality, not to
> replace it.

I agree completely-- just as long as we don't go overboard re-inventing the
wheel.

Yes, very sensible. With reference to the original question, I think in fact
that class interval finding could be done fairly readily using the classInterval
function in the classInt package, which provides lots of methods, and does not
require any spatial data, just data as such; for rasters, decimated or
resampled data would be good enough. But using R would be just for people who
need it, and for people who are developing functions within GRASS itself and
need to test out method implementations. The classInt package help pages are at:

http://cran.r-project.org/doc/packages/classInt.pdf

> That isn't to say that we shouldn't facilitate integration with tools
> such as R. Certainly, I would prefer R integration to adding our own
> half-baked statistics package. But we also need to avoid delegating
> "core" functionality to external packages.
>
> What would be particularly useful is if it's possible for GRASS to use
> R functions on small amounts of data. E.g. r.series and r.resamp.stats
> both compute aggregates over relatively small amounts of data.

This is essentially what I had in mind when I first posted the suggestion--
using R code on simple arrays of data.

This is essentially what classInterval() needs:

initiate the R backend and workspace;
put the vector of doubles into the workspace;
load the classInt package into the workspace;
run the classInterval function with the argument values;
collect the break values vector from the output object;
optionally continue to collect a vector of strings giving the RGB values;
optionally generate an empirical cumulative distribution function plot
  of the data showing the class intervals and chosen colours as PNG;
terminate the R backend;

This could be done as a shell script using temporary files (could be binary),
from Python on supported platforms, on Windows from Python via (D)COM, or using
the R shared object. None of them will be as fast as an in-GRASS solution,
although R backend startup times and package loading are very much faster now
than previously.

> If it was practical to have e.g. "r.series method=R expression=...",
> that would be much more useful than having to start R and load
> potentially hundreds of rasters into memory.

Right-- the "not-in-memory" features of GRASS are extremely valuble.

Since R operations are vectorised, it might be possible to pass a block of
values (say k rows, p columns, where p is the number of input rasters) and do
apply() on it to get k rows back, but the blocking would be on the GRASS side.
But this would be for specialist things, I expect.
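[Editorial illustration: the blocking idea above (one call per k-row block instead of per cell) can be sketched without R; `row_means` stands in for R's apply(block, 1, mean). All names are invented.]

```python
# Amortise the per-call push/pull cost: ship a block of k rows x p columns
# (p = number of input rasters) per call and get k aggregated values back,
# instead of making one call per cell.

def blocked_aggregate(table, func, block=2):
    """table: n rows x p columns. Calls func once per block of rows; func
    maps a list of rows to a list with one result per row."""
    out = []
    for start in range(0, len(table), block):
        out.extend(func(table[start:start + block]))  # one "R call" per block
    return out

def row_means(rows):
    # stand-in for the vectorised R side, e.g. apply(block, 1, mean)
    return [sum(r) / len(r) for r in rows]
```

As Roger notes, the blocking stays on the GRASS side; the callee only ever sees a dense k x p block.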

> r.resamp.stats only works on a single map, but as it's essentially a
> down-sampling tool, it's not unreasonable to use it on very large
> input maps.

I like this idea. Use of a trimmed mean or other such R specialty would be
very nice here.

Some of the specialties are rather sophisticated, but others are more
sophisticated than is needed in GIS on a regular basis. It may be that the R
functionality is of more use here in supporting the implementation of code
within GRASS, by permitting a check that both end up with largely the same
output for the same input.

Roger

Dylan

Paul Kelly wrote:

> I'm happy that I've been able to launch a discussion about linking R into
> GRASS, but could someone tell me if I can check the code into svn ?

I can't presume to say anything other than if you feel happy the quality
and functionality is up to inclusion in GRASS, go ahead - but if you'd
like a couple of comments on it here goes:

the typedef STATS looks a little weird to me, (a) because STATS on its own
is a very generic sounding name and (b) because in general typedefs are
used very little in GRASS (for one thing they confuse me...) I'd suggest
something along the lines of "struct GASTATS", avoiding the typedef.

Agreed. GRASS normally uses structure tags rather than typedefs.

EXIT_SUCCESS used as a function exit status looks a bit strange to me too-
IIUC this is defined in stdlib.h and intended for programs to return a
status code to the environment when they exit, not really for internal use
by functions.

Good point. EXIT_SUCCESS and EXIT_FAILURE are (technically)
platform-specific. So any code which called the function would need to
check against those specific constants (you can't assume that success
is zero and failure is non-zero).

FWIW, I suggest C truth values for functions which don't return
anything other than success or failure, with true (non-zero) for
success and false (zero) for failure. Tests would be written as:

  if (foo()) { /* if foo succeeded */ }
or:
  if (!foo()) { /* if foo failed */ }

Reversing the sense of the result (i.e. zero for success) means that
you would have a different convention for int results than for
pointers. Functions which return pointers almost[1] invariably return
NULL (i.e. zero) for failure.

[1] The notable exception is mmap(), which returns ((void *) -1) for
failure. I'm sure they had a reason, but I have absolutely no clue as
to what that reason might be.

--
Glynn Clements <glynn@gclements.plus.com>

Moritz Lennert wrote:

I've finally gotten around to continue working on the d.thematic.*
modules, and more specifically on the classification code. As mentioned
earlier, I think it makes sense to make the latter into a library, so I
decided to create lib_arraystats which contains functions for collecting
basic statistics and for finding class breaks in arrays of doubles. In
the future this could be filled with more statistical functions on such
arrays.

Could the gurus please have a look and tell me if the attached files are
decent enough (except for the lacking documentation) to be committed to
svn for further development ? Once that's done, I can also commit the
d.thematic.area and v.class modules.

class_stdev() should probably just return the scale rather than a
formatted message. The caller can then construct the message if it so
desires.

Some stylistic comments:

1. "x += k", "x -= k" etc are clearer than "x = x + k" etc. The former
make it immediately clear that the variable (or array element, etc) is
being modified according to a particular idiom (i.e. having some
quantity added to or subtracted from it), rather than being assigned
the result of some arbitrary expression.

2. class_discont() looks like it might be more clear using an array of
structures for num/no/zz/co, rather than parallel arrays. E.g.

  struct class {
    int num;
    double no;
    double zz;
    double co;
  } *classes;

Then, code which accesses multiple fields doesn't need to keep
specifying the index. E.g.

  for (j = 1; j <= i; j++) {
      no[j] = num[j];
      zz[j] = x[num[j]] * rangemax + min;
      if (j == i)
          continue;
      if (co[j] > co[j + 1]) {
          zz[j] = zz[j] + rangemin;
          continue;
      }
      zz[j] = zz[j] - rangemin;
      no[j] = no[j] - 1;
  }

becomes:

  for (j = 1; j <= i; j++) {
      struct class *c = &classes[j];
      struct class *next = &classes[j + 1];
      c->no = c->num;
      c->zz = x[c->num] * rangemax + min;
      if (j == i)
          continue;
      if (c->co > next->co) {
          c->zz += rangemin;
          continue;
      }
      c->zz -= rangemin;
      c->no -= 1;
  }

This relieves the reader of the need to visually check all of those
[j]s to ensure that there isn't an [i] hidden amongst them.

--
Glynn Clements <glynn@gclements.plus.com>

On 13/02/08 22:01, Glynn Clements wrote:

Paul Kelly wrote:

I'm happy that I've been able to launch a discussion about linking R into GRASS, but could someone tell me if I can check the code into svn ?

I can't presume to say anything other than if you feel happy the quality and functionality is up to inclusion in GRASS, go ahead - but if you'd like a couple of comments on it here goes:

the typedef STATS looks a little weird to me, (a) because STATS on its own is a very generic sounding name and (b) because in general typedefs are used very little in GRASS (for one thing they confuse me...) I'd suggest something along the lines of "struct GASTATS", avoiding the typedef.

Agreed. GRASS normally uses structure tags rather than typedefs.

EXIT_SUCCESS used as a function exit status looks a bit strange to me too- IIUC this is defined in stdlib.h and intended for programs to return a status code to the environment when they exit, not really for internal use by functions.

Thanks to both of you for all your remarks, I'll try to rewrite the code accordingly.

Concerning EXIT_SUCCESS, I based this on the SUBMITTING file which says:

8. Exit status is defined as EXIT_SUCCESS or EXIT_FAILURE, e.g.

     {
       ...
       if (G_parser (argc, argv))
           exit (EXIT_FAILURE);

       ...
       exit (EXIT_SUCCESS);
     }

I guess this should read "Module exit status..." (i.e. not function exit status)?

Moritz

Roger Bivand wrote:

> > What would be particularly useful is if it's possible for GRASS to use
> > R functions on small amounts of data. E.g. r.series and r.resamp.stats
> > both compute aggregates over relatively small amounts of data.
>
> This is essentially what I had in mind when I first posted the suggestion--
> using R code on simple arrays of data.

This is essentially what classInterval() needs:

initiate the R backend and workspace;
put the vector of doubles into the workspace;
load the classInt package into the workspace;
run the classInterval function with the argument values;
collect the break values vector from the output object;
optionally continue to collect a vector of strings giving the RGB values;
optionally generate an empirical cumulative distribution function plot
  of the data showing the class intervals and chosen colours as PNG;
terminate the R backend;

What r.series and r.resamp.stats need is:

  Initialise R
  For each cell:
    push a vector of doubles into R
    call the R function on the vector
    pull the result (a single double) from R
  Shut down R

> > If it was practical to have e.g. "r.series method=R expression=...",
> > that would be much more useful than having to start R and load
> > potentially hundreds of rasters into memory.
>
> Right-- the "not-in-memory" features of GRASS are extremely valuble.

Since R operations are vectorised, it might be possible to pass a block of
values (say k rows, p columns, where p is the number of input rasters) and do
apply() on it to get k rows back, but the blocking would be on the GRASS side.
But this would be for specialist things, I expect.

This is how any GRASS module would want to use R. If you're transferring
entire maps, you would be better off just coding the whole thing in R.

GRASS modules inherently operate on chunks of data, either rows, or a
sliding window of several rows, or a region of cells, etc.

Processes which need to operate upon an entire map don't really need
to "integrate" with R; they can just run an R program as a
self-contained operation.

--
Glynn Clements <glynn@gclements.plus.com>

Moritz Lennert wrote:

>> EXIT_SUCCESS used as a function exit status looks a bit strange to me too-
>> IIUC this is defined in stdlib.h and intended for programs to return a
>> status code to the environment when they exit, not really for internal use
>> by functions.

Thanks to both of you for all your remarks, I'll try to rewrite the code
accordingly.

Concerning EXIT_SUCCESS, I based this on the SUBMITTING file which says:

8. Exit status is defined as EXIT_SUCCESS or EXIT_FAILURE, e.g.

     {
       ...
       if (G_parser (argc, argv))
           exit (EXIT_FAILURE);

       ...
       exit (EXIT_SUCCESS);
     }

I guess this should read "Module exit status..." (i.e. not function exit
status) ?

I wouldn't have thought so, but as it has bitten at least one person,
maybe it should.

"exit status" means what programs return via exit() or by returning
from main().

Functions don't "exit" with an "exit status", they "return" with a
"return value".

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, 14 Feb 2008, Glynn Clements wrote:

Roger Bivand wrote:

What would be particularly useful is if it's possible for GRASS to use
R functions on small amounts of data. E.g. r.series and r.resamp.stats
both compute aggregates over relatively small amounts of data.

This is essentially what I had in mind when I first posted the suggestion--
using R code on simple arrays of data.

This is essentially what classInterval() needs:

initiate the R backend and workspace;
put the vector of doubles into the workspace;
load the classInt package into the workspace;
run the classInterval function with the argument values;
collect the break values vector from the output object;
optionally continue to collect a vector of strings giving the RGB values;
optionally generate an empirical cumulative distribution function plot
  of the data showing the class intervals and chosen colours as PNG;
terminate the R backend;

What r.series and r.resamp.stats need is:

  Initialise R
  For each cell:
    push a vector of doubles into R
    call the R function on the vector
    pull the result (a single double) from R
  Shut down R

Certainly feasible. Probably it would be reasonable to block up the single-cell calls, pushing a cube of doubles and pulling back a matrix of doubles (a three-dimensional vector in, a two-dimensional one out), but that is just a performance question of whether the push/pull operations are costly compared to the vectorised function within R operating on multiple cells. Are we in Python for this?

Roger

If it was practical to have e.g. "r.series method=R expression=...",
that would be much more useful than having to start R and load
potentially hundreds of rasters into memory.

Right-- the "not-in-memory" features of GRASS are extremely valuable.

Since R operations are vectorised, it might be possible to pass a block of
values (say k rows, p columns, where p is the number of input rasters) and do
apply() on it to get k rows back, but the blocking would be on the GRASS side.
But this would be for specialist things, I expect.

This is how any GRASS module would want to use R. If you're transferring
entire maps, you would be better off just coding the whole thing in R.

GRASS modules inherently operate on chunks of data, either rows, or a
sliding window of several rows, or a region of cells, etc.

Processes which need to operate upon an entire map don't really need
to "integrate" with R; they can just run an R program as a
self-contained operation.

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand@nhh.no

Roger Bivand wrote:

> What r.series and r.resamp.stats need is:
>
> Initialise R
> For each cell:
> push a vector of doubles into R
> call the R function on the vector
> pull the result (a single double) from R
> Shut down R

Certainly feasible. Probably it would be reasonable to block up the single
cell, to push a cube of doubles and pull a matrix of doubles (vector with
three dimensions/two dimensions), but that is just a performance question
of whether the push/pull operations are costly compared to the vectorised
function within R operating on multiple cells. Are we in Python for this?

I'm referring to the case of calling R from C; no Python involved.

For r.series and r.resamp.stats[1], blocking the data would only
provide a significant benefit if there was a substantial per-call
overhead.

[1] For "r.resamp.stats -w", there would be some benefit to
maintaining a sliding window in R, to avoid having to transfer
boundary cells more than once.
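[Editorial illustration: to make footnote [1] concrete, here is a sketch contrasting re-sending every window with sliding it one row at a time; the transfer counter shows the boundary-row saving. All names are invented.]

```python
# Sliding a height-h window down a raster: the naive scheme ships h rows
# per window position, while the incremental scheme ships each row exactly
# once, since overlapping rows are already on the far side.

def window_sums_naive(rows, height):
    """Transfer every row of every window position."""
    transferred = 0
    sums = []
    for top in range(len(rows) - height + 1):
        window = rows[top:top + height]
        transferred += height               # all rows shipped each time
        sums.append(sum(sum(r) for r in window))
    return sums, transferred

def window_sums_incremental(rows, height):
    """Transfer each row once; slide by dropping/adding one row."""
    transferred = height                    # initial window
    window = list(rows[:height])
    sums = [sum(sum(r) for r in window)]
    for nxt in range(height, len(rows)):
        window.pop(0)
        window.append(rows[nxt])            # only the new boundary row moves
        transferred += 1
        sums.append(sum(sum(r) for r in window))
    return sums, transferred
```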

--
Glynn Clements <glynn@gclements.plus.com>