[GRASS-dev] [GRASS-user] Re: grass-user Digest, Vol 30, Issue 22

Below is a recent exchange between Markus (Neteler) and me about the new r.watershed.fast.

The gist is the question: is it ready to go into the develbranch_6 and trunk (7) svn or does it need more testing. I thought it was already in the main svn, but Markus pointed out that it is still in Addons.

Michael

Begin forwarded message:

From: Markus Neteler <neteler@osgeo.org>
Date: October 9, 2008 1:38:37 PM GMT-07:00
To: “Michael Barton” <michael.barton@asu.edu>
Subject: Re: [GRASS-user] Re: grass-user Digest, Vol 30, Issue 22

Please post it to the list, too. :-)

On Thu, Oct 9, 2008 at 10:22 PM, Michael Barton <michael.barton@asu.edu> wrote:

We’re almost at a point of recompiling GRASS on our main modeling box. When
we do that, we will certainly test the new r.watershed.fast. However, if
others test and want to include this in the svn, it’s fine by me to go ahead
and replace the old module.

Michael


C. Michael Barton, Professor of Anthropology
Director of Graduate Studies
School of Human Evolution & Social Change
Center for Social Dynamics & Complexity
Arizona State University
Tempe, AZ 85287-2402
USA
voice: 480-965-6262; fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

On Oct 9, 2008, at 1:04 PM, Markus Neteler wrote:

On Thu, Oct 9, 2008 at 9:39 PM, Michael Barton <michael.barton@asu.edu> wrote:

Ah.

But I did think that it was tested last summer, that it worked well, and the
commentators on the list thought that it should be moved into the main svn.

Do you think we need more testing? If so, I’ll try to have some done here.

I am fine with all - if results are identical to old r.watershed and the
parameters/flags are compliant, we can replace it even in 6.4. If not, put
into 7.

I am out of time to do tests unfortunately.

Markus


Open Source Geospatial Foundation
http://www.osgeo.org/
http://www.grassbook.org/

On Thu, Oct 9, 2008 at 11:13 PM, Michael Barton <michael.barton@asu.edu> wrote:

Below is a recent exchange between Markus (Neteler) and me about the new
r.watershed.fast.
The gist is the question: is it ready to go into the develbranch_6 and trunk
(7) svn or does it need more testing. I thought it was already in the main
svn, but Markus pointed out that it is still in Addons.

Indeed it is not even in Addons - I tried to point out that Addons would be
a good place to facilitate testing.

@Markus: if you are interested, please check here:
http://trac.osgeo.org/grass/wiki/HowToContribute#WriteaccesstotheGRASS-Addons-SVNrepository
-> Write access to the GRASS-Addons-SVN repository

Markus

Markus Neteler wrote:

Indeed it is not even in Addons - I tried to point out that Addons would be
a good place to facilitate testing.

@Markus: if you are interested, please check here:
http://trac.osgeo.org/grass/wiki/HowToContribute#WriteaccesstotheGRASS-Addons-SVNrepository
-> Write access to the GRASS-Addons-SVN repository
  

Yes, I am interested! And this new version might need some testing, in
particular for the seg mode. The original version uses very little
memory, so assuming that GRASS runs today on systems where at least
500MB RAM are available I changed the parameters for the seg mode, more
data are kept in memory, speeding up the seg mode. Looking at other
modules using the segment library (e.g. v.surf.contour, r.cost), it
seems that there is not one universally used setting, instead the
segment parameters are tuned to each module. The new settings work for
me, but not necessarily for others, and maybe using 500MB is a bit much.
Still, the seg mode is slow and testing would require a lot of patience.
I only tested it for smaller regions, not yet for regions that would
require several GB of RAM. The aim is to get close to 2,147,483,647
cells in a region... BTW, has anybody recently used the seg mode of the
original version successfully and done so because the non-segmented
r.watershed would run out of memory?

I'm rather confident about the ram version, but it can do only good if
developers review the new code.

Will try to get write access to the GRASS-Addons-SVN repository, and add
an entry in the GRASS-Addons wiki.

Thanks for your feedback!

Markus

Markus Metz wrote:

assuming that GRASS runs today on systems where at least
500MB RAM are available

500MB total, 500MB per user, or 500MB per process?

It's safe to assume 500MB for the system (although much of GRASS can
run on a PDA, it's reasonable to assume that people won't be
performing complex analysis on such systems), but that doesn't mean
that a single process can use all of it.

Still, the seg mode is slow and testing would require a lot of patience.

GRASS' segment library (which r.watershed.seg uses) is quite
inefficient.

For the segmented r.proj (r.proj.seg in 6.3/6.4, r.proj in 7.0), I
wrote my own tile cache. If it can fit the entire map within the
specified amount of RAM, then it will do so (reading the map directly
into RAM without creating the segment file), without any noticeable
performance impact caused by the extra level of indirection.

If you can't fit the working set into RAM, it's going to be slow
whichever approach you take. Reading into "memory" which is actually
swap isn't going to be any quicker. Also, using a tile cache allows
you to handle maps which exceed the size of the address space (i.e.
maps larger than 4GiB on a 32-bit system).

OTOH, r.proj does have reasonable locality of reference, so the
working set tends to be small relative to the total amount of data. I
don't know whether the same is true of r.watershed.

--
Glynn Clements <glynn@gclements.plus.com>
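
A minimal sketch of the tile-cache idea described in the message above, assuming square tiles and least-recently-used eviction. It is not the r.proj implementation; `TileCache`, `make_loader` and the `load_tile` callback are hypothetical names used only for illustration. If `max_tiles` covers the whole map, every tile is read exactly once and stays in memory; otherwise tiles are re-read as the access pattern revisits them.

```python
from collections import OrderedDict


class TileCache:
    """LRU cache of square raster tiles (illustrative only, not GRASS code)."""

    def __init__(self, load_tile, tile_size, max_tiles):
        self.load_tile = load_tile    # hypothetical callback: (tile_row, tile_col) -> 2D list
        self.tile_size = tile_size
        self.max_tiles = max_tiles    # memory budget expressed as a tile count
        self.tiles = OrderedDict()    # insertion order doubles as recency order
        self.loads = 0                # number of tile reads, for inspection

    def get(self, row, col):
        """Return the cell value at (row, col), loading its tile if needed."""
        key = (row // self.tile_size, col // self.tile_size)
        if key in self.tiles:
            self.tiles.move_to_end(key)          # mark as most recently used
        else:
            self.tiles[key] = self.load_tile(*key)
            self.loads += 1
            if len(self.tiles) > self.max_tiles:
                self.tiles.popitem(last=False)   # evict least recently used tile
        return self.tiles[key][row % self.tile_size][col % self.tile_size]


def make_loader(raster, tile_size):
    """Build a loader that cuts tiles out of an in-memory 2D list (stand-in for disk I/O)."""
    def load_tile(tile_row, tile_col):
        r0, c0 = tile_row * tile_size, tile_col * tile_size
        return [line[c0:c0 + tile_size] for line in raster[r0:r0 + tile_size]]
    return load_tile
```

Scanning a map row by row with a cache that holds only one tile forces repeated reads of the same tiles, which illustrates why locality of reference matters for this approach.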

Markus Metz wrote:

The original version uses very little memory, so assuming that GRASS
runs today on systems where at least 500MB RAM are available I changed
the parameters for the seg mode, more data are kept in memory, speeding
up the seg mode. Looking at other modules using the segment library
(e.g. v.surf.contour, r.cost), it seems that there is not one universally
used setting, instead the segment parameters are tuned to each module.
The new settings work for me, but not necessarily for others, and maybe
using 500MB is a bit much.

fwiw r.terraflow has a memory= option, the default is 300mb.
AFAIU, the bigger you make that, the smaller the on-disk temp files need
to be (ie work-around to keep tmp files <2gb for 32bit filesystems).

a number of modules like r.in.poly have a rows= option, which I didn't
really understand until I got into the code. (hold at most that many
region rows (all columns) in memory at once). Interestingly the default
value has scaled quite well over the years.

and other modules like r.in.xyz have percent= (0-100) for how much of the
map to keep in memory at once.

For GRASS 7, a consistent approach (user option) among modules would be
nice to ease the learning curves.

Hamish
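
The three styles of option mentioned above (memory=, rows=, percent=) can be related by simple arithmetic. The sketch below assumes a fixed number of bytes per cell and ignores any per-module overhead; the helper names are hypothetical and do not correspond to functions in GRASS.

```python
def rows_for_memory(memory_mb, ncols, bytes_per_cell):
    """Number of full region rows that fit into a memory budget given in MB."""
    budget = memory_mb * 1024 * 1024
    return max(1, budget // (ncols * bytes_per_cell))


def percent_for_memory(memory_mb, nrows, ncols, bytes_per_cell):
    """Share of the map (1-100) that fits into the same budget, rounded down."""
    rows = min(nrows, rows_for_memory(memory_mb, ncols, bytes_per_cell))
    return max(1, min(100, rows * 100 // nrows))
```

For example, under these assumptions a 300 MB budget on a region 10000 columns wide with 8-byte cells corresponds to a few thousand rows, so a single memory-based option could in principle be translated into either of the other two forms.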

Hamish wrote:

and other modules like r.in.xyz have percent= (0-100) for how much of the
map to keep in memory at once.

I'm wondering if it would be worth adding a switch to r.in.xyz to
indicate that the points have been pre-sorted in descending order of
their Y coordinate (which can be done with "sort -nr"). In that
situation, you would be able to import the data in a single pass while
only holding a single row in memory.

I would expect implementing such a feature to be relatively
straightforward. Set rows=1, npasses=region.rows, skip the rewind() at
the beginning of each pass, and terminate each pass at the first point
below the current row. The only slight complication is that you need
to retain that point for the next pass.

--
Glynn Clements <glynn@gclements.plus.com>
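
A minimal sketch of the single-pass idea described above, assuming the points arrive sorted by descending Y. It is not the r.in.xyz code: the function name, the region parameters and the choice of a per-cell point count as the statistic are assumptions made for illustration, as is the convention that a point exactly on a row's southern edge belongs to the next row. Only one row of cells is held in memory, and the first point that falls below the current row is retained for the next pass.

```python
def bin_sorted_points(points, north, west, nsres, ewres, nrows, ncols):
    """Yield one list of per-cell point counts per region row, top to bottom.

    `points` is an iterable of (x, y, z) tuples sorted by descending y.
    """
    it = iter(points)
    pending = next(it, None)              # point carried over between passes
    for row in range(nrows):
        row_south = north - (row + 1) * nsres
        counts = [0] * ncols              # the only row held in memory
        while pending is not None:
            x, y, _z = pending
            if y <= row_south:
                break                     # belongs to a later row: keep it for the next pass
            if y <= north:                # points above the region are skipped
                col = int((x - west) // ewres)
                if 0 <= col < ncols:
                    counts[col] += 1
            pending = next(it, None)
        yield counts
```

Because the input is consumed through a single iterator, there is no rewind between passes; each pass simply continues where the previous one stopped.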

Hamish wrote:

fwiw r.terraflow has a memory= option, the default is 300mb.
AFAIU, the bigger you make that, the smaller the on-disk temp files need
to be (ie work-around to keep tmp files <2gb for 32bit filesystems).

a number of modules like r.in.poly have a rows= option, which I didn't
really understand until I got into the code. (hold at most that many
region rows (all columns) in memory at once). Interestingly the default
value has scaled quite well over the years.

and other modules like r.in.xyz have percent= (0-100) for how much of the
map to keep in memory at once.
  

A default value that scales well over the years would be preferable, but
performance of r.watershed.fast -m is really poor if whole columns (or
rows) are kept in memory and much better if segments have equal
dimensions. Interestingly, segments of 200 rows and 200 columns are
processed fastest, faster than e.g. 150 rows and columns or 250 rows and
columns. The more segments are kept in memory the better.
Right now I don't want to introduce a new option to give the user
control over how much memory is used (be it MB memory, number of rows or
percent of the map) because I want to keep all options of
r.watershed.fast identical to the original version. I'm still not happy
with the speed of the segmented version of r.watershed.fast, but at
least it is orders of magnitude faster than the in-memory version of the
original r.watershed. Maybe the iostream library that came with
r.terraflow can be used for r.watershed -m as well.

Markus
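
To put the segment sizes discussed above into numbers, the following sketch estimates how many square segments fit into a memory budget and how many are needed to cover a region. The bytes-per-cell figure is an assumption (the thread does not state it), and the helper names are hypothetical, not part of the GRASS segment library.

```python
def segments_in_memory(memory_mb, seg_rows, seg_cols, bytes_per_cell):
    """How many segments of the given size fit into a budget given in MB."""
    budget = memory_mb * 1024 * 1024
    return budget // (seg_rows * seg_cols * bytes_per_cell)


def segments_for_region(nrows, ncols, seg_rows, seg_cols):
    """How many segments are needed to cover the region (partial segments count)."""
    rows_of_segments = -(-nrows // seg_rows)   # ceiling division
    cols_of_segments = -(-ncols // seg_cols)
    return rows_of_segments * cols_of_segments
```

Comparing the two results for a given region shows what fraction of the map can stay in memory at once, which is the quantity the segment parameters effectively control.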

Markus Metz wrote:

Right now I don't want to introduce a new option to give the user control
over how much memory is used (be it MB memory, number of rows or percent
of the map) because I want to keep all options of r.watershed.fast
identical to the original version.

adding new options is ok as it doesn't break compatibility. i.e. scripts
written to use the old version will still run fine and produce the same
output.

only removing and renaming options is frozen for GRASS 6. (but ok in GRASS 7 aka trunk/)

To gain access to the grass-addons SVN you will need to create yourself
an OSGeo id and send the name to the grass-psc mailing list, along with a
note that you have read and agree to RFC2.

Hamish

Markus Metz wrote:

Right now I don't want to introduce a new option to give the user
control over how much memory is used (be it MB memory, number of rows or
percent of the map) because I want to keep all options of
r.watershed.fast identical to the original version.

There's no reason to avoid adding new options, so long as you don't
remove or modify existing options, and choose reasonable default
behaviour in the case where the new option isn't used.

--
Glynn Clements <glynn@gclements.plus.com>