[GRASS-dev] Re: grass-dev Digest, Vol 30, Issue 31

------------------------------

Message: 7
Date: Mon, 13 Oct 2008 09:12:54 +0200
From: Markus Metz <markus_metz@gmx.de>
Subject: [GRASS-dev] Re: big region r.watershed
To: hamish_b@yahoo.com, grass-dev@lists.osgeo.org
Message-ID: <48F2F4F6.1010006@gmx.de>
Content-Type: text/plain; charset=ISO-8859-1

Hamish wrote:

Markus Metz wrote:

The original version uses very little memory, so, assuming that GRASS today runs on systems with at least 500 MB of RAM available, I changed the parameters for the seg mode: more data are kept in memory, which speeds up the seg mode. Looking at other modules that use the segment library (e.g. v.surf.contour, r.cost), there seems to be no single universally used setting; instead the segment parameters are tuned to each module. The new settings work for me, but not necessarily for others, and maybe using 500 MB is a bit much.
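[The segment library's core idea is a tiled cache: the map is cut into fixed-size tiles, a limited number of tiles are held in memory, and the rest sit in a temp file. The sketch below is a toy Python model of that idea, not the actual GRASS C API; the class name and the dict standing in for the temp file are invented for illustration.]

```python
from collections import OrderedDict

class SegmentCache:
    """Toy segment cache: the map is split into srows x scols tiles;
    at most nseg tiles are held in memory, the rest live in a dict
    that stands in for the on-disk temp file."""

    def __init__(self, srows, scols, nseg):
        self.srows, self.scols, self.nseg = srows, scols, nseg
        self.disk = {}                  # tile id -> tile data ("temp file")
        self.cache = OrderedDict()      # tile id -> tile data, LRU order

    def _tile(self, row, col):
        tid = (row // self.srows, col // self.scols)
        if tid in self.cache:
            self.cache.move_to_end(tid)         # mark as recently used
        else:
            if len(self.cache) >= self.nseg:    # evict least recently used
                old, data = self.cache.popitem(last=False)
                self.disk[old] = data           # write it back "to disk"
            self.cache[tid] = self.disk.pop(tid, {})
        return self.cache[tid]

    def put(self, row, col, value):
        self._tile(row, col)[(row % self.srows, col % self.scols)] = value

    def get(self, row, col):
        return self._tile(row, col).get((row % self.srows, col % self.scols))
```

Tuning a module then means picking srows/scols (tile shape) and nseg (how many tiles stay resident), which is exactly the per-module trade-off described above.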

fwiw r.terraflow has a memory= option; the default is 300 MB.
AFAIU, the bigger you make that, the smaller the on-disk temp files need
to be (i.e. a work-around to keep tmp files < 2 GB on 32-bit filesystems).

a number of modules like r.in.poly have a rows= option, which I didn't
really understand until I got into the code: it holds at most that many
region rows (all columns) in memory at once. Interestingly, the default
value has scaled quite well over the years.

and other modules like r.in.xyz have a percent= option (0-100) for how much
of the map to keep in memory at once.
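[The three idioms above (memory in MB, a row count, a percentage) all boil down to the same quantity: how many full region rows fit in the budget. The arithmetic below is only an illustration of that equivalence; the function names and the bytes-per-cell default are invented, not taken from any module's actual bookkeeping.]

```python
def rows_for_memory(memory_mb, ncols, bytes_per_cell=8):
    """r.terraflow-style memory=: how many full region rows fit in the
    budget (illustrative arithmetic only)."""
    return max(1, memory_mb * 1024 * 1024 // (ncols * bytes_per_cell))

def rows_for_percent(percent, total_rows):
    """r.in.xyz-style percent=: how many rows a given share of the map is."""
    return max(1, total_rows * percent // 100)
```

With a fixed rows= default, the memory footprint grows with the region width, while memory= and percent= adapt the row count to the region instead.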

A default value that scales well over the years would be preferable, but
performance of r.watershed.fast -m is really poor if whole columns (or
rows) are kept in memory, and much better if segments have equal
dimensions. Interestingly, segments of 200 rows and 200 columns are
processed fastest, faster than e.g. 150 or 250 rows and columns. The
more segments are kept in memory, the better.

Right now I don't want to introduce a new option to give the user
control over how much memory is used (be it MB of memory, number of rows,
or a percentage of the map), because I want to keep all options of
r.watershed.fast identical to the original version. I'm still not happy
with the speed of the segmented version of r.watershed.fast, but at
least it is orders of magnitude faster than the in-memory version of the
original r.watershed. Maybe the iostream library that came with
r.terraflow can be used for r.watershed -m as well.
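[Why square segments beat whole-row segments can be shown with a small counting experiment. A diagonal sweep is used below as a crude stand-in for a spatially coherent but non-row-major access pattern like a moving search front; the code and its numbers are illustrative, not measurements of r.watershed itself.]

```python
from collections import OrderedDict

def misses(tile_of, steps, capacity):
    """Count segment loads for an access pattern under an LRU cache
    holding `capacity` segments."""
    cache, loads = OrderedDict(), 0
    for pos in steps:
        tid = tile_of(pos)
        if tid in cache:
            cache.move_to_end(tid)
        else:
            loads += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[tid] = True
    return loads

# Diagonal sweep across a 1000x1000 grid: coherent, but not row-major.
sweep = [(k, k) for k in range(1000)]

# Equal memory in both layouts: 4 resident segments of 10000 cells each.
square = misses(lambda p: (p[0] // 100, p[1] // 100), sweep, 4)  # 100x100 tiles
strips = misses(lambda p: p[0] // 10, sweep, 4)                  # 10x1000 strips
```

The sweep crosses a 10-row strip every 10 steps but a 100x100 tile only every 100 steps, so the square layout does a tenth of the segment loads for the same memory budget.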

Markus

To use the iostream library you need to change the underlying algorithm of r.watershed. Iostream implements streams (files on disk) and the sorting of streams. If you use iostream, you store the grids in streams on disk rather than as 2-D arrays in memory. Random access on a stream is very expensive, so you need a way to express the computation as a sequence of stream sorts followed by sequential scans of the streams. This usually requires a complete rewrite of the algorithm.
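[The sort-then-scan paradigm described above can be sketched in a few lines. The example turns a random-access "scatter" (add deltas to arbitrary cells) into one sort of the update records plus one sequential merge pass, which is the shape of computation iostream is built for. This is a conceptual Python sketch with invented function names, not iostream's actual C++ API.]

```python
def scatter_random_access(grid, updates):
    """Random access: fine on an in-memory array, ruinous on a disk stream."""
    out = list(grid)
    for idx, delta in updates:
        out[idx] += delta
    return out

def scatter_sorted_streams(grid, updates):
    """Stream-style: sort the updates by target index (one external sort),
    then merge them with the grid in a single sequential pass."""
    pending = sorted(updates)           # stands in for sorting a stream
    out, u = [], 0
    for idx, value in enumerate(grid):  # sequential scan of the grid stream
        while u < len(pending) and pending[u][0] == idx:
            value += pending[u][1]
            u += 1
        out.append(value)
    return out
```

Both functions produce the same result; the second never jumps backwards in either "stream", which is what makes the external-memory version feasible.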

-Laura