[GRASS-dev] ramdisk as mapset?

Hi,

* I have some scripts which are very heavy on raster map disk i/o.
   r.cost chugs heavily on the hard drive & the script can take days
   to loop through. I don't want to wear a hole in it if I don't have to.
* I have many GB of RAM to play with (enough to hold the region as DCELL)
* The raster modules typically don't use much ram at all. (low overheads
   to compete with for RAM)

I am trying to think of a way to get the raster ops to happen all in RAM
to save time & wear on the hard drive. (script spans a number of r.*
modules)

ideas so far:

1) [Linux] create a 2GB ramdisk using ramfs. Use g.mapset to switch into
it, do the heavy i/o, switch back to the original mapset, g.copy the
results map back to the "real" mapset, then destroy the ramdisk.
advantages: easy to do.
disadvantages: it's more of a local hack than a general solution.

mkdir /mnt/ramdrive

# NB: ramfs has no size limit -- it simply grows until RAM is exhausted
# (it is tmpfs that defaults to a max_size of 1/2 physical ram)
mount -t ramfs none /mnt/ramdrive
TMP_MAPSET="/mnt/ramdrive/tmp_mapset"
mkdir -p "$TMP_MAPSET"
ln -s "$TMP_MAPSET" "$HOME/grassdata/$LOCATION/tmp_mapset"
cp "$HOME/grassdata/$LOCATION/$MAPSET/WIND" "$TMP_MAPSET"
g.mapset mapset=tmp_mapset
...
g.module in=map@$MAPSET out=result
...
g.mapset mapset=$MAPSET
g.copy rast=result@tmp_mapset,result
umount /mnt/ramdrive

problem: how to set group ID and mode/umask for ramdrive without
having to do chown+chmod as root?

2) Some backgrounded "grass_mapd" process to dynamically allocate and
hold a single map in memory. It's a child of the main GRASS process so
exiting GRASS tears it down. It could be a "virtual" map sort of like
how a reclass map is just a wrapper for something else. This is just a
very rough idea, probably not so easy to do; but if possible I reckon it
would be a cool tool to have.

anyone have ideas?

thanks,
Hamish

Hi Hamish,
interesting idea. But let's put this idea into the GRASS library. :-)

I would like to suggest extending the GRASS library to
read the entire map into memory if needed.
G_get_raster_row() would still work with no modification, but would
read the rows from memory instead of from disk.

And while implementing this, a tile caching mechanism would be good too.

IMHO we should add new special --flags to allow every module to use this feature,
e.g.: --cache-all will read all tiles (in this case a tile is a row) into memory;
--cache-size-y=<int> will set the number of rows to keep in the cache.

And in case we get "real" tiles (with x,y size), we need flags like
--tile-size-x=<int> and --tile-size-y=<int> to set the size of each tile
to be loaded into the cache. But we would need to rewrite the modules
so they can benefit from this kind of tiling, because G_get_raster_row() incurs
extra overhead when assembling a single row from different tiles.

IIRC that is approximately what Glynn suggested about a new implementation
of the raster map storage.

Just my 2 cents.

Sören

btw.:
A 3d tile cache mechanism is already present in the g3d lib.

-------- Original Message --------
Date: Thu, 19 Jul 2007 19:14:29 +1200
From: Hamish <hamish_nospam@yahoo.com>
To: grass5 <grass-dev@grass.itc.it>
Subject: [GRASS-dev] ramdisk as mapset?


_______________________________________________
grass-dev mailing list
grass-dev@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass-dev


Hamish wrote:

> * I have some scripts which are very heavy on raster map disk i/o.
>    r.cost chugs heavily on the hard drive & the script can take days
>    to loop through. I don't want to wear a hole in it if I don't have to.
> * I have many GB of RAM to play with (enough to hold the region as DCELL)
> * The raster modules typically don't use much ram at all. (low overheads
>    to compete with for RAM)
>
> I am trying to think of a way to get the raster ops to happen all in RAM
> to save time & wear on the hard drive. (script spans a number of r.*
> modules)
>
> ideas so far:
>
> 1) [Linux] create a 2GB ramdisk using ramfs. Use g.mapset to switch into
> it, do the heavy i/o, switch back to the original mapset, g.copy the
> results map back to the "real" mapset, then destroy the ramdisk.
> advantages: easy to do.
> disadvantages: it's more of a local hack than a general solution.

It's also an inefficient use of RAM. Bear in mind that the kernel will
automatically cache files; if you use them frequently enough, they'll
be in RAM anyhow, and creating a RAM disk reduces the amount of RAM
that the kernel can use for caching.

> mkdir /mnt/ramdrive
>
> # NB: ramfs has no size limit -- it simply grows until RAM is exhausted
> mount -t ramfs none /mnt/ramdrive
> TMP_MAPSET="/mnt/ramdrive/tmp_mapset"
> mkdir -p "$TMP_MAPSET"
> ln -s "$TMP_MAPSET" "$HOME/grassdata/$LOCATION/tmp_mapset"
> cp "$HOME/grassdata/$LOCATION/$MAPSET/WIND" "$TMP_MAPSET"
> g.mapset mapset=tmp_mapset
> ...
> g.module in=map@$MAPSET out=result
> ...
> g.mapset mapset=$MAPSET
> g.copy rast=result@tmp_mapset,result
> umount /mnt/ramdrive
>
> problem: how to set group ID and mode/umask for ramdrive without
> having to do chown+chmod as root?

Mounting filesystems inevitably requires the cooperation of root. You
can allow normal users to mount specific filesystems by adding them to
/etc/fstab with the "user" option; you can normally set the
permissions of the root directory there.
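For example (an untested sketch; note that ramfs itself ignores mount options, so tmpfs -- which accepts uid/gid/mode and a size limit -- is the better candidate for a user-mountable RAM disk):

```shell
# /etc/fstab entry (one line), added once as root; adjust uid/gid to taste.
# "user" lets a normal user run `mount /mnt/ramdrive` without sudo,
# and uid/gid/mode set the owner and permissions of the mount root.
tmpfs  /mnt/ramdrive  tmpfs  noauto,user,uid=1000,gid=100,mode=0770,size=2g  0  0
```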

> 2) Some backgrounded "grass_mapd" process to dynamically allocate and
> hold a single map in memory. It's a child of the main GRASS process so
> exiting GRASS tears it down. It could be a "virtual" map sort of like
> how a reclass map is just a wrapper for something else. This is just a
> very rough idea, probably not so easy to do; but if possible I reckon it
> would be a cool tool to have.

For programs which perform sequential I/O, you won't improve much on
the kernel's built in caching. If you have enough RAM, it will get
used.

A large proportion of the overhead is in the processing which occurs
between read() -> G_get_*_row() and G_put_*_row() -> write(), rather
than in the "actual" I/O (i.e. read() and write()). Creating
uncompressed maps would eliminate some of this; a better
implementation of nulls would also help.

For programs which perform random I/O (e.g. r.cost), consider
replacing the use of the segment library (which is rather poorly
implemented) with the segment code from r.proj.seg.

--
Glynn Clements <glynn@gclements.plus.com>

"Sören Gebbert" wrote:

> interesting idea. But let's put this idea into the GRASS library. :-)
>
> I would like to suggest extending the GRASS library to
> read the entire map into memory if needed.

There's no advantage if you're doing sequential I/O, which is most of
GRASS.

Programs which do random I/O already have mechanisms to deal with
this; e.g. r.proj reads the entire useful region into memory,
r.proj.seg has its own tile cache, other programs (e.g. r.cost) use
the segment library.

If you are concerned about efficiency, build GRASS with profiling
support, and look at where the actual inefficiency lies. It would help
to do the same for 4.x, as that is reported to be an order of
magnitude faster than 5.x/6.x for some tasks.

--
Glynn Clements <glynn@gclements.plus.com>

What about running GRASS in something like Xgrid or Grid Engine?

I am about to purchase a cluster of Mac Pros for filtering and
rendering sonar data, and I have been curious what has been done to
parallelize GRASS by enterprising people. GRASS isn't threaded, so it
might work well on a cluster if you can keep the nodes fed with data.
Maybe some Franken-script could divide the raster data into segments
and launch a node on each segment.

I read a paper recently that documented using GRASS on GeoWalls via a
series of shell scripts that controlled the batch processing, but I
haven't looked into it in detail and can't remember the authors off
the top of my head.

Our data sets are topping 100 million points apiece and I am looking
for ways to divide and conquer the work load. Currently, I use GMT
across 8 nodes and it works well (if a bit clunky as far as job
management goes).

David



--
David Finlayson, Ph.D.
Operational Geologist

U.S. Geological Survey
Pacific Science Center
400 Natural Bridges Drive
Santa Cruz, CA 95060, USA

Tel: 831-427-4757, Fax: 831-427-4748, E-mail: dfinlayson@usgs.gov

On 2007-07-19, at 09:14, Hamish wrote:

> ideas so far:
> 1) [Linux] create a 2GB ramdisk using ramfs. Use g.mapset to switch into
> it, do the heavy i/o, switch back to the original mapset, g.copy the
> results map back to the "real" mapset, then destroy the ramdisk.
> advantages: easy to do.
> disadvantages: it's more of a local hack than a general solution.
>
> [...]
>
> problem: how to set group ID and mode/umask for ramdrive without
> having to do chown+chmod as root?

You can't. Well, actually you could use sudo (preferably with the NOPASSWD option) to chown+chmod the dir.
Apart from that I'd suggest using something from the BSD family, but that's not that important :-)

You'll get a great speed boost. The problem with a lot of disk I/O operations isn't caching (you could probably increase the cache size, but that's not it) but the I/Os themselves, and a RAM drive gets rid of them :-) Tested and working great (not with GRASS, though :-) )

regards

Hamish wrote:
> * I have some scripts which are very heavy on raster map disk i/o.
> r.cost chugs heavily on the hard drive & the script can take days
> to loop through. I don't want to wear a hole in it if I don't
> have to.

..

> I am trying to think of a way to get the raster ops to happen all in
> RAM

..

> 1) [Linux] create a 2GB ramdisk using ramfs. use g.mapset to swich
> into it, do the heavy i/o.

..
Glynn:

> It's also an inefficient use of RAM. Bear in mind that the kernel will
> automatically cache files; if you use them frequently enough, they'll
> be in RAM anyhow, and creating a RAM disk reduces the amount of RAM
> that the kernel can use for caching.

True. I could do a (pseudo) `cat -r $MAPSET/*`, but that's no faster than
just letting the kernel do it itself as it happens.

The ramdisk helps train the caching ahead of time. Next time I do a big
r.cost loop I might experiment with this and see how much of a difference
it makes.

Hamish:

> problem: how to set group ID and mode/umask for ramdrive without
> having to do chown+chmod as root?

Jakub Kulczynski wrote:

> You can't. Well, actually you could use sudo (preferably with the NOPASSWD
> option) to chown+chmod the dir.

Ok. I now see the `mount` man page says that ramfs has no mount options
(slightly older 2.4 kernel + GNU userland). I did try setting them in fstab;
they were ignored.

Glynn:

> For programs which perform random I/O (e.g. r.cost), consider
> replacing the use of the segment library (which is rather poorly
> implemented) with the segment code from r.proj.seg.

This is the heart of my wish really. The ramfs stuff is just a
workaround for that issue.

David wrote:

> I am about to purchase a cluster of Mac Pros for filtering and
> rendering sonar data and I have been curious what has been done to
> parallelize GRASS by enterprising people.

Q: is the GRASS segmentation process inherently thread-friendly?
(I mean theoretically, not as written)

ie if the segmentation library was rewritten to be better, could it use
some sort of n_proc value set at compile time, (or (better) a gis var
set with g.gisenv, or even a shell enviro var) to determine how many
simultaneous processes to run at once?

Given our manpower, the best way I see to get GRASS more multi-core and
multi-processor ready is a piecemeal approach, starting with the low-
hanging fruit / biggest bang-for-buck libraries. Simultaneously we can
gradually remove as many global variables as possible from the libs.

I wonder if Thiery has any thoughts here, as he is probably in a better
position to fundamentally & quickly rework the architecture than we are.
(ie less baggage to worry about) I think it is very safe to say that for
the next decade or so multi-core scaling is going to be the future of
number crunching. Eventually new paradigms and languages will arrive, but
for now we have to fight with making our serial languages thread-safe....

some sort of plan of action, in order of priority:
1) [if plausible] Make the segment lib multi-proc'able. If it's currently
   crappy, then all the more reason to start rewrites here.
2) Work on quad-tree stuff (v.surf.*, r.terraflow) individually (???)
3) Create new row-by-row libgis fns and start migrating r.modules, 1 by 1.
   (what would the module logic look like instead of two for loops?)
4) I don't know, but suspect, MPIing vector ops will be much much harder.

After the segment lib & one-offs, the next big multi-proc task I see is
the row-by-row raster ops. This of course means replacing
G_{get|put}_*_row() in the raster modules with a more abstract method.
Then, in some new libgis fn, splitting the map up into n_proc parts and
applying the operation to each. Worry about multi-row r.neighbors etc
later? This is getting near to writing r.mapcalc as a lib fn. (!)
I wonder if the python-C SWIG interface helps with prototyping?
Then slowly move as many of the 150 raster modules to the new MPI-aware
lib fns as are suited for it, one by one. Again I think the low-hanging
fruit will be obvious and the most important modules (r.mapcalc, r.cost)
will be taken care of first, and the lesser used raster modules on a needs
basis by contributors. (as long as we offer a clean API method)

I've read that "n" in 'make -j n' should be n_procs + 1. Is that just
true for quick little processes where you always want a job ready at the
door and there's a lot of overhead creating & destroying the process?

thoughts?
Hamish

Glynn Clements wrote:

> If you are concerned about efficiency, build GRASS with profiling
> support, and look at where the actual inefficiency lies. It would help
> to do the same for 4.x, as that is reported to be an order of
> magnitude faster than 5.x/6.x for some tasks.

Could you suggest a quick usage guide as to how to do that?
What's the best tool?
* 'gcc -pg' + gprof?
* valgrind's calltree -> kcachegrind?
    (calltree doesn't need a special recompile)
How to look at the output?

Preferably compose directly to one of the wiki pages: (or make a new one)
  http://grass.gdf-hannover.de/wiki/GRASS_Debugging
  http://grass.gdf-hannover.de/wiki/Development

thanks,
Hamish

Hi,

David wrote:
> I am about to purchase a cluster of Mac Pros for filtering and
> rendering sonar data and I have been curious what has been done to
> parallelize GRASS by enterprising people.

Hamish wrote:
> Q: is the GRASS segmentation process inherently thread-friendly?
> (I mean theoretically, not as written)

Code which should run on a cluster cannot rely on threads,
unless you are using a single system image (SSI) Linux with
distributed thread support (SGI Altix series). Most clusters do not
support spreading threads across different nodes (the network connection
is in most cases the limiting factor -> exception: take a look at NUMAlink
from SGI).

I prefer threaded parallelism, because it is easier to implement and we do not
need to deal with message-passing overhead. But such code will not run on a
cluster (unless you use OpenMP and the Intel "OpenMP on clusters" compiler
extension).

> ie if the segmentation library was rewritten to be better, could it use
> some sort of n_proc value set at compile time, (or (better) a gis var
> set with g.gisenv, or even a shell enviro var) to determine how many
> simultaneous processes to run at once?

An environment variable would be best. The gpde library uses OpenMP to run
some tasks in parallel. The number of threads can be controlled via the
environment variable OMP_NUM_THREADS.

But why multi-thread the segment library?
IMHO it is currently not useful to multi-thread I/O operations:
I/O is mostly serial (except on cluster filesystems).

> Given our manpower, the best way I see to get GRASS more multi-core and
> multi-processor ready is a piecemeal approach, starting with the low-
> hanging fruit / biggest bang-for-buck libraries. Simultaneously we can
> gradually remove as many global variables as possible from the libs.

I am currently designing the N_array stuff within the gpde library from scratch
to support multi-threaded processing of raster and volume data loaded
into memory. For now some higher-level array functions are
implemented which use OpenMP to speed up some tasks.

The current N_array implementation only supports the 3 data types of GRASS
and is not designed for performance, but for easy usage. This may change
in the future.

Future tasks are:

* Use a more abstract approach for the N_array struct handling
** 1d, 2d and 3d arrays should be managed as one structure
** Use function pointers and member functions for a more OO-like approach
** A flag decides the type of the array -> easy conversion of 1d into 2d or 3d
    arrays
** the access member functions will be set while allocating an array
    eg: 1d array: double array_1d->get_d_value(array_1d, col);
        2d array: double array_2d->get_d_value(array_2d, col, row);
        3d array: double array_3d->get_d_value(array_3d, col, row, depth);
    and so on ...
** support for data references in the internal data structure
    eg: setting an already allocated raster row buffer as the data pointer of a
        1d array; in case the array is deleted (freed) the buffer will not
        be freed
* Implement new data types in the N_array library
** unsigned char, signed char, unsigned short, signed short, unsigned int,
    signed int, float, double
* Create a more abstract interface to 2d and 3d raster data
** implementation of so-called "data sources" in 2d and 3d
** data sources will have member functions to access the raster and
    volume data, eg:
    double data_source_2d->get_d_value(data_source_2d, col, row);
    N_array * data_source_2d->get_row(data_source_2d, row);
    N_array * data_source_3d->get_tile(data_source_3d, x, y, z);
    and so on
* High-level functions like:
** array copy; statistic calculation of an array (mean, max, min, ...);
    sorting; basic mathematical tasks like array subtraction, addition,
    multiplication, division, modulo and so on should be implemented
    multi-threaded (take a look at N_arrays_calc.c for current
    implementations)
** Neighbourhood searching routines should be implemented using N_arrays
    eg: N_array * array_2d->get_neighbours(array_2d, row, col, size)

Some of this functionality is already implemented and tested in the gpde
lib.

> I wonder if Thiery has any thoughts here, as he is probably in a better
> position to fundamentally & quickly rework the architecture than we are.
> (ie less baggage to worry about) I think it is very safe to say that for
> the next decade or so multi-core scaling is going to be the future of
> number crunching. Eventually new paradigms and languages will arrive, but
> for now we have to fight with making our serial languages thread-safe....

Indeed.

> some sort of plan of action, in order of priority:
> 1) [if plausible] Make the segment lib multi-proc'able. If it's currently
>    crappy, then all the more reason to start rewrites here.
> 2) Work on quad-tree stuff (v.surf.*, r.terraflow) individually (???)

AFAIC the quad-tree stuff implemented in v.surf.rst is not usable for raster
data storage or handling.

> 3) Create new row-by-row libgis fns and start migrating r.modules, 1 by 1.
>    (what would the module logic look like instead of two for loops?)
> 4) I don't know, but suspect, MPIing vector ops will be much much harder.
>
> After the segment lib & one-offs, the next big multi-proc task I see is
> the row-by-row raster ops. This of course means replacing
> G_{get|put}_*_row() in the raster modules with a more abstract method.

I would like to suggest implementing the raster row and tile handling in a new
library called Gdata_ which should implement the functionality I explained
above.
The abstract Gdata interface should be able to handle different storage
implementations (current raster storage, segment and rowio libs,
an interface to GDAL, the g3d lib ...) with the data_source approach.

> Then, in some new libgis fn, splitting the map up into n_proc parts and
> applying the operation to each. Worry about multi-row r.neighbors etc
> later? This is getting near to writing r.mapcalc as a lib fn. (!)

Indeed.

Best regards
Soeren



Hamish wrote:

> If you are concerned about efficiency, build GRASS with profiling
> support, and look at where the actual inefficiency lies. It would help
> to do the same for 4.x, as that is reported to be an order of
> magnitude faster than 5.x/6.x for some tasks.

> Could you suggest a quick usage guide as to how to do that?
> What's the best tool?
> * 'gcc -pg' + gprof?
> * valgrind's calltree -> kcachegrind?
>     (calltree doesn't need a special recompile)

I trust gcc/gprof more than valgrind, particularly where most of the
CPU time is in "leaf" functions (i.e. the module making many calls to
trivial functions in shared libraries, e.g. G_is_?_null_value()).

However, it's harder to use, and can't trace into shared libraries (or
any libraries built without profiling support), so you have to build
GRASS with --disable-shared.
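As a rough recipe (a sketch only; the exact configure invocation for GRASS may differ -- check ./configure --help):

```shell
# build everything static, with profiling instrumentation
# (gprof can't see into shared libs, hence --disable-shared)
./configure --disable-shared CFLAGS="-g -pg" LDFLAGS="-pg"
make

# run the module of interest as usual; on exit it writes gmon.out in the cwd
r.cost ...

# flat profile and call graph, heaviest functions first
gprof "$GISBASE/bin/r.cost" gmon.out | less
```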

--
Glynn Clements <glynn@gclements.plus.com>

Hamish wrote:

> David wrote:
> > I am about to purchase a cluster of Mac Pros for filtering and
> > rendering sonar data and I have been curious what has been done to
> > parallelize GRASS by enterprising people.
>
> Q: is the GRASS segmentation process inherently thread-friendly?
> (I mean theoretically, not as written)
>
> ie if the segmentation library was rewritten to be better, could it use
> some sort of n_proc value set at compile time, (or (better) a gis var
> set with g.gisenv, or even a shell enviro var) to determine how many
> simultaneous processes to run at once?

No. You can't avoid modifying code.

> Given our manpower, the best way I see to get GRASS more multi-core and
> multi-processor ready is a piecemeal approach, starting with the low-
> hanging fruit / biggest bang-for-buck libraries. Simultaneously we can
> gradually remove as many global variables as possible from the libs.

It isn't just global variables (although those are a major issue, not
just for threading).

For most applications, you would want to be able to have multiple
threads reading/writing the input/output maps, so the issue also
applies to the fields of "struct fileinfo".

> I wonder if Thiery has any thoughts here, as he is probably in a better
> position to fundamentally & quickly rework the architecture than we are.
> (ie less baggage to worry about) I think it is very safe to say that for
> the next decade or so multi-core scaling is going to be the future of
> number crunching. Eventually new paradigms and languages will arrive, but
> for now we have to fight with making our serial languages thread-safe....
>
> some sort of plan of action, in order of priority:
> 1) [if plausible] Make the segment lib multi-proc'able. If it's currently
>    crappy, then all the more reason to start rewrites here.

The segment library isn't really helpful here. It's essentially just a
home-grown virtual memory system. AFAICT, the only advantage over the
OS' virtual memory is that you aren't limited to (at most) 4GiB on a
32-bit system. On a 64-bit system, you may as well just read the
entire map into (virtual) memory.

BTW, unless the module explicitly opens the segment file in 64-bit
mode (LFS), it will hit the 2GiB file limit. Many systems have more
than 2GiB of RAM, so using the segment library may actually reduce the
maximum size of a map compared to just reading it into memory.

> 2) Work on quad-tree stuff (v.surf.*, r.terraflow) individually (???)
> 3) Create new row-by-row libgis fns and start migrating r.modules, 1 by 1.
>    (what would the module logic look like instead of two for loops?)

If the complexity is in the algorithm, then there's no alternative to
restructuring the code. Obviously, this requires that the algorithm is
actually parallelisable.

There are a few things which can be done in libraries, e.g. using
separate threads for {get,put}_row operations, so that the actual
algorithm gets a complete CPU core to itself.

As for parallelising individual modules, r.mapcalc would be an obvious
priority. Its current structure isn't conducive to that (the buffers
are stored in the expression nodes), but that isn't too hard to
change. There's still the issue that the {get,put}_row operations have
to be serialised, so you won't be able to process data faster than a
single core can read/write a map. Still, for complex calculations, it
might be worth the effort.

> 4) I don't know, but suspect, MPIing vector ops will be much much harder.

The main issue is likely to be the difficulty of making the output
operations thread-safe. Making read operations thread-safe is usually
simple (if you have a read "cursor", that needs to be updated
atomically).

> After the segment lib & one-offs, the next big multi-proc task I see is
> the row-by-row raster ops. This of course means replacing
> G_{get|put}_*_row() in the raster modules with a more abstract method.
> Then, in some new libgis fn, splitting the map up into n_proc parts and
> applying the operation to each. Worry about multi-row r.neighbors etc
> later? This is getting near to writing r.mapcalc as a lib fn. (!)

Most "filters" can be parallelised easily enough. This includes those
which need a "window", e.g. r.neighbors; it doesn't matter if multiple
threads are reading the same row. You do need to make the rowio window
large enough to account for the number of active threads (e.g. if you
have 4 threads and a 5x5 window, you need at least 2+4+2 = 8 rows).

The main issue is that the core raster I/O needs to be made
thread-safe, including multiple threads using a single map. That means
either replacing the {work,null,mask,temp,compressed}_buf fields in
the fileinfo structure with an array of such buffers (one per thread),
or using automatic buffers (i.e. alloca(); does any supported platform
not provide this?).

--
Glynn Clements <glynn@gclements.plus.com>