Sören Gebbert wrote:
There was a discussion on IRC about how to parallelize GRASS to run on
multicore computers.
We came to the same conclusion as Glynn.
The GRASS core libs are not designed to be thread-safe, and porting the
core of GRASS to run in parallel on Beowulf clusters or SSI clusters is
not practical.
But GRASS can work in parallel already.
Just start several modules on several maps and hope the disk IO is
not the limiting factor.
... and hope that the modules don't try to modify shared data (e.g.
the WIND file, $GISRC, etc) concurrently.
Certainly, running multiple concurrent processes is the easiest way to take
advantage of multiple cores. This is easier to do for server
environments (e.g. web applications) than for "desktop" use.
And this works on SSI clusters as well.
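To make the "several modules on several maps" approach concrete, here is a
minimal sketch of launching independent processes concurrently, each with its
own environment so that no job relies on shared session state. The names
run_job, run_jobs, demo_jobs and JOB_TAG are hypothetical, and the child
commands are stand-ins rather than real GRASS modules; in practice the private
variable might be a per-job GISRC path, which is an assumption here.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command, extra_env):
    """Run one command as a separate OS process with its own environment."""
    env = dict(os.environ)
    env.update(extra_env)  # e.g. a private GISRC path per job (assumption)
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def run_jobs(jobs, max_workers=4):
    """Run (command, extra_env) pairs concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, cmd, env) for cmd, env in jobs]
        return [f.result() for f in futures]


def demo_jobs(n):
    # Stand-in commands: each child just echoes its private JOB_TAG variable.
    # In real use these would be module invocations on different maps.
    script = "import os; print(os.environ['JOB_TAG'])"
    return [([sys.executable, "-c", script], {"JOB_TAG": "map%d" % i})
            for i in range(n)]


if __name__ == "__main__":
    for code, out in run_jobs(demo_jobs(3)):
        print(code, out)
```

The threads here only wait on child processes; the actual work happens in
separate processes, so nothing in the library has to be thread-safe.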
I think we should keep it simpler. Since gcc version 4.2, OpenMP is
available for normal C/C++/Fortran developers without access to
commercial compilers.
Individual modules can be parallelized for multicore systems with OpenMP
(http://www.openmp.org).
And I think working with OpenMP is much easier than POSIX threads, MPI,
BSP, SunGrid and the like.
But the main issues with GRASS and OpenMP are:
1.) You need some parallel programming experience, or your programs will
end up slower with OpenMP than without it.
2.) Debugging multi-threaded software is much harder than debugging
single-threaded software.
Not to mention writing it in the first place.
Personally, I would rule out writing multi-threaded modules simply on
the basis that most GRASS developers don't have enough experience in
concurrent programming.
[Programming experience in general is a relatively limited commodity
around here. Most GRASS developers are geoscientists with the kind and
level of programming experience normally acquired in scientific
disciplines.]
Writing (reliable) concurrent code is hard. Worse still, most testing
methodologies are poor at detecting concurrency-related bugs. Asking
relatively inexperienced programmers to write concurrent code is a
recipe for software which passes the test suite then regularly suffers
from intermittent (and usually non-repeatable) failures in actual use.
3.) The core libs are not thread-safe, so you have to choose carefully
which library functions to use in parallel.
IOW, you have to restrict the use of GRASS to one thread, with other
threads being used for "pure" computation. Unfortunately, most GRASS
code doesn't readily fit into that kind of model.
It would be more viable for certain computationally intensive modules,
particularly those which read an entire map into memory, do a lot of
processing, then write out results at the end.
Reading entire maps into memory is something we normally try to avoid,
and only use where the algorithm inherently requires random access (so
can't readily be adapted to a sequential-I/O strategy).
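The "one thread for GRASS, other threads for pure computation" model can be
sketched roughly as follows. This is only an illustration of the division of
labour, not GRASS code: read_row and write_row are hypothetical callables
standing in for the library's row I/O, and smooth_row is an arbitrary example
computation. In CPython the worker threads will not actually speed up
pure-Python arithmetic; in C the middle step would correspond to something like
an OpenMP parallel loop over the in-memory rows.

```python
from concurrent.futures import ThreadPoolExecutor


def load_all_rows(read_row, nrows):
    """Main thread only: the sole place that calls the (unsafe) reader."""
    return [read_row(i) for i in range(nrows)]


def smooth_row(row):
    """Pure computation: 3-cell moving average, touches no shared state."""
    n = len(row)
    out = []
    for i in range(n):
        window = row[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def run_module(read_row, write_row, nrows, workers=4):
    rows = load_all_rows(read_row, nrows)            # main thread: input
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(smooth_row, rows))   # workers: compute only
    for i, row in enumerate(results):                # main thread: output
        write_row(i, row)
```

Note that this only works because the whole map is held in memory between the
read and write phases, which is exactly the restriction described above.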
4.) Data access to the hard disk doesn't scale in parallel on a single
disk/controller system, so you will only benefit from OpenMP if you load
the data to be processed completely into memory.
5.) ...
So there are only a few modules which will benefit from simple
parallelization with OpenMP.
To change this, the core of GRASS would have to be reimplemented in
a new and different way.
But I think the GRASS developers have to think about parallelizing
GRASS, because future computer generations will work with multiple
cores (2 - 128?) and users will expect programs like GRASS to make
use of that multicore power. This is not an easy task and needs
a lot of programming knowledge ... well, can this be added to the wish
list for GRASS version 10?
Version 10 sounds about right.
A lot of the problem is that:
1. The libraries can't readily be parallelised without changing the API.
2. Changing the API means re-writing modules which use it.
3. Much of GRASS' value is in the modules, so re-writing the modules
equates to re-writing most of GRASS.
There might be some specific cases which are amenable to
parallelisation. E.g. it might be possible to re-write the core raster
I/O to use threads in a producer-consumer model, so that
get-row/put-row operations essentially take no time (i.e. the module
runs entirely in the main thread, while a separate thread performs the
raster I/O). That might give a 2x speed-up on a dual-core system, but
still wouldn't scale to larger numbers of cores (i.e. you would still
only get a 2x speed-up on a 16-core system).
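As a rough illustration of the producer-consumer idea, here is a minimal
sketch in which a separate thread prefetches rows into a bounded queue while
the main thread does the processing. Nothing here is GRASS API: reader,
process_map and overlap_speedup are hypothetical names, and a plain list of
rows stands in for the raster. The last function gives the arithmetic behind
the 2x limit under the simplifying assumption that I/O and computation overlap
perfectly and nothing else is parallelised.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


def reader(rows, out_q):
    """Producer: stands in for the thread performing get-row calls."""
    for row in rows:
        out_q.put(row)      # blocks while the buffer is full
    out_q.put(_DONE)


def process_map(rows, compute, depth=8):
    """Consumer: the main thread computes while the reader prefetches."""
    q = queue.Queue(maxsize=depth)
    t = threading.Thread(target=reader, args=(rows, q), daemon=True)
    t.start()
    results = []
    while True:
        row = q.get()
        if row is _DONE:
            break
        results.append(compute(row))
    t.join()
    return results


def overlap_speedup(io_time, compute_time):
    """Speed-up from overlapping two stages (assumes a positive total time).

    Serial cost is io + compute; overlapped cost is the slower stage alone,
    so the ratio can never exceed 2 however many cores are available.
    """
    return (io_time + compute_time) / max(io_time, compute_time)
```

With equal I/O and compute times, overlap_speedup gives exactly 2; with any
imbalance it gives less, since the slower stage dominates.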
--
Glynn Clements <glynn@gclements.plus.com>