Hi,
Dylan wrote:
> I am about to purchase a cluster of Mac Pros for filtering and
> rendering sonar data and I have been curious what has been done to
> parallelize GRASS by enterprising people.
Q: is the GRASS segmentation process inherently thread-friendly?
(I mean theoretically, not as written)
Code that runs on a cluster does not need to be thread-safe,
unless you are using a single-system-image (SSI) Linux with
distributed thread support (e.g. the SGI Altix series). Most clusters do
not support spreading threads across different cluster nodes (the network
interconnect is in most cases the limiting factor -> exception: take a look
at the NUMAlink interconnect from SGI).
I prefer threaded parallelism, because it is easier to implement and we do not
need to deal with message-passing overhead. But such code will not run on a
cluster (unless you use OpenMP together with Intel's Cluster OpenMP compiler
extension).
ie if the segmentation library was rewritten to be better, could it use
some sort of n_proc value set at compile time, or (better) a GIS variable
set with g.gisenv, or even a shell environment variable, to determine how
many simultaneous processes to run at once?
A variable would be the best. The gpde library uses OpenMP to run
some tasks in parallel. The number of threads can be controlled via the
environment variable OMP_NUM_THREADS.
But why multi-thread the segment library? IMHO it is currently not useful to
multi-thread IO operations. IO is mostly serial (except on cluster
filesystems).
Given our manpower, the best way I see to get GRASS more multi-core and
multi-processor ready is a piecemeal approach, starting with the low-
hanging fruit / biggest bang-for-buck libraries. Simultaneously we can
gradually remove as many global variables as possible from the libs.
I am currently designing the N_array code within the gpde library from
scratch to support multi-threaded processing of raster and volume data
loaded into memory. For now, some higher-level array functions are
implemented which use OpenMP to speed up some tasks.
The current N_array implementation only supports the three GRASS data types
(CELL, FCELL and DCELL) and is designed for easy usage rather than for
performance. This may change in the future.
Future tasks are:
* Use a more abstract approach for the N_array struct handling
** 1d, 2d and 3d arrays should be managed as one structure
** Use function pointers and member functions for a more OO-like approach
** A flag decides the dimension of the array -> easy conversion of 1d into
   2d or 3d arrays
** the access member function will be set while allocating an array
   eg: 1d array: double array_1d->get_d_value(array_1d, col);
       2d array: double array_2d->get_d_value(array_2d, col, row);
       3d array: double array_3d->get_d_value(array_3d, col, row, depth);
   and so on ...
** support for data references in the internal data structure
   eg: setting an already allocated raster row buffer as the data pointer of
       a 1d array; in case the array is deleted (freed), the buffer will not
       be freed
* Implement new data types in the N_array library
** unsigned char, signed char, unsigned short, signed short, unsigned int,
signed int, float, double
* Create a more abstract interface to 2d and 3d raster data
** implementation of so-called "data sources" in 2d and 3d
** data sources will have member functions to access the raster and
volume data eg:
double data_source_2d->get_d_value(data_source_2d, col, row);
N_array * data_source_2d->get_row(data_source_2d, row);
N_array * data_source_3d->get_tile(data_source_3d, x, y, z);
and so on
* High level functions like:
** array copy; statistics calculation on an array (mean, max, min, ...);
   sorting; basic mathematical tasks like array subtraction, addition,
   multiplication, division, modulo and so on should be implemented
   multi-threaded (take a look at N_arrays_calc.c for the current
   implementations)
** Neighbourhood searching routines should be implemented using N_arrays
eg: N_array * array_2d->get_neighbours(array_2d, row, col, size)
Some of this functionality is already implemented and tested in the gpde
lib.
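A rough sketch of how the function-pointer ("member function") idea above could look in C. Note that a C function pointer has a fixed signature, so this sketch passes unused indices instead of varying the arity as in the pseudo-calls above; all names here (my_array, my_array_alloc, ...) are invented for illustration and are not the actual gpde/N_array API:

```c
#include <stdlib.h>

/* One struct for all dimensions; the accessor is bound at allocation
 * time.  A real implementation would add the 3d case and error checks. */
typedef struct my_array {
    int dim;                    /* 1 or 2 (3d omitted for brevity) */
    int cols, rows;
    double *data;
    double (*get_d_value)(struct my_array *, int col, int row, int depth);
} my_array;

static double get_1d(my_array *a, int col, int row, int depth)
{
    (void)row; (void)depth;     /* unused for 1d */
    return a->data[col];
}

static double get_2d(my_array *a, int col, int row, int depth)
{
    (void)depth;                /* unused for 2d */
    return a->data[row * a->cols + col];
}

my_array *my_array_alloc(int dim, int cols, int rows)
{
    my_array *a = malloc(sizeof(*a));

    a->dim = dim;
    a->cols = cols;
    a->rows = dim > 1 ? rows : 1;
    a->data = calloc((size_t)a->cols * a->rows, sizeof(double));
    /* the "member function" is chosen once, here */
    a->get_d_value = (dim == 1) ? get_1d : get_2d;
    return a;
}
```

Calling code then reads a cell as `a->get_d_value(a, col, row, 0)` regardless of the array's dimension, which is the easy-conversion property the flag-based design aims at.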
I wonder if Thiery has any thoughts here, as he is probably in a better
position to fundamentally & quickly rework the architecture than we are.
(ie less baggage to worry about) I think it is very safe to say that for
the next decade or so multi-core scaling is going to be the future of
number crunching. Eventually new paradigms and languages will arrive, but
for now we have to fight with making our serial languages thread-safe....
Indeed.
some sort of plan of action, in order of priority:
1) [if plausible] Make the segment lib multi-proc'able. If it's currently
crappy, then all the more reason to start rewrites here.
2) Work on quad-tree stuff (v.surf.*, r.terraflow) individually (???)
AFAIC the quad-tree stuff implemented in v.surf.rst is not usable for raster
data storage or handling.
3) Create new row-by-row libgis fns and start migrating r.modules, 1 by 1.
(what would the module logic look like instead of two for loops?)
4) I don't know, but suspect, MPIing vector ops will be much much harder.
After the segment lib & one-offs, the next big multi-proc task I see is
the row-by-row raster ops. This of course means replacing
G_{get|put}_*_row() in the raster modules with a more abstract method.
I would like to suggest implementing the raster row and tile handling in a
new library called Gdata_ which should provide the functionality I
explained above.
The abstract Gdata interface should be able to handle different storage
implementations (the current raster storage, the segment and rowio libs,
an interface to GDAL, the g3d lib, ...) with the data_source approach.
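A sketch of what such swappable back-ends behind one data_source interface could look like. The email only proposes the concept; the struct and function names below (data_source_2d, mem_backend, data_source_open_mem) are hypothetical:

```c
#include <stdlib.h>

/* One read interface; the storage behind it is swappable. */
typedef struct data_source_2d {
    void *backend;   /* e.g. a segment file, rowio buffer or GDAL dataset */
    double (*get_d_value)(struct data_source_2d *, int col, int row);
} data_source_2d;

/* a trivial in-memory back-end standing in for segment/rowio/GDAL */
typedef struct {
    int cols;
    double *cells;
} mem_backend;

static double mem_get(data_source_2d *ds, int col, int row)
{
    mem_backend *m = ds->backend;

    return m->cells[row * m->cols + col];
}

data_source_2d *data_source_open_mem(mem_backend *m)
{
    data_source_2d *ds = malloc(sizeof(*ds));

    ds->backend = m;
    ds->get_d_value = mem_get;
    return ds;
}
```

A raster module would then call ds->get_d_value(ds, col, row) without knowing which storage implementation sits behind it; opening a segment- or GDAL-backed source would only swap the backend pointer and the accessor.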
Then, in some new libgis fn, splitting the map up into n_proc parts and
applying the operation to each. Worry about multi-row r.neighbors etc
later? This is getting near to writing r.mapcalc as a lib fn. (!)
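A minimal sketch of that splitting, assuming the map (or a tile of it) already sits in an in-memory array; apply_rows and twice are invented names, and OpenMP does the chunking rather than an explicit n_proc split:

```c
/* Split a raster held in memory into row chunks and let OpenMP apply a
 * per-cell operation to each chunk in parallel.  This side-steps
 * G_{get|put}_*_row(): it assumes the rows are already loaded. */
void apply_rows(double *map, int rows, int cols, double (*op)(double))
{
    int r, c;

    /* rows are independent, so OpenMP hands a block of them to each
     * thread; c must be private to every thread */
#pragma omp parallel for private(c)
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++)
            map[r * cols + c] = op(map[r * cols + c]);
}

/* an example per-cell operation */
static double twice(double x) { return 2.0 * x; }
```

Multi-row operations like r.neighbors would need overlapping chunks (ghost rows), which is exactly the complication deferred above.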
Indeed.
Best regards
Soeren
I wonder if the python-C SWIG interface helps with prototyping?
Then slowly move as many of the 150 raster modules to the new MPI-aware
lib fns as are suited for it, one by one. Again I think the low-hanging
fruit will be obvious and the most important modules (r.mapcalc, r.cost)
will be taken care of first, and the lesser used raster modules on a needs
basis by contributors. (as long as we offer a clean API method)
I've read that the "n" in 'make -j n' should be n_procs + 1. Is that just
true for quick little processes, where you always want a job ready at the
door and there's a lot of overhead creating & destroying processes?
thoughts?
Hamish
_______________________________________________
grass-dev mailing list
grass-dev@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass-dev