Can someone discuss the state of shared and distributed memory parallelization of GRASS?
Don Gunning
Software Program Manager
Technical computing group
Developer Product Division
Intel Corporation
217 403 4213
Hi Don,
you may have a look at:
http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs
http://grasswiki.osgeo.org/wiki/OpenMP
What I know personally:
We have shared memory parallelization implemented in the GRASS
mathematical library (gmath) and the partial differential equation
library (gpde) using OpenMP. BLAS level 2 and 3 functions as well as
many direct and iterative linear equation solvers are parallelized, as
is the assembly of the linear equation systems. Several modules make
use of these libraries.
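Roughly, the pattern looks like this (a minimal illustrative sketch of
an OpenMP-parallelized BLAS level 2 style matrix-vector product, not
the actual gmath source; compile with e.g. gcc -fopenmp):

/* y = A*x for a dense n x n matrix: a BLAS level 2 style kernel.
 * Each thread computes an independent block of the rows of y. */
void matrix_vector(int n, double **A, double *x, double *y)
{
    int i, j;

    #pragma omp parallel for private(j)
    for (i = 0; i < n; i++) {
        double sum = 0.0;

        for (j = 0; j < n; j++)
            sum += A[i][j] * x[j];
        y[i] = sum;
    }
}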
Pthreads are used to parallelize parts of r.mapcalc.
IIRC there are MPI-parallelized versions of several GRASS modules
available on the internet, but not directly in GRASS.
We have parallelization at the process level without message passing,
simply by running several processes on different data in parallel.
Several modules of the GRASS GIS Temporal Framework in GRASS 7 make
use of this approach, and of course grid-engine-based cluster
parallelization works the same way.
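A minimal sketch of that fork-and-wait pattern in C (the map names and
the r.mapcalc expression are made up purely for illustration):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    /* hypothetical input maps, one worker process per map */
    const char *maps[] = { "tile1", "tile2", "tile3" };
    int i, n = 3;

    for (i = 0; i < n; i++) {
        if (fork() == 0) {
            char expr[128];

            snprintf(expr, sizeof(expr),
                     "expression=%s_out = %s * 2", maps[i], maps[i]);
            /* each child runs one module instance on its own map */
            execlp("r.mapcalc", "r.mapcalc", expr, (char *)NULL);
            _exit(127);     /* reached only if exec fails */
        }
    }
    for (i = 0; i < n; i++)     /* parent waits for all children */
        wait(NULL);
    return 0;
}

No locking or message passing is needed because each process writes
its own output map.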
Cloud-level parallelization using the Web Processing Service (WPS) is
available, because GRASS 7 can be run as a WPS backend.
Best regards
Soeren
Thanks Soeren.
Do you know if there is much interest in greater parallelization? And have the Intel compilers and MPI been used with GRASS?
Don
Don wrote:
Do you know if there is much interest in greater parallelization?
Huge interest. Outside of the core libraries, GRASS is made up of
~500 individual processing modules, each doing its own thing well.
Each uses its own algorithms and strategies, which is why GRASS 7
has built-in support for OpenMP, pthreads, *and* OpenCL: the idea is
that the right parallelization strategy can be matched to the nature
of the problem each individual module faces. Additionally, our Python
scripting library has helper functions to make running discrete
processes in parallel easy, since a common use case is to run the same
computation on the three Red, Green, and Blue imagery bands, or on all
~7-11 spectral bands of satellite data (e.g. Landsat). In those cases
the number of natural processes is in the same neighborhood as the
number of cores on a typical workstation, so backgrounding all but one
of the jobs in bash or Python and then waiting for them all to finish
works remarkably well, with minimal programming effort and little
divergence from the single-threaded case. That's not far from the MPI
situation: instead of being backgrounded, the jobs could just as well
be sent to other machines in the cluster.
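For example, a bare-bones MPI version of that per-band idea could look
like this (band names made up; the printf stands in for the real
per-band computation):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const char *bands[] = { "red", "green", "blue" };
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* embarrassingly parallel: rank N claims band N, and no messages
     * are needed beyond startup/shutdown */
    if (rank < 3)
        printf("rank %d of %d: processing the %s band\n",
               rank, size, bands[rank]);

    MPI_Finalize();
    return 0;
}

Build with mpicc and launch with e.g. 'mpirun -np 3 ./per_band'.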
As Soeren mentioned, the gmath and gpde libraries support OpenMP
already; in addition, Seth Price put together an OpenCL version of our
r.sun module (GPU ray tracing of sunbeams, seems like a natural fit!),
but I/we still need to finish integrating that into the main build;
and our r.mapcalc module has pthreads support. The r.mapcalc (raster
map calculator) case is actually a pretty typical one for GRASS
modules: they are not entirely, but not far from, being I/O limited
rather than CPU bound per se. For MPI this means that there's a *lot*
of data to pass around the network, and unless you've got InfiniBand
or some other network infrastructure close to the speed of your RAID,
I suspect you'll quickly saturate it.
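The pthreads approach amounts to partitioning the rows among threads,
something like this rough sketch (not the actual r.mapcalc code; the
arithmetic is a stand-in for the map expression; link with -lpthread):

#include <pthread.h>

#define NROWS    1024
#define NTHREADS 4

static double cells[NROWS];    /* stand-in for the raster data */

/* each thread evaluates the expression over its own contiguous block
 * of rows; no locking is needed since the blocks don't overlap */
static void *worker(void *arg)
{
    long t = (long)arg;
    int row, first = t * NROWS / NTHREADS,
        last = (t + 1) * NROWS / NTHREADS;

    for (row = first; row < last; row++)
        cells[row] = cells[row] * 2.0 + 1.0;   /* stand-in expression */
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    long t;

    for (t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    return 0;
}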
The main highly CPU-bound modules I am personally very keen to see
parallelized are our spline interpolation modules: v.surf.rst and
v.surf.bspline. The LU decomposition parts of them are actually in the
GRASS libraries, not the modules, so parallelizing there would also
benefit e.g. v.vol.rst, which does 3D voxel cube interpolation. The
v.surf.rst module uses quadtree segmentation, and v.surf.bspline does
its own splitting into ~12-120 processing segments, so those yell out
to me as low-hanging fruit.
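For the LU decomposition itself, the natural OpenMP entry point is the
elimination step: at each stage k the updates of the remaining rows
are independent of one another. A minimal sketch (no pivoting,
illustrative only, not the actual library code):

/* in-place LU factorization of an n x n matrix, Doolittle style;
 * the row updates can be shared among threads because rows
 * k+1..n-1 are updated independently at each step k */
void lu_factor(int n, double **a)
{
    int i, j, k;

    for (k = 0; k < n - 1; k++) {
        #pragma omp parallel for private(j)
        for (i = k + 1; i < n; i++) {
            a[i][k] /= a[k][k];
            for (j = k + 1; j < n; j++)
                a[i][j] -= a[i][k] * a[k][j];
        }
    }
}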
I am sure the vector network analysis modules could make good use of
parallelization too, but I don't use those enough personally to be able
to comment on their immediate needs and bottlenecks.
Markus N. might be able to talk about what he's doing on the Top500
supercomputer (AIX); I'm not sure how much of the job submission there
is handled by Maui/Torque or similar, and how much is manual scripting
to break up and send out the jobs and then process the results.
And have the Intel compilers and MPI been used with GRASS?
Yes, I've built GRASS with icc ver 12.1.3 (-O2 -xHost -ipo -static-intel
-parallel -Wall). Considering the size of the GRASS codebase it might
be a little surprising that there weren't more problems :), but we do
try pretty hard to keep the code straight ANSI/C89 C, which helps a lot
with portability. For GRASS + icc build notes see:
http://thread.gmane.org/gmane.comp.gis.grass.devel/47823
For GRASS I generally need to keep a close eye on the Debian
packaging, so I typically build it with gcc; outside of GRASS I do use
ifort a lot, and there the OpenMP auto-vectorization works really
well. I understand that's a bit easier to do for Fortran than for C,
though.
As for MPI, there's an MPI version of the above-mentioned v.surf.rst
module for GRASS 5 floating around somewhere (probably under its old
name of 's.surf.rst'); I actually run a medium-sized cluster in my day
job which sees ~85% MPI usage, but I've never really been tempted to
use it for GRASS things... for what I personally do, saturating all
cores/CPUs on the local workstation is often enough. Also, the cluster
setup can be non-trivial for new users (NFS mounts, ssh keys, etc.),
so out-of-the-box "just works" OpenMP-style multi-threading probably
gets us better bang for the buck when trying to support the 'Desktop
GIS' use case, which is probably the bulk of our end users. But don't
get me wrong: if the long-running modules like the spline
interpolations and the r.cost module for search paths were
MPI-enabled, they'd certainly get used by our power users & teams
using GRASS for back-end server satellite image number crunching!
regards,
Hamish