[GRASS-user] "Parallelization" of Python Script

Hamish wrote:

for an example of grass.start_command() for parallelizing a bunch
of r.cost runs, see v.surf.icw(.py) in grass7 addons:
https://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.surf.icw/v.surf.icw.py
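
The start-then-wait pattern referred to here can be sketched roughly as follows. This is a minimal illustration and not code from v.surf.icw: the helper start_command() below is a hypothetical stand-in that launches a tiny Python child process, on the assumption that grass.start_command() likewise returns a subprocess.Popen object without waiting for the module to finish. The labels and the printed text are made up.

```python
import subprocess
import sys


def start_command(label):
    """Hypothetical stand-in for grass.start_command().

    A real script would start a GRASS module such as r.cost here; this
    helper starts a tiny Python child so the sketch runs anywhere.
    It returns immediately with a Popen handle.
    """
    code = "print('done %s')" % label
    return subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)


labels = ["site1", "site2", "site3"]

# Loop 1: start every job without waiting, keeping the process handles.
procs = {}
for label in labels:
    procs[label] = start_command(label)

# Loop 2: collect the jobs, which have been running side by side.
results = {}
for label, proc in procs.items():
    out, _ = proc.communicate()  # waits for exit and closes the pipe
    if proc.returncode != 0:
        raise RuntimeError("job %s failed" % label)
    results[label] = out.strip()

print(results)
```

The point of the two loops is that nothing blocks until every job has been started; a single loop that starts and then waits on each job would run them one after another.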

Johannes:

thank you for that example. I think it explains very well how to
assign multiple r.cost runs to individual processes with
grass.start_command. I am just wondering how it is done when there are
multiple consecutive processes in the for loop. In your example
(v.surf.icw.py) a separate for loop is started for each step (e.g.
r.cost (line 271), r.mapcalc (298)). Is there a way to combine the
steps in a function (e.g. a combination of r.cost and r.mapcalc) and
launch that function, in a way like grass.start_command, in a single
loop?
If possible that would probably save lines of code and might be a
little clearer (at least to me).

I am just asking because one of my scripts, which is still in "serial
mode", involves lots of steps inside the for loop.

Parallelized this way, it would need at least a dozen for loops, which
might be very hard to follow.

ok, in s.surf.icw(.sh) for GRASS 5 and v.surf.icw(.sh) for GRASS 6 I had
it as one big loop, but for the GRASS 7 Python version I made it into
a series of small loops to (a) use the simpler single-command
grass.start_command() method, and (b) get rid of the temp maps ASAP,
since that module makes a lot of them and it adds a lot of disk I/O lag
if they get flushed to the hard drive before they are removed. In the
icw case most of the time was taken by r.cost, compared to the renaming
and preprocessing bits of the (former) big loop.
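
As a rough sketch of the "series of small loops" layout described above: each stage is started for all inputs at once, the script waits for that stage to finish, and the temporary results are deleted as soon as the next stage no longer needs them. Everything here is an assumption made for illustration: start_python() is a hypothetical stand-in for grass.start_command(), the two one-line child scripts stand in for r.cost and r.mapcalc, and plain files in a scratch directory stand in for temporary maps.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def start_python(code, *args):
    """Hypothetical stand-in for grass.start_command(): returns a Popen."""
    return subprocess.Popen([sys.executable, "-c", code] + list(args))


def wait_all(procs):
    """Block until every started job has exited, failing on any error."""
    for proc in procs:
        if proc.wait() != 0:
            raise RuntimeError("a parallel job failed")


# Stand-ins for the slow step and for the cheap follow-up step.
STEP1 = "import sys; open(sys.argv[1], 'w').write(sys.argv[2] * 2)"
STEP2 = ("import sys; data = open(sys.argv[1]).read(); "
         "open(sys.argv[2], 'w').write(data.upper())")

workdir = tempfile.mkdtemp()
names = ["a", "b", "c"]
tmp = {n: os.path.join(workdir, "tmp_" + n) for n in names}
final = {n: os.path.join(workdir, "out_" + n) for n in names}

# Small loop 1: run the slow step for every input in parallel.
wait_all([start_python(STEP1, tmp[n], n) for n in names])

# Small loop 2: run the follow-up step in parallel on the temporary results.
wait_all([start_python(STEP2, tmp[n], final[n]) for n in names])

# Drop the temporary files as soon as nothing needs them any more.
for n in names:
    os.remove(tmp[n])

outputs = {}
for n in names:
    with open(final[n]) as f:
        outputs[n] = f.read()
print(outputs)

shutil.rmtree(workdir)  # remove the scratch directory itself
```

The cost of this layout is one loop per stage, which is what the question above is about; the gain is that each loop only ever starts a single command per input.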

for parallelizing an entire function in Python as you want, have a look
at the method in grass7's i.landsat.rgb(.py), which uses mp.Process
(the Process class from Python's multiprocessing module). It's a bit
more work since you have to manually ensure that the I/O pipes get
closed.
  https://trac.osgeo.org/grass/browser/grass/trunk/scripts/i.landsat.rgb/i.landsat.rgb.py

note the above script preserves the serial execution method intact (to
make the imagery method easier to learn), so it has about double the
code it actually needs. But I think using the extra wrapper function
makes the real guts of the imagery algorithm easier to read, understand,
and maintain, so keeping all the ugly parallelization stuff away from
it is a good thing.

Hamish
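
One way to get the "combine the steps in a function and launch that function" layout asked about earlier is sketched below. Note that this is not the mp.Process approach used in i.landsat.rgb: it swaps in a thread pool from Python's concurrent.futures, on the assumption that each step is an external process, so the worker threads spend their time waiting on child processes. run_step(), process_one() and process_all() are hypothetical names, and the child commands are Python stand-ins for GRASS module calls. In a real script every job would also need its own unique temporary map names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_step(code, arg):
    """Hypothetical stand-in for one blocking GRASS module call."""
    result = subprocess.run([sys.executable, "-c", code, arg],
                            stdout=subprocess.PIPE, text=True, check=True)
    return result.stdout.strip()


def process_one(name):
    """All consecutive steps for one input, written as plain serial code."""
    cost = run_step("import sys; print(sys.argv[1] + '_cost')", name)
    return run_step("import sys; print(sys.argv[1] + '_calc')", cost)


def process_all(names, workers=4):
    """Wrapper holding the parallel plumbing, apart from the algorithm."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_one, names))  # results keep input order


print(process_all(["n1", "n2", "n3"]))
```

As with the wrapper function mentioned above, process_one() stays readable as ordinary serial code, and only process_all() knows that the work is spread over several workers.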

2012/8/7 Hamish <hamish_b@yahoo.com>

Hamish wrote:

for parallelizing an entire function in Python as you want, there's a
method in grass7's i.landsat.rgb(.py) to look at that uses mp.Process.
[...]

Another cool example. I would note, though, that it's entirely possible to embed the previous type of loop in methods or functions if you wish. I normally write object-oriented code, and I have heard that the multiprocessing library can only be used with plain functions because it has trouble passing objects (although maybe explicitly passing self would take care of that problem; I've never tried it). Does anybody have experience there?