[GRASS-dev] multi-threaded r.mapcalc

I have added experimental multi-threading support to r.mapcalc in 7.0
(r34440). It isn't enabled by default; to enable it, use
"make USE_PTHREAD=1". It only uses a handful of pthread functions, so
there's a reasonable chance of it working on MacOSX and/or Windows
(with the pthread-win32 library).

You can change the number of worker threads by setting the WORKERS
environment variable.

The parallelism is limited to evaluation of argument expressions to
functions and operators. It doesn't attempt to calculate multiple rows
concurrently, or multiple output maps.

Also, only one thread can call get_map_row() at a time, so commands
which are I/O bound (i.e. which only perform simple calculations, so
most of the time is spent performing I/O), won't see much of an
improvement.

This is mainly due to the fact that the libgis raster I/O functions
aren't re-entrant, due to the use of pre-allocated row buffers (and
possibly other issues).

[There are also static row buffers in r.mapcalc itself, but those
could easily be eliminated if there was any point.]

Finally, eval() is excluded from parallelisation, as it is used for
variable assignments, and a variable needs to be assigned before it is
used. If this is significant, we can have separate parallel and
sequential versions. If you perform variable assignments other than in
eval(), you lose.

Mostly, I'm interested in discovering whether this actually has any
practical benefit. IOW, whether anyone is actually using scripts which
do enough computation to make use of additional cores.

--
Glynn Clements <glynn@gclements.plus.com>

On Friday 21 November 2008, Glynn Clements wrote:

I have added experimental multi-threading support to r.mapcalc in 7.0
(r34440). It isn't enabled by default; to enable it, use
"make USE_PTHREAD=1". It only uses a handful of pthread functions, so
there's a reasonable chance of it working on MacOSX and/or Windows
(with the pthread-win32 library).

You can change the number of worker threads by setting the WORKERS
environment variable.

The parallelism is limited to evaluation of argument expressions to
functions and operators. It doesn't attempt to calculate multiple rows
concurrently, or multiple output maps.

Also, only one thread can call get_map_row() at a time, so commands
which are I/O bound (i.e. which only perform simple calculations, so
most of the time is spent performing I/O), won't see much of an
improvement.

This is mainly due to the fact that the libgis raster I/O functions
aren't re-entrant, due to the use of pre-allocated row buffers (and
possibly other issues).

[There are also static row buffers in r.mapcalc itself, but those
could easily be eliminated if there was any point.]

Finally, eval() is excluded from parallelisation, as it is used for
variable assignments, and a variable needs to be assigned before it is
used. If this is significant, we can have separate parallel and
sequential versions. If you perform variable assignments other than in
eval(), you lose.

Mostly, I'm interested in discovering whether this actually has any
practical benefit. IOW, whether anyone is actually using scripts which
do enough computation to make use of additional cores.

This type of built-in parallelisation would be nice for CPU-bound operations
such as r.sun-- I often split my script across multiple cores via several
background jobs. The proposed approach would be much cleaner.

Nice work!

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

On Sat, Nov 22, 2008 at 1:34 AM, Glynn Clements
<glynn@gclements.plus.com> wrote:

I have added experimental multi-threading support to r.mapcalc in 7.0
(r34440). It isn't enabled by default; to enable it, use
"make USE_PTHREAD=1". It only uses a handful of pthread functions, so
there's a reasonable chance of it working on MacOSX and/or Windows
(with the pthread-win32 library).

You can change the number of worker threads by setting the WORKERS
environment variable.

A curiosity: is the approach completely different from the parallelization
implemented in r.li.*?

The parallelism is limited to evaluation of argument expressions to
functions and operators.

This is not entirely clear to me:
"evaluation of argument expressions" - isn't this done only one time
when invoking r.mapcalc with a formula? What would be a complex
example which illustrates the benefits of parallelism in the evaluation
part?

It doesn't attempt to calculate multiple rows
concurrently, or multiple output maps.

Could this be integrated somehow from r.li.*?

Also, only one thread can call get_map_row() at a time, so commands
which are I/O bound (i.e. which only perform simple calculations, so
most of the time is spent performing I/O), won't see much of an
improvement.

This is mainly due to the fact that the libgis raster I/O functions
aren't re-entrant, due to the use of pre-allocated row buffers (and
possibly other issues).

[There are also static row buffers in r.mapcalc itself, but those
could easily be eliminated if there was any point.]

Long ago (KR to ANSI C reformatting) I suggested to split the
raster functions out of libgis, calling them Rast_XXX().
Do you think this could become relevant now for GRASS 7?

I still find libgis overwhelmed with too many different functions.

Finally, eval() is excluded from parallelisation, as it is used for
variable assignments, and a variable needs to be assigned before it is
used. If this is significant, we can have separate parallel and
sequential versions. If you perform variable assignments other than in
eval(), you lose.

Mostly, I'm interested in discovering whether this actually has any
practical benefit. IOW, whether anyone is actually using scripts which
do enough computation to make use of additional cores.

Do we have an example in [6.4] scripts/ which I can use for timing testing?

Markus

Markus Neteler wrote:

> I have added experimental multi-threading support to r.mapcalc in 7.0
> (r34440). It isn't enabled by default; to enable it, use
> "make USE_PTHREAD=1". It only uses a handful of pthread functions, so
> there's a reasonable chance of it working on MacOSX and/or Windows
> (with the pthread-win32 library).
>
> You can change the number of worker threads by setting the WORKERS
> environment variable.

A curiosity: is the approach completely different from the parallelization
implemented in r.li.*?

AFAICT, r.li uses a daemon and separate processes.

> The parallelism is limited to evaluation of argument expressions to
> functions and operators.

This is not entirely clear to me:
"evaluation of argument expressions" - isn't this done only one time
when invoking r.mapcalc with a formula?

Evaluation is what r.mapcalc is doing all the time that it's running.
The top-level expressions are evaluated for each map row. Evaluating a
function call (which includes infix operators) requires first
evaluating each argument expression to an argument value (i.e. a row
of data), then calling the function on the argument values.

What would be a complex
example which illustrates the benefits of parallelism in the evaluation
part?

Consider e.g. "out = a * b + c * d".

This parses as "add(mul(a,b),mul(c,d))".

Evaluation would amount to:

  tmp1 = a * b
  tmp2 = c * d
  tmp3 = tmp1 + tmp2
  write_row(tmp3)

The parallelism causes the first two steps above to be executed
concurrently in separate threads.

The original evaluate_function() code was:

    for (i = 1; i <= e->data.func.argc; i++)
  evaluate(e->data.func.args[i]);

    res = (*e->data.func.func) (e->data.func.argc,
        e->data.func.argt, e->data.func.argv);

The new version is:

    if (e->data.func.argc > 1 && e->data.func.func != f_eval) {
  for (i = 1; i <= e->data.func.argc; i++)
      begin_evaluate(e->data.func.args[i]);

  for (i = 1; i <= e->data.func.argc; i++)
      end_evaluate(e->data.func.args[i]);
    }
    else
  for (i = 1; i <= e->data.func.argc; i++)
      evaluate(e->data.func.args[i]);

    res = (*e->data.func.func) (e->data.func.argc,
        e->data.func.argt, e->data.func.argv);

IOW, rather than fully evaluating the argument expressions
sequentially, it fist commences evaluation for all of them, then waits
for all of them to finish.

The original behaviour is retained for the eval() function (which is
normally used as a sequencing operator, which will break if
expressions are evaluated concurrently) and functions with a single
argument (where there's no benefit to parallelism, only overhead).

> It doesn't attempt to calculate multiple rows
> concurrently, or multiple output maps.

Could this be integrated somehow from r.li.*?

No.

The main problem is trying to retrofit parallelism onto code
(r.mapcalc scripts) which isn't expecting it.

The problem with evaluating multiple output maps concurrently is
related primarly to the handling of variables. E.g. consider:

  r.mapcalc <<EOF
  out1 = eval(a=1,...)
  out2 = eval(a=2,...)
  out3 = a
  EOF

If you try to evaluate the maps in parallel, you'll get problems with
the out1 and out2 evaluations trying to store different values in "a".
Also, out3 should be 2, but that would need steps to make it happen.

IOW, you would have to ensure that anything which reads a variable
waits until its value for that row has been calculated.

The main problem with evaluating multiple rows concurrently is the way
that buffers are managed. Each node in the parse tree has an
associated buffer which contains the value for the current row. This
would need to be change to allow either N buffers for N-row
parallelism or separating the buffers from the expression nodes. The
latter approach is complicated by the implementation of variables
(which simply use the RHS expression's buffer as their own buffer
rather than allocating a separate buffer).

> Also, only one thread can call get_map_row() at a time, so commands
> which are I/O bound (i.e. which only perform simple calculations, so
> most of the time is spent performing I/O), won't see much of an
> improvement.
>
> This is mainly due to the fact that the libgis raster I/O functions
> aren't re-entrant, due to the use of pre-allocated row buffers (and
> possibly other issues).
>
> [There are also static row buffers in r.mapcalc itself, but those
> could easily be eliminated if there was any point.]

Long ago (KR to ANSI C reformatting) I suggested to split the
raster functions out of libgis, calling them Rast_XXX().
Do you think this could become relevant now for GRASS 7?

I still find libgis overwhelmed with too many different functions.

I'm inclined to agree. However, it can sometimes be hard to separate
"raster" functions from "general" functions due to GRASS' origin as a
purely raster GIS. E.g. the region resolution is specific to rasters,
but region handling geneerally isn't. Are the functions which deal
with colour tables "raster" functions. Etc.

Other cases are easier, e.g. memory allocation and error handling.

> Finally, eval() is excluded from parallelisation, as it is used for
> variable assignments, and a variable needs to be assigned before it is
> used. If this is significant, we can have separate parallel and
> sequential versions. If you perform variable assignments other than in
> eval(), you lose.
>
> Mostly, I'm interested in discovering whether this actually has any
> practical benefit. IOW, whether anyone is actually using scripts which
> do enough computation to make use of additional cores.

Do we have an example in [6.4] scripts/ which I can use for timing testing?

r.shaded.relief seems to have the most complex r.mapcalc expression of
any of the supplied scripts.

--
Glynn Clements <glynn@gclements.plus.com>

On Fri, Jan 9, 2009 at 5:51 PM, Glynn Clements <glynn@gclements.plus.com> wrote:

Markus Neteler wrote:

> I have added experimental multi-threading support to r.mapcalc in 7.0
> (r34440). It isn't enabled by default; to enable it, use
> "make USE_PTHREAD=1". It only uses a handful of pthread functions, so
> there's a reasonable chance of it working on MacOSX and/or Windows
> (with the pthread-win32 library).
>
> You can change the number of worker threads by setting the WORKERS
> environment variable.

...

r.shaded.relief seems to have the most complex r.mapcalc expression of
any of the supplied scripts.

Currently r.shaded.relief fails in GRASS 7 (I wanted to compare
timings to GRASS 6):

GRASS 7.0.svn (nc_spm_07):~/grass70 > r.shaded.relief elevation
Calculating shading, please stand by.
syntax error, unexpected ')'
ERROR: In calculation, script aborted.

Markus

Markus Neteler wrote:

>> > I have added experimental multi-threading support to r.mapcalc in 7.0
>> > (r34440). It isn't enabled by default; to enable it, use
>> > "make USE_PTHREAD=1". It only uses a handful of pthread functions, so
>> > there's a reasonable chance of it working on MacOSX and/or Windows
>> > (with the pthread-win32 library).
>> >
>> > You can change the number of worker threads by setting the WORKERS
>> > environment variable.
...
> r.shaded.relief seems to have the most complex r.mapcalc expression of
> any of the supplied scripts.

Currently r.shaded.relief fails in GRASS 7 (I wanted to compare
timings to GRASS 6):

GRASS 7.0.svn (nc_spm_07):~/grass70 > r.shaded.relief elevation
Calculating shading, please stand by.
syntax error, unexpected ')'
ERROR: In calculation, script aborted.

Fixed in r35814.

--
Glynn Clements <glynn@gclements.plus.com>

On Fri, Jan 9, 2009 at 5:51 PM, Glynn Clements <glynn@gclements.plus.com> wrote:

Markus Neteler wrote:

> I have added experimental multi-threading support to r.mapcalc in 7.0
> (r34440). It isn't enabled by default; to enable it, use
> "make USE_PTHREAD=1". It only uses a handful of pthread functions, so
> there's a reasonable chance of it working on MacOSX and/or Windows
> (with the pthread-win32 library).
>
> You can change the number of worker threads by setting the WORKERS
> environment variable.

...

r.shaded.relief seems to have the most complex r.mapcalc expression of
any of the supplied scripts.

I have built r.mapcalc with 'make USE_PTHREAD=1' in 7,
this are the results:

GRASS 7.0.svn (nc_spm_07):~ > g.region rast=elevation -p
...
cells: 2025000

GRASS 7.0.svn (nc_spm_07):~ > time r.shaded.relief elevation
Calculating shading, please stand by.
100%
Color table for raster map <elevation.shade> set to 'grey'
Shaded relief map created and named <elevation.shade>. Consider renaming.
9.22user 1.54system 0:09.63elapsed 111%CPU (0avgtext+0avgdata 0maxresident)k
56inputs+29960outputs (0major+53040minor)pagefaults 0swaps

GRASS 6.5.svn (nc_spm_07):~ > g.region rast=elevation
GRASS 6.5.svn (nc_spm_07):~ > time r.shaded.relief elevation
Calculating shading, please stand by.
100%
Color table for raster map <elevation.shade> set to 'grey'
Shaded relief map created and named <elevation.shade>. Consider renaming.
10.68user 0.40system 0:11.13elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
8inputs+29968outputs (0major+48438minor)pagefaults 0swaps

Markus

Glynn wrote:

I have added experimental multi-threading support to r.mapcalc in 7.0

Markus:

> r.shaded.relief seems to have the most complex r.mapcalc expression of
> any of the supplied scripts.

I have built r.mapcalc with 'make USE_PTHREAD=1' in 7,
this are the results:

....

GRASS 7.0.svn (nc_spm_07):~ > time r.shaded.relief elevation

....

9.22 user; 1.54 system; 0:09.63 elapsed; 111%CPU

....

GRASS 6.5.svn (nc_spm_07):~ > time r.shaded.relief elevation

....

10.68 user; 0.40 system; 0:11.13 elapsed; 99%CPU

not too much of a speedup. I wonder how it looks if you try with a more
"difficult" res=1 ? I guess it is still more of an I/O disk bound
bottleneck than a computational one. what's a nice expensive r.mapcalc
expression to try? pow()? atan2()? graph()? mode()?

Hamish