[GRASS-dev] Multi-threading and use of multiple processors

A colleague of mine would like to run a cost analysis (with either
r.walk or r.cost) on a very very large raster, ~ 1500 km x 800 km with
about 30 m resolution equals about 40 billion cells. This is obviously
going to take a ridiculously long time, and he would like to make use
of his dual quad-core processors to speed up this process (he has
decided that a week of running time is acceptable).

Is there any plan to add multi-threading to these or other modules (I
think I remember something about r.mapcalc and another a while back)?
What would it take to do this? Is there anything else anyone can
recommend to help speed up this analysis (I think I remember something
about keeping the whole thing in memory as well...)?

Thanks,
-Colin

Colin Nielsen wrote:

A colleague of mine would like to run a cost analysis (with either
r.walk or r.cost) on a very very large raster, ~ 1500 km x 800 km with
about 30 m resolution equals about 40 billion cells. This is obviously
going to take a ridiculously long time, and he would like to make use
of his dual quad-core processors to speed up this process (he has
decided that a week of running time is acceptable).

Is there any plan to add multi-threading to these or other modules (I
think I remember something about r.mapcalc and another a while back)?
What would it take to do this? Is there anything else anyone can
recommend to help speed up this analysis (I think I remember something
about keeping the whole thing in memory as well...)?

Unfortunately, adding parallelism to these modules wouldn't be
straightfoward, as you can't easily divide the data up into isolated
portions.

The simplest way to speed them up would be to replace the use of the
segment library with a more efficient version, e.g. that used in
r.proj.seg (r.proj in 7.0).

--
Glynn Clements <glynn@gclements.plus.com>

On Wed, Apr 8, 2009 at 3:44 AM, Colin Nielsen <colin.nielsen@gmail.com> wrote:

A colleague of mine would like to run a cost analysis (with either
r.walk or r.cost) on a very very large raster, ~ 1500 km x 800 km with
about 30 m resolution equals about 40 billion cells.

[ see my other today's email about r.terracost now in Addons ]

...

Is there any plan to add multi-threading to these or other modules (I
think I remember something about r.mapcalc and another a while back)?

Yann Chemin parallelized i.atcorr for me with openMP which required
the addition of a few lines and a gcc >= 4.2 compiler. For hints, see

http://grass.osgeo.org/wiki/OpenMP

Perhaps this can be introduced at library level at some point. The
behavior or a pixel oriented module such as i.atcorr is of course
different from that of r.cost which has to consider more than a pixel
in a cycle.

Markus

In my experience, the main drain in computation time is the IO not the
actual calculation of the btree. Is there any way to parallelize the
IO by passing rows to different cores or something like that ie. "for
(row = 0; row < nrows; row++) {"?

Or a more efficient segment library sounds good as well. Is there a
way to estimate how much more efficient would it be and would it make
use of multiple cores?

-Colin

On Wed, Apr 8, 2009 at 8:17 AM, Markus Neteler <neteler@osgeo.org> wrote:

On Wed, Apr 8, 2009 at 3:44 AM, Colin Nielsen <colin.nielsen@gmail.com> wrote:

A colleague of mine would like to run a cost analysis (with either
r.walk or r.cost) on a very very large raster, ~ 1500 km x 800 km with
about 30 m resolution equals about 40 billion cells.

[ see my other today's email about r.terracost now in Addons ]

...

Is there any plan to add multi-threading to these or other modules (I
think I remember something about r.mapcalc and another a while back)?

Yann Chemin parallelized i.atcorr for me with openMP which required
the addition of a few lines and a gcc >= 4.2 compiler. For hints, see

http://grass.osgeo.org/wiki/OpenMP

Perhaps this can be introduced at library level at some point. The
behavior or a pixel oriented module such as i.atcorr is of course
different from that of r.cost which has to consider more than a pixel
in a cycle.

Markus

Colin Nielsen wrote:

In my experience, the main drain in computation time is the IO not the
actual calculation of the btree. Is there any way to parallelize the
IO by passing rows to different cores or something like that ie. "for
(row = 0; row < nrows; row++) {"?

That's feasible, but r.cost and r.walk aren't row-by-row algorithms.

With a row-by-row algorithm, you still have the issue that you can't
have multiple concurrent invocations of G_get_raster_row() etc on a
single map (in 6.x, you simply can't have multiple concurrent
invocations, even on different maps).

But you can have one thread which reads rows, multiple threads which
each process a particular row, and one thread which writes rows. The
number of useful intermediate threads is limited by the I/O rate.

Or a more efficient segment library sounds good as well. Is there a
way to estimate how much more efficient would it be

Hard to say. The main inefficiencies with the current implementation
are:

1. The tile size isn't restricted to a power of two (so it uses
division/remainder rather than shift/mask).

2. The tile size is set at run-time rather than compile-time.

3. Acess is via function calls rather than macros or inline functions.

The version in r.proj[.seg] doesn't have these issues, although it's
currently read-only.

and would it make use of multiple cores?

Using multiple threads for segment I/O would probably be a net loss,
as you would need to perform locking for every access.

Multi-threading is most effective if you can partition the data such
that you don't need to perform locking.

--
Glynn Clements <glynn@gclements.plus.com>

I sent a previous mail but it did not go through, so sending it again:

The base concept is that if we are going to parallelize, then the parallelization can be of two types,
keeping the current algorithm and tweaking it to become more parallel.
Here, a method will be to run the r.cost of two entities in the store(heap I assume), to a certain depth, and then collecting their values.

Another method is to partition the dataspace into large chunks, and perform operations on them.
Here the method is to divide the space into say 8 blocks, such that each block divides the map into 3 entities and any path from block 1 and 3 pass through block2. Then run the algorithm for each boundary element such that the cost of each pixel pair on (boundary12 and boundary23) and (boundary12,12) and (boundary 23,23) cost is known.

The given map is just too big for the second operation, the size of temporary store being arnd 40 gb.
For smaller image sizes, the method would have worked just fine.
The reference paper for the second method
A simple parallel algorithm for the single source shortest path problem on planar digraphs.

The assumption I am making is that the algorithm for r.cost is closely related to dijkstra’s algorithm.

On Fri, Apr 10, 2009 at 6:02 PM, Glynn Clements <glynn@gclements.plus.com> wrote:

Colin Nielsen wrote:

In my experience, the main drain in computation time is the IO not the
actual calculation of the btree. Is there any way to parallelize the
IO by passing rows to different cores or something like that ie. “for
(row = 0; row < nrows; row++) {”?

That’s feasible, but r.cost and r.walk aren’t row-by-row algorithms.

With a row-by-row algorithm, you still have the issue that you can’t
have multiple concurrent invocations of G_get_raster_row() etc on a
single map (in 6.x, you simply can’t have multiple concurrent
invocations, even on different maps).

But you can have one thread which reads rows, multiple threads which
each process a particular row, and one thread which writes rows. The
number of useful intermediate threads is limited by the I/O rate.

Or a more efficient segment library sounds good as well. Is there a
way to estimate how much more efficient would it be

Hard to say. The main inefficiencies with the current implementation
are:

  1. The tile size isn’t restricted to a power of two (so it uses
    division/remainder rather than shift/mask).

  2. The tile size is set at run-time rather than compile-time.

  3. Acess is via function calls rather than macros or inline functions.

The version in r.proj[.seg] doesn’t have these issues, although it’s
currently read-only.

and would it make use of multiple cores?

Using multiple threads for segment I/O would probably be a net loss,
as you would need to perform locking for every access.

Multi-threading is most effective if you can partition the data such
that you don’t need to perform locking.


Glynn Clements <glynn@gclements.plus.com>


grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

JYOTHISH SOMAN
MS-2008-CS
Center for Security, Theory And Algorithm Research (CSTAR)
International Institute of Information Technology
Hyderabad
India
Phone:+91-9966222626
http://www.iiit.ac.in/

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
George Bernard Shaw


JYOTHISH SOMAN
MS-2008-CS
Center for Security, Theory And Algorithm Research (CSTAR)
International Institute of Information Technology
Hyderabad
India
Phone:+91-9966222626
http://www.iiit.ac.in/

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
George Bernard Shaw

On Sat, Apr 11, 2009 at 10:23 PM, Jyothish Soman
<jyothish.soman@gmail.com> wrote:
...

The assumption I am making is that the algorithm for r.cost is closely
related to dijkstra's algorithm.

Just a note:
in GRASS-Addons is a new module (r.terracost which uses this approach):

http://trac.osgeo.org/grass/browser/grass-addons/raster/r.terracost
http://trac.osgeo.org/grass/browser/grass-addons/raster/r.terracost/description.html

Markus

Well actually, the paper is in GIS terms what has been given in my reference in graph theoretical terms.
Can somebody tell me how exactly are r.cost and r.terraflow different, in terms of cost of traversal between two pixels?

On Sun, Apr 12, 2009 at 2:10 AM, Markus Neteler <neteler@osgeo.org> wrote:

On Sat, Apr 11, 2009 at 10:23 PM, Jyothish Soman

<jyothish.soman@gmail.com> wrote:

The assumption I am making is that the algorithm for r.cost is closely
related to dijkstra’s algorithm.

Just a note:
in GRASS-Addons is a new module (r.terracost which uses this approach):

http://trac.osgeo.org/grass/browser/grass-addons/raster/r.terracost
http://trac.osgeo.org/grass/browser/grass-addons/raster/r.terracost/description.html

Markus


JYOTHISH SOMAN
MS-2008-CS
Center for Security, Theory And Algorithm Research (CSTAR)
International Institute of Information Technology
Hyderabad
India
Phone:+91-9966222626
http://www.iiit.ac.in/

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
George Bernard Shaw