There is an OpenMP version of r.clump available at
http://sil.uc.edu/downloads.html#software
called r.clump4p. The reported performance gain is 450x over the
original r.clump.
That gain has vanished against the current r.clump:
r.clump4p with one thread is now about 12x slower than r.clump, and
r.clump4p with 4 threads is now about 5x slower than r.clump.
Moreover, the results of r.clump4p are wrong: it clumps together areas
with different cell values (this is not a multithreading effect).
I tested in a region with 650 million cells; both modules produced
about 17.6 million clumps. All differences in the resulting clumps
were due to errors in r.clump4p.
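The kind of error described above (one clump covering cells with different values) can be checked mechanically. The following is only a minimal sketch, not the test that was actually run: it assumes both rasters are already loaded as equally sized lists of rows, with None for NULL cells, and the function name mixed_clumps is hypothetical.

```python
def mixed_clumps(values, clumps):
    """Return the set of clump IDs that cover more than one cell value.

    Assumption: `values` and `clumps` are equally sized lists of rows,
    and None in `clumps` marks a NULL cell that belongs to no clump.
    """
    seen = {}      # clump ID -> first cell value seen for that clump
    mixed = set()  # clump IDs found with at least two different values
    for value_row, clump_row in zip(values, clumps):
        for value, clump_id in zip(value_row, clump_row):
            if clump_id is None:
                continue
            # setdefault stores the first value and returns it afterwards
            if seen.setdefault(clump_id, value) != value:
                mixed.add(clump_id)
    return mixed


values = [[1, 1, 2],
          [1, 2, 2]]
good = [[1, 1, 2],
        [1, 2, 2]]
bad = [[1, 1, 1],   # clump 1 wrongly swallows a cell with value 2
       [1, 2, 2]]

print(mixed_clumps(values, good))  # set()
print(mixed_clumps(values, bad))   # {1}
```

A correct clump map yields an empty set; any ID in the result is a clump that merges areas with different cell values.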
Tests were performed in GRASS 7, where I have optimized r.clump and
also added support for diagonal clump tracing.
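To illustrate what diagonal clump tracing changes, here is a small sketch of clumping with and without diagonal neighbours. It is a plain flood fill written for illustration only and is not the algorithm used in r.clump; the function name clump and its diagonal parameter are hypothetical, and NULL handling is left out.

```python
from collections import deque


def clump(grid, diagonal=False):
    """Label connected cells of equal value; return (labels, clump count)."""
    rows, cols = len(grid), len(grid[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # edge neighbours
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # corner neighbours
    labels = [[0] * cols for _ in range(rows)]  # 0 = not yet labelled
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            # Start a new clump and flood-fill it breadth-first.
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and grid[nr][nc] == grid[cr][cc]):
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
    return labels, next_id


grid = [[1, 2],
        [2, 1]]
print(clump(grid)[1])                 # 4
print(clump(grid, diagonal=True)[1])  # 2
```

On this checkerboard, edge-only neighbours give four single-cell clumps, while diagonal neighbours join the two cells of each value into one clump.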
Markus M
On 23/02/14 23:10, Markus Metz wrote:
[...]
Great work! Thanks a lot, Markus!
Moritz
On Mon, Feb 24, 2014 at 12:00 PM, Moritz Lennert
<mlennert@club.worldonline.be> wrote:
[...]
In my test area with 650 million cells and 17.6 million clumps, the
updated version of r.clump is now about 20x faster than r.clump4p (one
thread) and about 8x faster than r.clump4p (four threads). r.clump
always uses only one thread. In real time this is 3 minutes for
r.clump vs 1 hour for r.clump4p.
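As a back-of-the-envelope check of these figures, the arithmetic can be written out as below. This is only a sketch: it assumes the 1 hour refers to the one-thread run of r.clump4p, and the variable names are made up for illustration.

```python
cells = 650_000_000          # size of the test region, from the message
r_clump_s = 3 * 60           # r.clump: 3 minutes, in seconds
r_clump4p_1t_s = 60 * 60     # r.clump4p: 1 hour (assumed: one thread)

speedup = r_clump4p_1t_s / r_clump_s
print(speedup)                    # 20.0

# Rough throughput of the updated r.clump in cells per second.
print(round(cells / r_clump_s))   # 3611111
```

The ratio of 20 matches the stated "about 20x faster" for the one-thread comparison.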