#3104: Parallelize tools like r.cost
-------------------------+-------------------------
Reporter: belg4mit | Owner: grass-dev@…
Type: enhancement | Status: new
Priority: normal | Milestone: 7.0.5
Component: Default | Version: unspecified
Keywords: | CPU: Unspecified
Platform: Unspecified |
-------------------------+-------------------------
Currently QGIS will peg one of my cores while running r.cost. This leaves
the system usable, but it also greatly slows down the execution time.
Traversal of the cost raster should be readily parallelizable, and could
divide the run time by the number of cores allocated.
Replying to [ticket:3104 belg4mit]:
> Currently QGIS will peg one of my cores while running r.cost. This
> leaves the system usable, but it also greatly slows down the execution
> time. Traversal of the cost raster should be readily parallelizable, and
> could divide the run time by the number of cores allocated.
Can you provide hints about how r.cost could be parallelized? That would
help a lot.
Hmm, the algorithm used is different from what I imagined. However, it
seems like one possibility, although it would not necessarily scale easily
beyond two cores, would be to have a pair of threads iterating over the
raster cells: one beginning at the top left and the other working
backwards from the bottom right.
Replying to [comment:3 belg4mit]:
> Hmm, the algorithm used is different from what I imagined. However, it
> seems like one possibility, although it would not necessarily scale
> easily beyond two cores, would be to have a pair of threads iterating
> over the raster cells: one beginning at the top left and the other
> working backwards from the bottom right.
r.cost, like least-cost path search methods in general, does not iterate
over items in a fixed order. Instead, r.cost begins with the start cells,
calculates costs to their neighbors, and then continues with the cell with
the least accumulated cost, calculating the costs to its neighbors,
provided that those neighbors have not yet been processed. It stops when
all cells have been processed or when all stop points have been reached.
That means random access to cells: the order in which cells are processed
is determined by the cost to reach each cell.
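The search just described is essentially Dijkstra's algorithm on the
raster grid, driven by a min-heap of frontier cells. A minimal Python
sketch of that idea follows; the function and variable names are
illustrative, not taken from the r.cost source, and for simplicity it uses
4-connected moves with the move cost taken as the mean of the two cell
values (r.cost's cardinal-move formula):

```python
import heapq

def cost_surface(cost, starts):
    """Dijkstra-style accumulation: pop the cheapest frontier cell,
    relax its four neighbors, repeat until the heap is empty."""
    rows, cols = len(cost), len(cost[0])
    INF = float("inf")
    acc = [[INF] * cols for _ in range(rows)]
    heap = []
    for r, c in starts:                      # seed every start cell at cost 0
        acc[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)        # cell with the current least cost
        if d > acc[r][c]:
            continue                         # stale heap entry, already improved
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # cost of the move = mean of the two cell costs
                nd = d + (cost[r][c] + cost[nr][nc]) / 2.0
                if nd < acc[nr][nc]:
                    acc[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acc
```

Note how the pop order is data-dependent: each pop can push new entries
that change which cell is cheapest next, which is exactly why the
traversal order cannot be predicted (or partitioned) in advance.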
I understand your point, but that's still a form of iteration, simply over
a continuously updated list rather than pixel indices (which does occur as
well). Regardless, why couldn't the start cells be divided up among
multiple threads? E.g., those in the top half of the image for core 0, and
those in the bottom half for core 1.
Replying to [comment:5 belg4mit]:
> I understand your point, but that's still a form of iteration, simply
> over a continuously updated list rather than pixel indices (which does
> occur as well). Regardless, why couldn't the start cells be divided up
> among multiple threads? E.g., those in the top half of the image for
> core 0, and those in the bottom half for core 1.
The list needs to be updated using the cell with the current lowest cost.
There is always only one such cell; processing, say, the four cells with
the lowest costs in separate threads would corrupt the result (after one
cell is processed, the set of the four cheapest cells may change). In
other words, the order of iteration is not predictable.
You could start a separate thread for each start point, but
then you would need to create an individual temporary output for each
start cell and process all cells once for each start cell. At the end, the
different outputs would need to be merged, assigning to each cell the cost
and the path to the closest start cell. In the current single-thread
version, all start cells are loaded at the beginning and processed
together. Each cell needs to be processed only once, regardless of the
number of start cells. Thus with your suggestion, more cores could be
used, but total processing time would be longer.
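The merge step described above can be sketched as follows: given one
accumulated-cost raster per start cell, each output cell keeps the minimum
cost across those rasters, along with the index of the start cell that
produced it. This is an illustrative sketch, not code from r.cost:

```python
def merge_cost_surfaces(surfaces):
    """Merge per-start-cell cost rasters: each output cell keeps the
    lowest accumulated cost and the index of the winning start cell."""
    rows, cols = len(surfaces[0]), len(surfaces[0][0])
    merged = [[0.0] * cols for _ in range(rows)]
    nearest = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            costs = [s[r][c] for s in surfaces]
            best = min(range(len(costs)), key=costs.__getitem__)
            merged[r][c] = costs[best]
            nearest[r][c] = best
    return merged, nearest
```

The merge itself is cheap, but it does not change the trade-off stated
above: every thread must still visit every cell, so the total work grows
with the number of start cells instead of staying constant.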
Thanks for the references! There are more helpful references in the links
you provided. Parallelizing r.cost according to these references would be
a nice Google Summer of Code project.