I sent a previous mail but it did not go through, so sending it again:
The base concept is that if we are going to parallelize, then the parallelization can be of two types,
keeping the current algorithm and tweaking it to become more parallel.
Here, a method will be to run the r.cost of two entities in the store(heap I assume), to a certain depth, and then collecting their values.
Another method is to partition the dataspace into large chunks, and perform operations on them.
Here the method is to divide the space into say 8 blocks, such that each block divides the map into 3 entities and any path from block 1 and 3 pass through block2. Then run the algorithm for each boundary element such that the cost of each pixel pair on (boundary12 and boundary23) and (boundary12,12) and (boundary 23,23) cost is known.
The given map is just too big for the second operation, the size of temporary store being arnd 40 gb.
For smaller image sizes, the method would have worked just fine.
The reference paper for the second method
A simple parallel algorithm for the single source shortest path problem on planar digraphs.
The assumption I am making is that the algorithm for r.cost is closely related to dijkstra’s algorithm.
On Fri, Apr 10, 2009 at 6:02 PM, Glynn Clements <glynn@gclements.plus.com> wrote:
Colin Nielsen wrote:
In my experience, the main drain in computation time is the IO not the
actual calculation of the btree. Is there any way to parallelize the
IO by passing rows to different cores or something like that ie. “for
(row = 0; row < nrows; row++) {”?
That’s feasible, but r.cost and r.walk aren’t row-by-row algorithms.
With a row-by-row algorithm, you still have the issue that you can’t
have multiple concurrent invocations of G_get_raster_row() etc on a
single map (in 6.x, you simply can’t have multiple concurrent
invocations, even on different maps).
But you can have one thread which reads rows, multiple threads which
each process a particular row, and one thread which writes rows. The
number of useful intermediate threads is limited by the I/O rate.
Or a more efficient segment library sounds good as well. Is there a
way to estimate how much more efficient would it be
Hard to say. The main inefficiencies with the current implementation
are:
-
The tile size isn’t restricted to a power of two (so it uses
division/remainder rather than shift/mask).
-
The tile size is set at run-time rather than compile-time.
-
Acess is via function calls rather than macros or inline functions.
The version in r.proj[.seg] doesn’t have these issues, although it’s
currently read-only.
and would it make use of multiple cores?
Using multiple threads for segment I/O would probably be a net loss,
as you would need to perform locking for every access.
Multi-threading is most effective if you can partition the data such
that you don’t need to perform locking.
–
Glynn Clements <glynn@gclements.plus.com>
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
–
JYOTHISH SOMAN
MS-2008-CS
Center for Security, Theory And Algorithm Research (CSTAR)
International Institute of Information Technology
Hyderabad
India
Phone:+91-9966222626
http://www.iiit.ac.in/
The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
George Bernard Shaw
–
JYOTHISH SOMAN
MS-2008-CS
Center for Security, Theory And Algorithm Research (CSTAR)
International Institute of Information Technology
Hyderabad
India
Phone:+91-9966222626
http://www.iiit.ac.in/
The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
George Bernard Shaw