[GRASS-user] Simultaneous r.horizon processes

Daniel wrote:

What do you mean, r.sun can do multithreading? I've heard that
r.sun uses multithreading in GRASS 7, but is that implemented
in GRASS 6? Or are you talking about "poor man's
multithreading," like on the GRASS wiki?

there is OpenCL GPU accel. support, but it has not yet been
merged into grass 7. (mea culpa)

for r.sun being run 365 (or whatever) times in a row the "poor
man's" method is fine, in fact the r3.in.xyz script in addons
is perhaps the most efficient multi-CPUing in grass to date.
(to my surprise)

I've just read through the r.horizon code in devbr6 and I don't
see anything which makes the module unable to be run multiple
times in the same mapset. (no external region setting, no
generically named temp files, no gratuitous use of grass library
global variables) ... are you running under NFS or similar as
addressed by Markus's script? aka maybe the trouble is rooted
elsewhere?

I did a little debugging today and think it's due to the large
size of my study area (~36 km², 0.5m resolution).

72000x72000, how much ram does r.horizon use?
maybe processes are being killed as you run out of RAM.
in that case set the max number of parallel jobs by what fits
into memory without going into swap, instead of by the number
of CPUs. and error handling in the script (test for '$? -ne 0')
could help retry failed runs.

If I spatially partition the area and then stitch everything
back together, I hope it works - tests on smaller regions have
worked correctly thus far but I'll need to wait a while to see
the real results.

for r.horizon the mountains in the distance can matter (that's
the whole point) so I'd be careful with cutting up the region.
temporarily lowering the region resolution just for the r.horizon
run may be a less-bad compromise.
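
a rough sketch of that, in python for consistency with the rest of
the thread (untested, and the r.horizon parameter names are from
memory, so check the manual for your version):

    import grass.script as grass

    # remember the full-resolution region, coarsen it for the
    # horizon sweep, then restore it afterwards
    grass.run_command('g.region', save='fullres', overwrite=True)
    grass.run_command('g.region', res=10)
    grass.run_command('r.horizon', elevin='dem', horizonstep=15,
                      horizon='hor')   # one output map per azimuth step
    grass.run_command('g.region', region='fullres')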

FWIW I've tentatively given up on using r.horizon, see the "r.sun
commissioning trials" trac ticket and wiki page. since the sun's
position changes each day, for sub-degree placement of the sun you
need so many horizon maps that loading them all becomes more
expensive than just re-calculating the horizon on the fly with the
exact position. (I generally aim for slow exactness over fast
processing time though, YMMV)
but maybe I don't correctly understand what r.horizon is doing...

Hamish

Daniel,

Wow, I may have done the math wrong, but I think you would need ~10GB of RAM per r.horizon process to run your map without constraining the area. So, I would be inclined to agree with Hamish that you may be RAM constrained. I have had some success using g.region to "tile" my dataset into rectangles that are long east-west (sunrise-sunset) and short north-south (summer-winter), but I had to make sure I had at least 25% overlap to cover edge effects.
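
If it helps, the tiling itself is easy to script. A rough sketch of the idea (untested as written; the tile counts and 25% overlap are just the numbers I used, and it assumes the GRASS Python scripting library's region() helper):

    import grass.script as grass

    def tiles(nx=2, ny=4, overlap=0.25):
        # yield (n, s, e, w) bounds for nx x ny tiles of the current
        # region, long east-west, with ~25% overlap between neighbors
        reg = grass.region()
        width = (reg['e'] - reg['w']) / nx
        height = (reg['n'] - reg['s']) / ny
        ox = width * overlap / 2.0    # half the overlap on each side
        oy = height * overlap / 2.0
        for i in range(nx):
            for j in range(ny):
                w = max(reg['w'], reg['w'] + i * width - ox)
                e = min(reg['e'], reg['w'] + (i + 1) * width + ox)
                s = max(reg['s'], reg['s'] + j * height - oy)
                n = min(reg['n'], reg['s'] + (j + 1) * height + oy)
                yield n, s, e, w

    for n, s, e, w in tiles():
        grass.run_command('g.region', n=n, s=s, e=e, w=w)
        # ... run r.horizon for this tile here ...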

To clarify, I meant "poor man's multithreading", i.e. running one instance of r.horizon per CPU core. I used the multiprocessing library to make sure I created only as many processes as there were cores. I am still learning Python, but unlike in bash, I don't think you can do that without the multiprocessing library.
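
The core of it looked roughly like this. This is a from-memory sketch rather than my actual script, and the r.horizon parameters are placeholders, so check the manual for your version:

    import multiprocessing as mp
    import grass.script as grass

    def run_horizon(direction):
        # one r.horizon run per azimuth direction; each run gets its
        # own output name so parallel jobs don't collide
        grass.run_command('r.horizon', elevin='dem',
                          direction=direction,
                          horizon='hor_%03d' % direction)

    if __name__ == '__main__':
        # one worker per core -- use fewer if RAM, not CPU, is the
        # limit, as Hamish suggests
        pool = mp.Pool(processes=mp.cpu_count())
        pool.map(run_horizon, range(0, 360, 15))
        pool.close()
        pool.join()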

I was wrong about r.horizon producing incorrect values. When I looked back at the output, the maps were overwriting each other: I had several processes writing to the same mapset using the same output map name.

Good luck!

Collin Bode
UC Berkeley

Hamish, Markus,

I have compiled the OpenCL code and got it to work with GRASS 7.0svn on Ubuntu 11.10, but it is severely memory constrained: your map has to fit in your video RAM (1 GB for me). You can't use memory partitioning unless you have already run r.horizon, and unfortunately r.horizon was never ported to OpenCL. OpenCL is exciting, but for large datasets it is not yet useful :-(.

Are there any optimization tricks we could do with either r.horizon or its equivalent in r.sun? For example, distant mountains do not need to be at 0.5 meter resolution, as in Daniel's dataset, or at 2 meters, as in mine; 10-30 meters is sufficient to provide shading 10 km away. It would be orders of magnitude faster to specify a 'regional' map for large-scale topographic shading, which would then be overlaid with a smaller tile of high-resolution elevation.
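
Until something like that exists inside the modules, the closest approximation I can think of is computing the horizon maps on a coarsened region and letting r.sun read them at full resolution, since GRASS resamples input rasters to the current region anyway. A rough sketch (untested, parameter names from memory, so check the r.horizon and r.sun manuals):

    import grass.script as grass

    # horizon sweep at coarse resolution; distant topography only
    # needs 10-30m cells
    grass.run_command('g.region', res=30)
    grass.run_command('r.horizon', elevin='dem', horizonstep=15,
                      horizon='hor')

    # r.sun at full resolution, reusing the coarse horizon maps
    grass.run_command('g.region', res=2)
    grass.run_command('r.sun', elevin='dem', horizon='hor',
                      horizonstep=15, day=180, glob_rad='rad_180')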

I really want to just use r.sun and never use r.horizon again, but unless I can get access to a cluster with 10 GB of RAM per node, I can't; it just takes too long to process.

Collin

Hi Hamish and Collin,

Alright, that's a lot of great input to respond to, so get ready for it ;)

there is OpenCL GPU accel. support, but it has not yet been
merged into grass 7. (mea culpa)

Okay, that's great to know. Has the project made any progress since last year? Our software go-to guy, Johannes, had been working on it with Seth, who had done a lot of work on it during GSoC at some point, but the last I heard, it still had some problems when run on a graphics card. I believe they weren't sure whether the problem was the old graphics card being used or something in the code itself. On an Intel processor it seemed to run okay, but neither of them wanted to consider the version final as of the last time I was in the loop.

for r.sun being run 365 (or whatever) times in a row the “poor
man’s” method is fine, in fact the r3.in.xyz script in addons
is perhaps the most efficient multi-CPUing in grass to date.
(to my surprise)

That's reassuring - I know what I've been doing up till now isn't very elegant, but at least I can implement it ;)

I’ve just read through the r.horizon code in devbr6 and I don’t
see anything which makes the module unable to be run multiple
times in the same mapset. (no external region setting, no
generically named temp files, no gratuitous use of grass library
global variables) … are you running under NFS or similar as
addressed by Markus’s script? aka maybe the trouble is rooted
elsewhere?

No, the process was running locally on my machine, pretty straightforward. I really think the problem is RAM, as you note below.

72000x72000, how much ram does r.horizon use?
maybe processes are being killed as you run out of RAM.
in that case set the max number of parallel jobs by what fits
into memory without going into swap, instead of by the number
of CPUs. and error handling in the script (test for '$? -ne 0')
could help retry failed runs.

It definitely uses way too much RAM; Collin thinks it's ~10 GB per r.horizon process, which means that even one process would be too much for my little machine with 8 GB. I like your idea of retrying runs that exit with '$? -ne 0'; that shouldn't be too hard.
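
Something like this should do it; a quick sketch in Python, since that's what my scripts are in (the command list is a placeholder):

    import subprocess

    def run_with_retries(cmd, max_tries=3):
        # re-run a GRASS command until it exits 0 -- the Python
        # equivalent of testing '$? -ne 0' in a shell script
        for attempt in range(max_tries):
            if subprocess.call(cmd) == 0:
                return True
        return False

    # placeholder invocation; the real parameters come from my script
    run_with_retries(['r.horizon', 'elevin=dem', 'horizonstep=15',
                      'horizon=hor'])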

FWIW I've tentatively given up on using r.horizon, see the "r.sun
commissioning trials" trac ticket and wiki page. since the sun's
position changes each day, for sub-degree placement of the sun you
need so many horizon maps that loading them all becomes more
expensive than just re-calculating the horizon on the fly with the
exact position. (I generally aim for slow exactness over fast
processing time though, YMMV)
but maybe I don't correctly understand what r.horizon is doing...

True, true. I guess for the time being I'll just stick with r.sun -s. It is a shame though, because I have to spatially partition the area. Collin has a good suggestion below that would be great, but it sounds like an idea that won't be implemented soon enough to use in this project ;)

Are there any optimization tricks we could do with either r.horizon or its equivalent in r.sun? For example, distant mountains do not need to be at 0.5 meter resolution, as in Daniel's dataset, or at 2 meters, as in mine; 10-30 meters is sufficient to provide shading 10 km away. It would be orders of magnitude faster to specify a 'regional' map for large-scale topographic shading, which would then be overlaid with a smaller tile of high-resolution elevation.

That's the great suggestion I mentioned above - just reading it makes my mouth water ;)

I really want to just use r.sun and never use r.horizon again, but unless I can get access to a cluster with 10 GB of RAM per node, I can't; it just takes too long to process.

That’s exactly my situation too.

Wow, I may have done the math wrong, but I think you would need ~10GB of RAM per r.horizon process to run your map without constraining the area. So, I would be inclined to agree with Hamish that you may be RAM constrained. I have had some success using g.region to “tile” my dataset into rectangles that are long east-west (sunrise-sunset) and short north-south (summer-winter), but I had to make sure I had at least 25% overlap to cover edge effects.

Yup, I think you’re right. I guess it’s back to g.region for the time being, then overlapping my tiles and taking the lower values.
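
For taking the lower values where the tiles overlap, I'm hoping r.series with method=minimum will do the stitching in one go. A sketch (the map names are placeholders, and r.series should ignore the nulls outside each tile as long as the -n flag isn't set):

    import grass.script as grass

    # per-tile r.sun outputs, named by my partitioner (placeholders)
    tile_maps = ['rad_tile_%02d' % i for i in range(8)]

    # minimum across the stack: identical outside the overlaps, the
    # lower value inside them
    grass.run_command('r.series', input=','.join(tile_maps),
                      output='rad_mosaic', method='minimum')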

In any case, a big thank you to everybody for the contributions and ideas. Time to fire up my IDE ;) Wish y'all a nice weekend…

Daniel

Hey there,

I’m back with some more multithreading problems.

Following the previous discussion I've given up on multithreading r.horizon and switched to doing my calculations directly with r.sun. Of course I want to use all my cores for r.sun too, so I've tried implementing versions of Collin's code using the multithreading library. Just for testing, I implemented a spatial partitioner that uses g.region to split the map iteratively into sections, calls r.sun for these smaller sections, and then waits for the processes to finish before continuing.

It behaves very strangely, though, and at the end it produces empty maps; the values are all -nan. As I first spatially partition the map and THEN run through the year, I don't think the partitioning is the problem, but perhaps something in the multiprocessing library is. My program uses objects, and it seems the library doesn't play nicely with objects, methods, etc. Does anyone have experience with doing this kind of procedure with objects? I'd hate to have to reprogram everything in a purely functional style, especially because of all the other disadvantages that would bring for maintenance down the road.
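
One thing I still have to test: I've read that multiprocessing can't pickle bound methods (at least on Python 2), so handing an object's method to Pool.map fails, and a module-level wrapper function is the usual workaround. Roughly (the names here are made up, not my actual code):

    import multiprocessing as mp

    class SunPartitioner(object):
        def run_tile(self, tile):
            # ... g.region + r.sun for one tile ...
            return tile

    def run_tile_worker(args):
        # module-level wrapper: picklable, unlike a bound method
        partitioner, tile = args
        return partitioner.run_tile(tile)

    if __name__ == '__main__':
        p = SunPartitioner()
        pool = mp.Pool(mp.cpu_count())
        results = pool.map(run_tile_worker, [(p, t) for t in range(8)])
        pool.close()
        pool.join()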

Best,
Daniel

Daniel,

For this discussion it probably isn't important, but just to clarify: the mp module does multiprocessing, not multithreading. I haven't worked with objects, so I can't help you on that. I am using g.region with all of my r.sun runs without a problem, so that is probably not your issue. What about the r.horizon output? Are you using the "-s" flag without horizon maps, or with horizon maps? If you are using horizon maps with multiple mapsets, you either need to have them in PERMANENT or copied individually to each of the mapsets.

Collin
