[GRASS-user] improve v.rast.stats speed?

Hello list.
Yesterday I needed to use v.rast.stats on a 1793 areas covering a
4415x6632 raster (with resolution 50m/pixel). I've used it without
extended statistics but the processing time was, with an euphemism,
very very long. After 5 hours it wasn't finished yet. As I needed it
for today morning I've decided to reproduce it with ArcGIS: 40
seconds. I've tried to investigate what was going wrong, the
bottleneck, but at the end I suppose that it's a problem of the script
itself (the looping chain of r.mapcalc and r.univar, the creation and
deletion of the MASK in each loop).
Is there any way to improve the performance of v.rast.stats? Should we
rewrite it in C and avoid the use of MASKs?

I must say that this has been a shock for my collegues seeing this
impressive performance differences... And I would like to contribute
in healing them! :slight_smile:

Giovanni

I forgot an important question: does anyone have had better
experiences with it? Maybe it's only a problem with my computer or my
grass build...

2009/2/19 G. Allegri <giohappy@gmail.com>:

Hello list.
Yesterday I needed to use v.rast.stats on a 1793 areas covering a
4415x6632 raster (with resolution 50m/pixel). I've used it without
extended statistics but the processing time was, with an euphemism,
very very long. After 5 hours it wasn't finished yet. As I needed it
for today morning I've decided to reproduce it with ArcGIS: 40
seconds. I've tried to investigate what was going wrong, the
bottleneck, but at the end I suppose that it's a problem of the script
itself (the looping chain of r.mapcalc and r.univar, the creation and
deletion of the MASK in each loop).
Is there any way to improve the performance of v.rast.stats? Should we
rewrite it in C and avoid the use of MASKs?

I must say that this has been a shock for my collegues seeing this
impressive performance differences... And I would like to contribute
in healing them! :slight_smile:

Giovanni

Hello,
when I first implemented r.lake as a script, it took more than a day
to finish single run on powerfull P4 box. After I rewrote it in C, now
it takes seconds on my slow laptop. Calling r.mapcalc every time
involves opening files, reading raster and NULL values etc. If it can
be done only once, it will increase performance a lot.
I have not looked into that module - no idea how to write C version.
After reading development documentation, example modules and source,
if You still have some questions - ask in dev list.

Best wishes,
Maris.

2009/2/19, G. Allegri <giohappy@gmail.com>:

Hello list.
Yesterday I needed to use v.rast.stats on a 1793 areas covering a
4415x6632 raster (with resolution 50m/pixel). I've used it without
extended statistics but the processing time was, with an euphemism,
very very long. After 5 hours it wasn't finished yet. As I needed it
for today morning I've decided to reproduce it with ArcGIS: 40
seconds. I've tried to investigate what was going wrong, the
bottleneck, but at the end I suppose that it's a problem of the script
itself (the looping chain of r.mapcalc and r.univar, the creation and
deletion of the MASK in each loop).
Is there any way to improve the performance of v.rast.stats? Should we
rewrite it in C and avoid the use of MASKs?

I must say that this has been a shock for my collegues seeing this
impressive performance differences... And I would like to contribute
in healing them! :slight_smile:

Giovanni
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

On 19/02/09 13:09, G. Allegri wrote:

Is there any way to improve the performance of v.rast.stats? Should we
rewrite it in C and avoid the use of MASKs?

Yes ! Maybe the best solution would be to integrate Starspan as a GRASS module as has already been suggested several times [1].

In the meantime, you can transform your vector polygons into raster and use r.statistics.

Moritz

[1]http://lists.osgeo.org/pipermail/grass-user/2009-January/048609.html

Giovanni wrote:

Yesterday I needed to use v.rast.stats on a 1793 areas covering a
4415x6632 raster (with resolution 50m/pixel). I've used it without
extended statistics but the processing time was, with an euphemism,
very very long.

....

I've tried to investigate what was going wrong, the bottleneck, but
at the end I suppose that it's a problem of the script itself (the
looping chain of r.mapcalc and r.univar, the creation and deletion
of the MASK in each loop).

right, it's very inefficient.

FWIW, I had done something similar to v.rast.stats a while back. To
speed it up I used g.region to zoom in on the area of interest so the
r.mapcalc/r.univar step didn't have to run over the mostly NULL map*.
It helped a lot.

[*] IIUC if the entire row is NULL grass knows to skip the entire row
quickly and not test each cell, which makes processing maps with lots
of NULLs rather fast.

You can look at g.region.point and v.what.rast.buffer in the wiki addons
for some cleaned up version of my zoom-to-region-of-interest solution.

I was mostly interesting in calculating statistics for 100m buffers
around sampling sites, and extracting irregularly shaped vector blobs
is not as easy (v.extract + g.region zoom=??), e.g. if you have two
small vector blobs in opposite corners of the map with the same cat.

In a C version you might have access to the feature's bounding box,
which could be used to temporarily reset the region to speed up raster
processing. (??)

Markus Metz:

1) Use r.reclass instead of r.mapcalc to create new masks. That
should speed up at least the MASK creation and deletion

certainly worth a try, it is hugely less intensive for the MASK creation.

Giovanni:

Anyway as I can see Glynn has rewritten the same method (r.mapcalc and
r.univar). I confirm that

v.to.rast -> (r.mapcalc to multiply/round FCEDD/DCELL to CELL) ->
r.statistics -> (r.mapcalc to FCELL/DCELL again) -> r.to.vect -> join
original vector to the output one

is absolutely the faster way.

are the results identical?

Markus Metz:

if v.rast.stats is faster in grass7, then probably because of improved
raster libs. A speed increase from >5 hours to 40 seconds is unlikely
since grass.mapcalc is still called 1793 times (assuming each area has
a unique category) for a region with 4415x6632 cells...

If sticking with the MASK + r.univar method, the moving window region-
zoom/reset trick could help.

Giovanni:

The bottleneck is the r.univar limitation to CELL.

?did you mean r.statistics not r.univar?

Markus Metz:

r.univar2.zonal does zonal statistics

pssst- "svn copy" not "svn add". It works between -addons and trunk.

Hamish

Hamish wrote:

Markus Metz:
  

r.univar2.zonal does zonal statistics
    
pssst- "svn copy" not "svn add". It works between -addons and trunk.
  

Removed from addons again, I'll try again the proper svn way. Also have to sort out a random segfault first.

Markus M

Hamish wrote:

Markus Metz:
  

r.univar2.zonal does zonal statistics
    
pssst- "svn copy" not "svn add". It works between -addons and trunk.

Done, hopefully now the right way. Segfault fixed, only 2D zoning, no 3D zoning. Zone number and category label if present are printed before the stats are printed. -g and -e flags supported as before. Output can be written to a file for easier digesting.

Maybe someone has use for it. It's good to get stats e.g. per land-cover type, or also to do a manual v.rast.stats alternative.

Written against develbranch_6, so it should work with 6.4 and 6.5.

svn co https://svn.osgeo.org/grass/grass-addons/raster/r.univar2.zonal r.univar2.zonal

in a hurry,

Markus M