Hi Dylan.
I didn't let it finish because 15 minutes was already too long for my task.
OK, that is less than the 5+ hours of v.rast.stats, but still far too slow
compared with ArcGIS and the rasterization solution in GRASS.
I've built the 1.2.03 version, downloaded from [1].
Anyway, I suspect the same about GRASS driver inefficiencies in GDAL/OGR.
OK. This is the old stable branch (I think). If you can get 2.0 to compile I
would suggest trying that. Starspan really needs to make it into OSGeo so
that more eyes can get in on the development + bug tracking. At one point it
was considerably faster than zonal stats in ArcGIS. I am planning on spending
more time on Starspan from May.
Cheers,
Dylan
2009/2/19 Dylan Beaudette <dylan.beaudette@gmail.com>:
> On Thu, Feb 19, 2009 at 5:20 AM, G. Allegri <giohappy@gmail.com> wrote:
>> Thanks for the ideas.
>> I've just tried Starspan but its performance is still too slow. I've
>> let it run for 15 minutes...
>
> Hi,
>
> Did you ever let it finish? Can you post the version number? I have
> noticed that starspan tends to be slower when using GRASS vector and
> raster features-- probably a combination of inefficiencies in GDAL/OGR
> with the GRASS formats.
>
>
> Dylan
>
>> r.statistics is probably the best solution. I've investigated the
>> ArcGIS method and it actually seems to use a similar method
>> (rasterization of the features and various automations to join the
>> results). In fact they call the module "zonal statistics", which is
>> generally a set of raster-based methods.
>>
>> The only limitation of the current r.statistics is that it works only
>> with CELL and not float. OK, I can multiply my values and convert to
>> CELL, but we could try to let r.statistics deal with floats too...
>>
>> I will try to batch the process and let you know the results.
>>
>> 2009/2/19 Markus Metz <markus.metz.giswork@googlemail.com>:
>>> Markus Metz wrote:
>>>> G. Allegri wrote:
>>>>> Hello list.
>>>>> Yesterday I needed to use v.rast.stats on 1793 areas covering a
>>>>> 4415x6632 raster (with resolution 50m/pixel). I used it without
>>>>> extended statistics but the processing time was, to put it mildly,
>>>>> very very long. After 5 hours it still hadn't finished. As I needed it
>>>>> for this morning I decided to reproduce it with ArcGIS: 40
>>>>> seconds. I've tried to investigate what was going wrong, the
>>>>> bottleneck, but in the end I suppose that it's a problem of the
>>>>> script itself (the looping chain of r.mapcalc and r.univar, the
>>>>> creation and deletion of the MASK in each loop).
>>>>> Is there any way to improve the performance of v.rast.stats? Should
>>>>> we rewrite it in C and avoid the use of MASKs?
>>>>
>>>> I have two ideas.
>>>> 1) Use r.reclass instead of r.mapcalc to create new masks. That should
>>>> speed up at least the MASK creation and deletion
>>>> 2) Avoid the loop and MASK creation altogether. Run r.univar
>>>> map=tmpname,raster. Process the output of r.univar, separate stats for
>>>> the different vector areas and convert to sql statements. Proceed as
>>>> before. r.univar would be called only once. I'm not sure if this is
>>>> possible. I also don't know if the speed gain by avoiding the loop is
>>>> annihilated by r.univar having to process two rasters as input.
>>>
>>> Idea 2 is nonsense, I hoped for some behaviour like in r.statistics.
>>
>> _______________________________________________
>> grass-user mailing list
>> grass-user@lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/grass-user
My take on this is to rasterize my vector data with gdal_rasterize (you
can have a look at the rasterisation code and see how it works, in case
you need to eg buffer your vector data), load it up in python, load my
dataset in python, and calculate whatever stats with scipy+numpy. If
you look at this thread, you'll find it is very fast:
<http://article.gmane.org/gmane.comp.python.scientific.user/19412>.
numpy is already requested by the new wxGUI*, so with numpy around anyway,
maybe some python module could be written for grass7, where python is a
full dependency?
* see gui/wxpython/gui_modules/profile.py
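The rasterize-then-numpy approach described above can be sketched roughly as follows. This is only a minimal illustration, not the code from the linked thread: the function name zonal_stats, the nodata_zone parameter and the tiny test arrays are made up for the example, and it assumes the zone raster (e.g. the output of gdal_rasterize) and the value raster have already been read into two numpy arrays on the same grid.

```python
import numpy as np

def zonal_stats(zones, values, nodata_zone=0):
    """Per-zone (count, sum, mean) for two aligned 2-D arrays.

    zones  : non-negative integer zone ids (hypothetical: rasterized polygons)
    values : float raster on the same grid; NaN cells are skipped
    """
    z = zones.ravel()
    v = values.ravel().astype(float)
    # Drop cells outside any zone and cells without a value.
    keep = (z != nodata_zone) & ~np.isnan(v)
    z, v = z[keep], v[keep]
    if z.size == 0:
        return {}
    n = int(z.max()) + 1
    # bincount visits every cell once, whatever the number of zones.
    count = np.bincount(z, minlength=n)
    total = np.bincount(z, weights=v, minlength=n)
    ids = np.nonzero(count)[0]
    return {int(i): (int(count[i]), float(total[i]), float(total[i] / count[i]))
            for i in ids}

# Tiny made-up example: zone 0 means "no polygon here".
zones = np.array([[1, 1, 2],
                  [0, 2, 2]])
values = np.array([[1.0, 3.0, 2.0],
                   [9.0, 4.0, np.nan]])
result = zonal_stats(zones, values)
# result == {1: (2, 4.0, 2.0), 2: (2, 6.0, 3.0)}
```

The point of the sketch is that the cost depends on the number of cells, not on the number of areas, which is where a per-area MASK loop loses out.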
I think too that grass should provide a reasonably fast way to get this kind of stats. You can still devise your own solution if you want, but IMHO grass must be able to do this job reasonably fast and user-friendly.
Taking the risk of becoming annoying: with r.univar.zonal, everything could be done in one pass: rasterize vector, no need for mapcalc, run r.univar.zonal once (which itself needs only one pass), load stats to attribute table, done. With the example that started this thread, everything should be completed in very few minutes. Rasterizing the vector might take the longest.
Anyway, when it comes to processing time, I'm a speed junky, and >5 hours is simply unacceptable if it can also be done in minutes or even seconds, and grass should do that, not forcing users to come up with their own workarounds for something that grass is supposed to do.
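To make the "one pass" argument concrete, here is a small pure-Python sketch of the idea. It is not the r.univar.zonal implementation, which is C code not shown in this thread; the function name, the row-pair input format and the use of None for "no zone" are assumptions made only for the illustration. Both rasters are read row by row, exactly once, and a running accumulator is kept per zone.

```python
import math

def zonal_univar_one_pass(rows):
    """Accumulate n, sum, min, max per zone in a single pass.

    rows : iterable of (zone_row, value_row) pairs of equal length,
           e.g. one pair per raster row (hypothetical input format).
           A zone of None or a NaN/None value means "skip this cell".
    """
    acc = {}  # zone id -> [n, sum, min, max]
    for zone_row, value_row in rows:
        for zone, value in zip(zone_row, value_row):
            if zone is None or value is None or math.isnan(value):
                continue
            s = acc.get(zone)
            if s is None:
                acc[zone] = [1, value, value, value]
            else:
                s[0] += 1
                s[1] += value
                if value < s[2]:
                    s[2] = value
                if value > s[3]:
                    s[3] = value
    return {zone: {"n": n, "sum": total, "mean": total / n,
                   "min": lo, "max": hi}
            for zone, (n, total, lo, hi) in acc.items()}

# Made-up two-row raster with zones 1 and 2.
rows = [([1, 1, 2], [1.0, 3.0, 2.0]),
        ([None, 2, 2], [9.0, 4.0, float("nan")])]
stats = zonal_univar_one_pass(rows)
```

Only one row of each raster has to be in memory at a time, and the amount of work is the same for 2 zones or for 1793, which is why no per-area MASK or r.mapcalc step is needed.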
On Sat, 2009-02-21 at 09:20 +0100, Markus Metz wrote:
Hamish wrote:
> Jose Gómez-Dans wrote:
>
>> My take on this is to rasterize my vector data with gdal_rasterize (you
>> can have a look at the rasterisation code and see how it works, in case
>> you need to eg buffer your vector data), load it up in python, load my
>> dataset in python, and calculate whatever stats with scipy+numpy. If
>> you look at this thread, you'll find it is very fast:
>> <http://article.gmane.org/gmane.comp.python.scientific.user/19412>.
>>
>
> numpy is already requested by the new wxGUI*, so with numpy around anyway,
> maybe some python module could be written for grass7, where python is a
> full dependency?
>
> * see gui/wxpython/gui_modules/profile.py
>
>
I think too that grass should provide a reasonably fast way to get this
kind of stats. You can still devise your own solution if you want, but
IMHO grass must be able to do this job reasonably fast and user-friendly.
Taking the risk of becoming annoying: with r.univar.zonal, everything
could be done in one pass: rasterize vector, no need for mapcalc, run
r.univar.zonal once (which itself needs only one pass), load stats to
attribute table, done. With the example that started this thread,
everything should be completed in very few minutes. Rasterizing the
vector might take the longest.
Anyway, when it comes to processing time, I'm a speed junky, and >5
hours is simply unacceptable if it can also be done in minutes or even
seconds, and grass should do that, not forcing users to come up with
their own workarounds for something that grass is supposed to do.
Markus M
+1 (from an end-user :: I had to do "my" workaround once)
I'm out of the office. When I'm back I will summarize the various
proposals and try to run some benchmarks on my dataset. The first
thing is to make a cleaner distinction between the various stats
commands. Thanks for all the contributions!
On Sat, 2009-02-21 at 23:53 +0100, Markus Neteler wrote:
On Sat, Feb 21, 2009 at 11:19 PM, Nikos Alexandris
<nikos.alexandris@felis.uni-freiburg.de> wrote:
> Dylan:
>> OK. This is the old stable branch (I think). If you can get 2.0 to
>> compile I would suggest trying that.
>
> Dylan, which one is 2.0 for linux? Can't trace it.