[GRASS-user] increase performance of r.neighbors

Hi all,
I’m running r.neighbors for a global 250m raster with a window size larger than 100 pixels.
The computation takes almost 1 week, and I’m wondering whether there is a way to speed up the process.

I know how to set up a multi-region & multi-core computation working in tiles, but I would rather avoid that because of the differences I would encounter at the tile borders (and tile overlap would be required).

Is it possible to run r.neighbors in parallel or increase the memory that r.neighbors would use (as implemented in r.watershed)?

Thank you
Best
Giuseppe


Giuseppe Amatulli, Ph.D.

Research scientist at
Yale School of Forestry & Environmental Studies
Yale Center for Research Computing
Center for Science and Social Science Information
New Haven, 06511

Teaching: http://spatial-ecology.org
Work: https://environment.yale.edu/profile/giuseppe-amatulli/

On Fri, 26 May 2017 11:04:14 -0400,
Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi all,
I'm running r.neighbors for a global 250m raster with window-size
setting larger than 100 pixels.

That is a very, very large raster and a
very, very large window size. If you cover the whole of Earth and
my calculations are correct, you should have over 8 billion pixels. And
you are asking the computer to compute, for each of these pixels, a
statistic based on over 10,000 pixel values (100x100).

Does it really make sense in your use case to calculate a statistic based on
pixels, some of which are 12.5 km away from your pixel?
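To make the scale of the problem concrete, here is a minimal Python sketch of the arithmetic above. It assumes an odd 101 x 101 window (r.neighbors only accepts odd window sizes), a 250 m cell size, Moritz's estimate of 8 billion cells and, as a simplification, that every output cell reads its whole window; how r.neighbors actually iterates internally is not discussed in this thread. The function names are hypothetical.

```python
def window_cells(size: int) -> int:
    """Cells inside a square size x size neighbourhood."""
    return size * size


def reach_m(size: int, res_m: float) -> float:
    """Distance along a row or column from the centre cell to the
    farthest cell centre in the window."""
    return (size // 2) * res_m


def total_reads(n_cells: int, size: int) -> int:
    """Input values touched if every output cell reads its whole window
    (a naive model, assumed here for illustration)."""
    return n_cells * window_cells(size)


if __name__ == "__main__":
    size = 101               # odd window, as r.neighbors requires
    res_m = 250.0            # nominal cell size from the question
    n_cells = 8_000_000_000  # Moritz's "over 8 billion" estimate
    print(window_cells(size))                 # 10201
    print(reach_m(size, res_m))               # 12500.0
    print(f"{total_reads(n_cells, size):,}")  # 81,608,000,000,000
```

The 12.5 km reach is measured along a row or column; the window corners are farther away still.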

The computation takes almost 1 week, and I'm wondering whether there is a way
to speed up the process.

I know how to set up a multi-region & multi-core computation working
in tiles, but I would rather avoid that because of the differences I would
encounter at the tile borders (and tile overlap would be required).

If you work with sufficient tile overlap, you won't have the border
effect; you can simply drop the entire overlap area after the
calculations.
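The claim that a sufficient overlap removes the border effect can be checked on a toy grid. The pure-Python sketch below (helper names are hypothetical) computes a moving-window standard deviation once over the whole grid and once tile by tile with a halo of size // 2 cells, keeping only each tile's core. It assumes the window is simply clipped at the outer raster edge; that is a choice of this sketch, not a description of how r.neighbors handles edges.

```python
import math
import random


def focal_stdev(grid, size):
    """Population standard deviation over a size x size window.

    Near the raster edge the window is clipped to the cells that exist
    (an assumption of this sketch)."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [grid[i][j]
                    for i in range(max(0, r - half), min(rows, r + half + 1))
                    for j in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(vals) / len(vals)
            out[r][c] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out


def tiled_focal_stdev(grid, size, tile, halo=None):
    """Same statistic computed tile by tile.

    Each tile is read with `halo` extra cells on every side (default
    size // 2), filtered on its own, and only its core is kept."""
    if halo is None:
        halo = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # Read window: core tile plus halo, clipped to the grid.
            rr0, rr1 = max(0, r0 - halo), min(rows, r1 + halo)
            cc0, cc1 = max(0, c0 - halo), min(cols, c1 + halo)
            sub = [row[cc0:cc1] for row in grid[rr0:rr1]]
            res = focal_stdev(sub, size)
            # Crop: copy back only the core, dropping the overlap.
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = res[r - rr0][c - cc0]
    return out


if __name__ == "__main__":
    random.seed(1)
    grid = [[random.random() for _ in range(23)] for _ in range(17)]
    print(tiled_focal_stdev(grid, 5, 6) == focal_stdev(grid, 5))          # True
    print(tiled_focal_stdev(grid, 5, 6, halo=0) == focal_stdev(grid, 5))  # False
```

With halo=0 the tiled result differs along the internal tile borders, which is exactly the artifact Giuseppe wanted to avoid; any halo of at least size // 2 cells makes the two results identical.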

Is it possible to run r.neighbors in parallel or increase the memory
that r.neighbors would use (as developed in r.watershed)?

AFAIK, r.neighbors already uses all available memory, but I'm not sure
that memory is the bottleneck here.

And no internal parallelization is provided.

So for running it in parallel, tiling is the only option, AFAIK.

Moritz

On Fri, 26 May 2017 18:20:42 +0200,
Moritz Lennert <mlennert@club.worldonline.be> wrote:

On Fri, 26 May 2017 11:04:14 -0400,
Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

[...]

That is a very, very large raster and a
very, very large window size. If you cover the whole of Earth and
my calculations are correct, you should have over 8 billion pixels. And
you are asking the computer to compute, for each of these pixels, a
statistic based on over 10,000 pixel values (100x100).

P.S. That said, what are your working region settings (g.region -p)?

Moritz

This has been discussed quite intensively before:
https://lists.osgeo.org/pipermail/grass-user/2013-July/068679.html

S
________________________________________
From: grass-user [grass-user-bounces@lists.osgeo.org] on behalf of Moritz Lennert [mlennert@club.worldonline.be]
Sent: Friday, 26 May 2017 18:27
To: Giuseppe Amatulli
Cc: GRASS user list
Subject: Re: [GRASS-user] increase performance of r.neighbors

[...]
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-user

On Fri, May 26, 2017 at 5:04 PM, Giuseppe Amatulli <giuseppe.amatulli@gmail.com> wrote:

Hi all,
I’m running r.neighbors for a global 250m raster with window-size setting larger than 100 pixels.
The computation takes almost 1 week, and I’m wondering whether there is a way to speed up the process.

I know how to set up a multi-region & multi-core computation working in tiles, but I would rather avoid that because of the differences I would encounter at the tile borders (and tile overlap would be required).

Tiling would speed up the process. What is the problem with overlapping tiles? You can set up overlapping tiles, crop the results so that they no longer overlap each other, and patch them together at the end.
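The overlap, crop and patch procedure mostly comes down to bookkeeping of tile extents. As a sketch (the Bounds type and tile_layout function are hypothetical, not part of GRASS), the Python below splits a region into tiles of tile_cells x tile_cells cells and returns, for each tile, the extent to read (core plus overlap, clipped to the region) and the core extent to keep. Such extents could, for example, be passed to the n=, s=, e= and w= options of g.region before running r.neighbors on each tile. It assumes the region extent is an exact multiple of the resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Map extent of a tile (hypothetical helper type)."""
    north: float
    south: float
    east: float
    west: float


def tile_layout(region, res, tile_cells, overlap_cells):
    """Split `region` into tiles of tile_cells x tile_cells cells.

    Returns (read, keep) pairs: `read` is the tile plus `overlap_cells`
    on every side, clipped to the region; `keep` is the non-overlapping
    core to crop out of the result before patching."""
    rows = round((region.north - region.south) / res)
    cols = round((region.east - region.west) / res)
    tiles = []
    for r0 in range(0, rows, tile_cells):
        r1 = min(r0 + tile_cells, rows)
        for c0 in range(0, cols, tile_cells):
            c1 = min(c0 + tile_cells, cols)
            keep = Bounds(
                north=region.north - r0 * res,
                south=region.north - r1 * res,
                east=region.west + c1 * res,
                west=region.west + c0 * res,
            )
            read = Bounds(
                north=region.north - max(0, r0 - overlap_cells) * res,
                south=region.north - min(rows, r1 + overlap_cells) * res,
                east=region.west + min(cols, c1 + overlap_cells) * res,
                west=region.west + max(0, c0 - overlap_cells) * res,
            )
            tiles.append((read, keep))
    return tiles


if __name__ == "__main__":
    # Toy 100 x 100 cell region with 10 m cells, 40-cell tiles, 5-cell overlap.
    region = Bounds(north=1000.0, south=0.0, east=1000.0, west=0.0)
    for read, keep in tile_layout(region, 10.0, 40, 5)[:2]:
        print("read", read)
        print("keep", keep)
```

The overlap should be at least half the window size (50 cells for a 101-cell window). This sketch does not treat the east and west edges of a global raster as adjacent, so tiles at the antimeridian get no overlap on that side.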

Markus M

Is it possible to run r.neighbors in parallel or increase the memory that r.neighbors would use (as developed in r.watershed)?

Thank you
Best
Giuseppe

Thanks all for the inputs.
In the end I implemented a scripting procedure outside GRASS using pktools (pkfilter).
I first tiled the image with some overlap, then ran pkfilter -f stdev, then cropped the overlap area, and finally merged the tiles back into the full image.

Best Regards
Giuseppe


On 28 May 2017 at 17:21, Markus Metz <markus.metz.giswork@gmail.com> wrote:

[...]


On 01/06/17 10:49, Giuseppe Amatulli wrote:

Thanks all for the inputs.
In the end I implemented a scripting procedure outside GRASS using
pktools (pkfilter).
I first tiled the image with some overlap, then ran pkfilter -f
stdev, then cropped the overlap area, and finally merged the tiles back into the full image.

It would be interesting to find out whether this is significantly faster than doing exactly the same within GRASS and, if so, why. Have you tried?

Moritz