Hi,
I am trying to compute the 95th percentile of a massive grid (12+
million pixels) for a massive number of layers (~2500 layers).
I am doing the aggregation using r.series on our cluster running grass
7.2, but of course it takes ages (21% there after 3 days).
- I tried to tile the process, but it doesn't seem to help much.
- Is there any benefit for me to switch to t.rast.aggregate? My
understanding was that it was a wrapper around r.series.
- Does anyone have a fancy trick to make the aggregation go faster
(parallelisation)?
Cheers,
Pierre
Have you tried r.mapcalc using "nested if statements"?
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-user
Hi Pierre,
Tiling should speed things up significantly if you process the tiles in parallel (provided you have multiple cores and I/O is not the bottleneck, e.g. because of a slow network connection to the data).
Care has to be taken with the region settings, though.
See e.g.:
https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Working_with_tiles
Cheers
Stefan
________________________________________
From: grass-user <grass-user-bounces@lists.osgeo.org> on behalf of Pierre Roudier <pierre.roudier@gmail.com>
Sent: Friday, 21 April 2017 00:49
To: grass-user
Subject: [GRASS-user] Aggregation of massive number of raster layers with r.series
Hi,
I am trying to compute the 95th percentile of a massive grid (12+
million pixels) for a massive number of layers (~2500 layers).
I am doing the aggregation using r.series on our cluster running grass
7.2, but of course it takes ages (21% there after 3 days).
- I tried to tile the process, but it doesn't seem to help much.
- Is there any benefit for me to switch to t.rast.aggregate? My
understanding was that it was a wrapper around r.series.
- Does anyone have a fancy trick to make the aggregation go faster
(parallelisation)?
Cheers,
Pierre
Thanks all,
I ended up writing a script that tiles my overall region (using v.mkgrid). I then loop through the tiles and create a set of subregions on the fly (using the save= option of g.region). So in the end my tiles are represented as a set of saved regions, named "region_[1-n]".
I then use the WIND_OVERRIDE environment variable to process the tiles:
- On my personal machine, I can use GNU parallel:
g.list type=region pat=region_* | parallel WIND_OVERRIDE={} r.series in=`g.list rast pat=temp_* sep=","` out=tiled_{} method=quantile quantile=0.95 --o
- BUT: on the cluster, I can't use GNU parallel, so I generate one script per region, each of which is essentially a one-liner:
WIND_OVERRIDE=region_n r.series in=`g.list rast pat=temp_* sep=","` out=tiled_region_n method=quantile quantile=0.95 --o
Each script is launched silently using GRASS_BATCH_JOB.
My problem now is that I get errors because several GRASS scripts are hitting the GRASS database at the same time:
Starting GRASS GIS...
ERROR: pierre.roudier is currently running GRASS in selected mapset (file /projects/nesi00165/nobackup/modis/grassdata/modis_ts/PERMANENT/PERMANENT/.gislock found). Concurrent use not allowed.
You can force launching GRASS using -f flag (note that you need permission for this operation). Have another look in the processor manager just to be sure...
Exiting...
My question: in this instance, is it safe to use the -f flag, given these different GRASS instances are not writing the same dataset to the DB?
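For readers following along, the one-script-per-region step could be sketched as below. This is only an illustration, not the actual script used here: the helper name make_tile_jobs and the job_<region>.sh file names are made up, and it assumes the saved regions and the temp_* rasters exist as described above.

```shell
#!/bin/sh
# Write one job script per saved region (hypothetical helper, not a GRASS tool).
# Usage: make_tile_jobs OUTDIR REGION...
make_tile_jobs() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for region in "$@"; do
        job="$outdir/job_$region.sh"
        # Unquoted heredoc: $region is expanded now, while the escaped
        # backticks keep g.list from running until the job itself runs.
        cat > "$job" <<EOF
#!/bin/sh
WIND_OVERRIDE=$region r.series in=\`g.list rast pat='temp_*' sep=","\` \\
    out=tiled_$region method=quantile quantile=0.95 --o
EOF
        chmod +x "$job"
    done
}

# Inside a GRASS session (assumed), the region names could come from g.list:
# make_tile_jobs jobs $(g.list type=region pat='region_*')
```

Each generated file can then be handed to the scheduler or to GRASS_BATCH_JOB as one independent job.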
On 21 April 2017 at 20:44, Blumentrath, Stefan <Stefan.Blumentrath@nina.no> wrote:
Hi Pierre,
Tiling should speed things up significantly if you process the tiles in parallel (provided you have multiple cores and I/O is not the bottleneck, e.g. because of a slow network connection to the data).
Care has to be taken with the region settings, though.
See e.g.:
https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Working_with_tiles
Cheers
Stefan
On 11 May 2017 at 23:30:28 GMT+02:00, Pierre Roudier <pierre.roudier@gmail.com> wrote:
My question: in this instance, is it safe to use the -f flag, given these different GRASS instances are not writing the same dataset to the DB?
I would say that the generally recommended way would be to create separate mapsets to avoid such conflicts. At the end you can loop over all mapsets to copy the results into one final mapset.
Moritz
Thanks Moritz,
Indeed, I ended up creating mapsets on the fly using grass72 -c, and processing tiles in their respective mapsets.
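A rough sketch of that mapset-per-tile workflow, for illustration only. It just prints the commands instead of running them. The names tile_plan, tile_<region> and job_<region>.sh are hypothetical, the -e flag (exit after creating the mapset) is assumed to be available in this GRASS version, and the job script is assumed to set its own region (e.g. from a region saved in PERMANENT) before calling r.series.

```shell
#!/bin/sh
# Print the commands for one tile processed in its own mapset. This is a dry
# run: nothing is executed, so the plan can be checked before submitting jobs.
# Usage: tile_plan LOCATION_PATH REGION   (both are placeholders)
tile_plan() {
    location=$1
    region=$2
    mapset="$location/tile_$region"
    # 1. Create the mapset and exit again (-c as in the thread; -e is assumed)
    echo "grass72 -c $mapset -e"
    # 2. Run the per-tile job in that mapset; job_<region>.sh is a hypothetical
    #    script that sets the region and runs r.series for this tile
    echo "GRASS_BATCH_JOB=\$PWD/job_$region.sh grass72 $mapset"
    # 3. Afterwards, from the final mapset, copy the result across
    echo "g.copy raster=tiled_$region@tile_$region,tiled_$region"
}

# Example: tile_plan /path/to/grassdata/modis_ts region_1
```

Because every job runs in its own mapset, each one gets its own .gislock and the concurrent-use error above should not be triggered; only the final g.copy step needs to run in a single session.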
On 12 May 2017 at 18:18, Moritz Lennert <mlennert@club.worldonline.be> wrote:
I would say that the generally recommended way would be to create separate mapsets to avoid such conflicts. At the end you can loop over all mapsets to copy the results into one final mapset.
Moritz