[GRASS-user] Multiple `r.proj` requests on the same raster map(s)

Thanks a lot Markus.

I didn't mean to "hide" details. It's the lack of time that made my
example a bad one. I posted the "pseudo-example" to get, more or less, a
verification of its "logic".

My intention was, and still is, to share the scripts after polishing and correcting them.

Attached is only one of many. They live in a private GitLab
repository, and I have to wait before I can share them.
I could adjust my custom functions to do exactly the
"trick"/logic you explained, with a unique temporary Mapset.

Finally, I think the `.gislock` files I find are most likely due
to previous failed attempts. My current solution is to clean up these
files and simply re-run my final processes (cleaning either by
entering and exiting the corresponding Mapsets, or by forcing grass
with -f).
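For the cleanup step above, a minimal sketch that lists, and optionally deletes, the `.gislock` files left in the mapsets of one location. It assumes a lock sits at `<location>/<mapset>/.gislock`; the function names `list_gislocks` and `remove_gislocks` are hypothetical. Deleting a lock is only safe when no GRASS session is using that mapset.

```shell
#!/usr/bin/env bash
# Sketch: find .gislock files in the mapsets of one GRASS location.
# Assumption: locks live at <location>/<mapset>/.gislock.
# Only remove a lock when you are sure no session uses that mapset.

list_gislocks() {
    local location_dir="$1"
    if [ ! -d "$location_dir" ]; then
        echo "no such location: $location_dir" >&2
        return 1
    fi
    # depth 2 = <location>/<mapset>/.gislock only
    find "$location_dir" -mindepth 2 -maxdepth 2 -type f -name .gislock | sort
}

# Hypothetical helper: delete every lock found by list_gislocks
remove_gislocks() {
    local lock
    list_gislocks "$1" | while IFS= read -r lock; do
        echo "removing $lock" >&2
        rm -f -- "$lock"
    done
}
```

For example, run `list_gislocks /path/to/grassdata/mylocation` first and only then decide whether `remove_gislocks` is appropriate.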

I think that filling in the unique mapset logic will do away with 99%
of the troubles.

Nikos

Mixed authors:

--%<---

Just a pseudo-example: it would suffice then to,

save current region

does not work, you should still be outside GRASS

for loop over something
   ...

   CURRENT_MAPSET=$(g.mapset -p)

does not work, you should still be outside GRASS

   # a temporary Mapset
   RANDOM_STRING=$(mktemp --dry-run |cut -d"." -f2)

now you start GRASS ...

   grass -c $RANDOM_STRING

--exec is missing

... and are out of GRASS again

put this in a script to be called with grass -c ... --exec myscript.sh
-->

   # do something
   r.mask vector=VectorMap where="Attribute='Here'" &&
   g.region zoom=MASK &&
   r.stats.zonal base=basemap cover=covermap method=average output=outputmap

   # back to "valid" Mapset
   g.mapset $CURRENT_MAPSET

   g.copy raster=outputmap@${RANDOM_STRING},outputmap
   r.stats -acp in=outputmap out=report
   r.mask -r

<--

restore region

does not work, you should be outside GRASS again
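Following the comments above, the part between `-->` and `<--` could become `myscript.sh`, run inside a fresh temporary mapset. Since the region (the mapset's WIND file) and the MASK belong to the current mapset, nothing needs saving or restoring when each job gets its own throwaway mapset. A sketch, assuming the pseudo-example's placeholder map names, `r.stats.zonal` as the GRASS module for the zonal average, and that GRASS exports `GISRC` in a running session; the function name `zonal_job` is hypothetical.

```shell
#!/usr/bin/env bash
# myscript.sh: sketch of the part that runs INSIDE GRASS, in a fresh
# temporary mapset, e.g.  grass -c /path/to/location/<tmp_mapset> --exec ./myscript.sh
# Map names and the attribute filter are the pseudo-example's placeholders.

zonal_job() {
    # Restrict computations to the features matching the filter
    r.mask vector=VectorMap where="Attribute='Here'" &&
    # Shrink this mapset's region to the extent of the mask
    g.region zoom=MASK &&
    # Average of covermap within each zone of basemap
    r.stats.zonal base=basemap cover=covermap method=average output=outputmap &&
    # Area, cell counts and percentages, written to the text file "report"
    r.stats -acp input=outputmap output=report &&
    # Remove the mask (the mapset is deleted afterwards anyway)
    r.mask -r
}

# Run only inside a GRASS session (GRASS sets GISRC); otherwise just define
if [ -n "${GISRC:-}" ]; then
    zonal_job
fi
```

The raster `outputmap` disappears together with the temporary mapset; if it is needed, copy it before the mapset is removed (e.g. `g.copy raster=outputmap@<tmp_mapset>,outputmap` from a session in the target mapset) or export it.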

All outside GRASS, try to

1. create a script with commands and parameters to be executed, e.g. myscript.sh
2. create a unique name of a temporary mapset (full path), store it in an env var, e.g. TMPMAPSET
3. run grass -c $TMPMAPSET --exec myscript.sh
4. remove the temporary mapset simply with rm -fr $TMPMAPSET
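Steps 2 to 4 could look like the following bash function (step 1 is the job script itself). A sketch only: `GRASS_BIN` (the start script of the GRASS version, e.g. `grass74`), the `tmp_` mapset prefix and the function name `run_in_temp_mapset` are assumptions; `mktemp -u` plays the role of `mktemp --dry-run` in the pseudo-example and only prints an unused name.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4 for ONE job.
# Assumptions: GRASS_BIN is the GRASS start script to use (default "grass"),
# $1 is the full path of an existing location directory,
# the remaining arguments are the job script and its arguments.

run_in_temp_mapset() {
    local location="$1" tmpmapset status
    shift

    # 2. unique temporary mapset (full path); -u only prints an unused name
    tmpmapset=$(mktemp -u "$location/tmp_XXXXXXXX") || return 1

    # 3. create the mapset, run the script inside GRASS, leave GRASS
    "${GRASS_BIN:-grass}" -c "$tmpmapset" --exec "$@"
    status=$?

    # 4. remove the temporary mapset, whether or not the job succeeded
    rm -rf -- "$tmpmapset"
    return "$status"
}
```

Usage, e.g.: `GRASS_BIN=grass74 run_in_temp_mapset /path/to/grassdata/mylocation ./myscript.sh`. The mapset is removed even when the job fails, and GRASS's exit status is passed on to the caller.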

Alternatively/additionally, don't use the script grassXY to start a GRASS
session, instead define the GRASS environment with custom scripts (one for
the GRASS version to use, one for the database/location/mapset to use). This
avoids race conditions on an HPC system. A unique temporary mapset for each
process helps to avoid all sorts of concurrent access problems.
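The two custom scripts could be sketched as two bash functions: one selecting the GRASS version, one writing a job-specific `GISRC` file for database/location/mapset. It follows the variables commonly set when running GRASS without its start script (`GISBASE`, `PATH`, `LD_LIBRARY_PATH` on Linux, `PYTHONPATH`, `GISRC`); the function names, the install path and the choice to seed a new mapset's `WIND` from `PERMANENT/DEFAULT_WIND` are assumptions to check against the GRASS version in use. No `.gislock` is created this way, so safety relies entirely on each job having its own mapset.

```shell
#!/usr/bin/env bash
# Sketch: GRASS environment without the grassXY start script.
# Assumptions: $1 of setup_grass_version is the GRASS install directory
# (GISBASE); the location already exists; all names are placeholders.

# Script 1: which GRASS version to use
setup_grass_version() {
    export GISBASE="$1"
    export PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
    # Linux; macOS would use DYLD_LIBRARY_PATH instead
    export LD_LIBRARY_PATH="$GISBASE/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PYTHONPATH="$GISBASE/etc/python${PYTHONPATH:+:$PYTHONPATH}"
}

# Script 2: which database/location/mapset, via a GISRC file unique to this job
setup_grass_mapset() {
    local gisdbase="$1" location="$2" mapset="$3"
    local location_dir="$gisdbase/$location"
    if [ ! -d "$location_dir/PERMANENT" ]; then
        echo "not a GRASS location: $location_dir" >&2
        return 1
    fi
    # mkdir fails if the mapset exists, so an existing WIND is never overwritten
    if mkdir "$location_dir/$mapset" 2>/dev/null; then
        # assumption: a new mapset starts from the location's default region
        cp "$location_dir/PERMANENT/DEFAULT_WIND" "$location_dir/$mapset/WIND" || return 1
    fi
    GISRC=$(mktemp "${TMPDIR:-/tmp}/gisrc.XXXXXXXX") || return 1
    export GISRC
    printf 'GISDBASE: %s\nLOCATION_NAME: %s\nMAPSET: %s\n' \
        "$gisdbase" "$location" "$mapset" > "$GISRC"
}
```

For example (hypothetical paths): `setup_grass_version /usr/local/grass-7.4.0; setup_grass_mapset /path/to/grassdata mylocation job_42`, then call GRASS modules directly, and remove the mapset directory and `$GISRC` when the job is done.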

It mostly works for me with --exec. Mostly. That is, here and there I get missing
or empty WIND files, and .gislock-related issues.
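To see how often the missing or empty WIND files occur, a small check could list the affected mapsets of a location. A hedged sketch: it treats every subdirectory of the location as a mapset and only reports, it does not repair anything; the function name `check_wind_files` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report mapsets of a location whose WIND file is missing or empty.
# Assumption: every subdirectory of the location directory is a mapset.
# Prints one mapset path per line; returns 1 if any was found.
check_wind_files() {
    local location_dir="$1" mapset_dir found=0
    for mapset_dir in "$location_dir"/*/; do
        [ -d "$mapset_dir" ] || continue      # glob matched nothing
        mapset_dir=${mapset_dir%/}
        if [ ! -s "$mapset_dir/WIND" ]; then  # -s: exists and is non-empty
            echo "$mapset_dir"
            found=1
        fi
    done
    return "$found"
}
```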

--->%---

(attachments)

i.landsat8.swlst.helper_functions.sh (6.61 KB)


* Markus Metz <markus.metz.giswork@gmail.com> [2018-02-27 09:08:03 +0100]:

On Sun, Feb 25, 2018 at 6:34 PM, Nikos Alexandris <nik@nikosalexandris.net> wrote:

Thanks a lot Markus.

I didn't mean to "hide" details.

Details are not so important here, and it is easy to get lost in details.
The general workflow and design of how to run GRASS processing chains in
parallel is more important. Most important is how a "sandbox" environment
for each job is created, i.e. unique mapset, unique GISRC file, maybe also
a unique temporary directory. This "sandbox" environment must be created
outside GRASS; then start the processing in a unique mapset with e.g.
grassXY -c /path/to/unique/mapset --exec /path/to/script, and after GRASS
has finished, clean up by removing the temporary mapset and any temporary data.

Therefore it does not really help if you provide a script; more helpful
would be a description of how a script is launched for parallel processing.

Markus M

Understood. I have put the pieces together as a
workflow (as implied in
http://osgeo-org.1560.x6.nabble.com/Does-multi-threading-apply-to-r-series-lwr-td5341184.html)
before starting the real work. Once time permits, I'll come back to this
(or the other) thread with what I have worked out.

Thanks a million, Nikos
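Markus asks for a description of how a script is launched for parallel processing. One possible way, sketched in bash under stated assumptions: each input gets its own job with its own temporary mapset (`grass -c ... --exec`, then removal), and at most `MAX_JOBS` jobs run at once. `GRASS_BIN`, a job script taking the input as its single argument, and the names `run_one`/`run_parallel` are assumptions, not an established GRASS interface.

```shell
#!/usr/bin/env bash
# Sketch: one GRASS job per input, in parallel, each in its own
# throwaway mapset. Assumptions: GRASS_BIN names the GRASS start script
# (default "grass"), the location exists, the job script takes one argument.

# One job: unique mapset -> grass -c ... --exec -> remove mapset
run_one() {
    local location="$1" script="$2" input="$3" mapset rc
    mapset=$(mktemp -u "$location/tmp_XXXXXXXX") || return 1
    "${GRASS_BIN:-grass}" -c "$mapset" --exec "$script" "$input"
    rc=$?
    rm -rf -- "$mapset"
    return "$rc"
}

# Usage: run_parallel MAX_JOBS LOCATION SCRIPT INPUT...
# Returns 1 (and reports the count on stderr) if any job failed.
run_parallel() {
    local max_jobs="$1" location="$2" script="$3" input pid failures=0
    shift 3
    local pids=()
    for input in "$@"; do
        run_one "$location" "$script" "$input" &
        pids+=("$!")
        # Throttle: once max_jobs are running, wait for the oldest one
        if [ "${#pids[@]}" -ge "$max_jobs" ]; then
            wait "${pids[0]}" || failures=$((failures + 1))
            pids=("${pids[@]:1}")
        fi
    done
    # Wait for the remaining jobs
    for pid in "${pids[@]}"; do
        wait "$pid" || failures=$((failures + 1))
    done
    if [ "$failures" -gt 0 ]; then
        echo "$failures job(s) failed" >&2
        return 1
    fi
}
```

For example: `GRASS_BIN=grass74 run_parallel 4 /path/to/grassdata/mylocation ./myscript.sh tile_1 tile_2 tile_3`. Waiting for the oldest job keeps the logic simple; a slow first job can leave slots idle, which a batch scheduler on an HPC system would handle better.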