[GRASS-dev] creating temporary mapsets in a parallelized python script

hellik · April 11, 2016, 8:04pm

Hi,

In a parallelized Python script, I need to run the parallel jobs in
different mapsets, because region settings are changed during the calculations.

Any hints on how to create temporary mapsets? Are there existing pygrass /
pyscript functions for this?

thanks

-----
best regards
Helmut
--
View this message in context: http://osgeo-org.1560.x6.nabble.com/creating-temporary-mapsets-in-a-parallelized-python-script-tp5260753.html
Sent from the Grass - Dev mailing list archive at Nabble.com.

annakrat · April 11, 2016, 8:44pm

On Mon, Apr 11, 2016 at 4:04 PM, Helmut Kudrnovsky <hellik@web.de> wrote:

> in a parallelized python script it is needed to do the parallelized jobs in
> different mapsets as region settings are changed during calculations.

are you sure you need separate mapsets? You can also just use this:

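import os
import grass.script as gscript

# private copy of the environment carrying its own computational region
env = os.environ.copy()
env['GRASS_REGION'] = gscript.region_env(raster='elevation')

gscript.run_command('r.viewshed', ..., env=env)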
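
A minimal sketch of why the per-call environment above is safe for parallel jobs, using only the standard library so it runs without GRASS. The child command and the `job-N` strings are placeholders (assumptions for illustration, not real region definitions): each child only reports the `GRASS_REGION` value it was handed, and the parent's own environment is never modified.

```python
import os
import subprocess
import sys

# Child command that just reports the GRASS_REGION it was given.
# In a real script this would be a GRASS module call with env=env.
REPORT = "import os; print(os.environ.get('GRASS_REGION', 'unset'))"


def run_with_region(region_string):
    """Run a child process that sees its own GRASS_REGION value."""
    env = os.environ.copy()              # private copy, parent stays untouched
    env["GRASS_REGION"] = region_string  # placeholder value, not a real region
    result = subprocess.run(
        [sys.executable, "-c", REPORT],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Each job gets its own value; the parent's environment is unchanged.
outputs = [run_with_region("job-%d" % i) for i in range(3)]
parent_value = os.environ.get("GRASS_REGION", "unset")
```

Because every job works on its own copy of the environment, the jobs can run concurrently without seeing each other's region.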

Anna

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

hellik · April 11, 2016, 8:46pm

Anna Petrášová wrote:

> are you sure you need separate mapsets?

The idea of the script is to run r.basin (1) with Python's multiprocessing
for more than 600 points.

IIRC, r.basin changes the region during its calculations, so several
concurrent runs may interfere with each other's region settings.

(1)
https://trac.osgeo.org/grass/browser/grass-addons/grass7/raster/r.basin/r.basin.py

-----
best regards
Helmut

annakrat · April 11, 2016, 9:09pm

On Mon, Apr 11, 2016 at 4:46 PM, Helmut Kudrnovsky <hellik@web.de> wrote:

> the idea of the script is to run (1) by python's multiprocessing for more
> than 600 points.
>
> IIRC (1) changes region during calculations, so several calculations may
> interfere in regions settings.

I see, then it's easier to fix the script and use grass.use_temp_region


hellik · April 11, 2016, 9:19pm

Anna Petrášová wrote:

> I see, then it's easier to fix the script and use grass.use_temp_region

From the manual:

script.core.use_temp_region()
Copies the current region to a temporary region with “g.region save=”, then
sets WIND_OVERRIDE to refer to that region. Installs an atexit handler to
delete the temporary region upon termination.

Does this mean that when several r.basin processes run concurrently on
different CPUs in the same mapset, their region settings changes do not
interfere with each other?

-----
best regards
Helmut

wenzeslaus · April 11, 2016, 10:13pm

On Mon, Apr 11, 2016 at 5:19 PM, Helmut Kudrnovsky <hellik@web.de> wrote:

> Does this function mean that: while concurrent r.basin runs on different
> CPUs in the same mapset, there is no interference from region settings
> changes?

Setting WIND_OVERRIDE (an environment variable) happens in the current
process, so it does not affect other processes, only its own subprocesses,
which is the desired behavior. (In your case, this won't work if you use
this function outside of r.basin.) So yes, other parallel processes
(including any g.region they run) won't see this WIND_OVERRIDE, so they
will see and use the original computational region.
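
The scoping described above can be checked with a small sketch that needs only the standard library. It is an illustration under assumptions, not GRASS code: the child process stands in for a module that calls use_temp_region(), and the `tmp.child.<pid>` name is made up. The child sets WIND_OVERRIDE, its own subprocess inherits it, and the parent never sees it.

```python
import os
import subprocess
import sys

# Code run in the child: set WIND_OVERRIDE (as use_temp_region() would),
# then start a grandchild that prints what it inherited.
CHILD = '''
import os, subprocess, sys
os.environ["WIND_OVERRIDE"] = "tmp.child.%d" % os.getpid()
subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['WIND_OVERRIDE'])"],
    check=True,
)
'''


def run_child():
    """Start one child and return what its grandchild printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


before = os.environ.get("WIND_OVERRIDE")   # parent's view before the runs
seen_by_grandchildren = [run_child(), run_child()]
after = os.environ.get("WIND_OVERRIDE")    # parent's view is unchanged
```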

The documentation describes just what happens in the background but not the
usage. Feel free to update that with your understanding.

Generally, use_temp_region should be used by any module which calls
g.region.

Vaclav

hellik · April 12, 2016, 9:29am

wenzeslaus wrote:

> Generally, use_temp_region should be used by any module which calls
> g.region.

thanks for the hints and clarifications.

are there examples of the correct use of script.core.use_temp_region()
around?

I've found it in [1]:

# clone current region
grass.use_temp_region()

grass.run_command('g.region', res=panres, align=pan)

anything else to do?

[1]
https://trac.osgeo.org/grass/browser/grass/trunk/scripts/i.pansharpen/i.pansharpen.py#L124

thanks

-----
best regards
Helmut

mlennert · April 12, 2016, 9:52am

On 12/04/16 11:29, Helmut Kudrnovsky wrote:

> are there examples of the correct use of script.core.use_temp_region()
> around?

https://grasswiki.osgeo.org/wiki/GRASS_Python_Scripting_Library#Using_temporary_region_for_computations

Moritz

Glynn_Clements1 · April 14, 2016, 11:15am

Vaclav Petras wrote:

> > Does this function mean that: while concurrent r.basin runs on different
> > CPUs in the same mapset, there is no interference from region settings
> > changes?
>
> Setting WIND_OVERRIDE (environmental variable) will happen for the current
> process, so it won't affect other processes only subprocess which is
> desired. (In your case, this won't work if you would use this function
> outside of r.basin.) So yes, parallel processes (including g.region) won't
> see WIND_OVERRIDE, so they will see and use the original computational
> region.

The environment will be inherited by any child processes, so a
top-level script can use the same approach (but not use_temp_region()
itself) to use a different named region for each subprocess.
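
A rough sketch of that top-level approach, with the GRASS parts replaced by stand-ins so it runs anywhere. The naming scheme in `region_name()` and the worker command are hypothetical. A real script would also have to save each named region first (g.region save=...) and remove it afterwards, which is only indicated in a comment here.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a GRASS module: prints the region name it would use.
WORKER = "import os; print(os.environ['WIND_OVERRIDE'])"


def region_name(job_id):
    """Hypothetical naming scheme: one saved region per job."""
    return "tmp.%s.%d.%d" % ("toplevel", os.getpid(), job_id)


def run_job(job_id):
    name = region_name(job_id)
    # A real script would first save a region under this name
    # (g.region save=<name>) and remove it when the job is done.
    env = os.environ.copy()
    env["WIND_OVERRIDE"] = name  # only this subprocess sees the name
    out = subprocess.run(
        [sys.executable, "-c", WORKER],
        env=env, capture_output=True, text=True, check=True,
    )
    return name, out.stdout.strip()


# Threads are enough here: each one only waits for its subprocess.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_job, range(4)))
```

Each entry in `results` pairs the name chosen by the top-level script with the name the subprocess actually saw.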

The main issue with use_temp_region() is that the clean-up function
uses the current value of the environment variable to determine which
region to delete.

This could be fixed by passing a lambda to atexit.register(), e.g.

import atexit
import os
import sys

def use_temp_region():
    name = "tmp.%s.%d" % (os.path.basename(sys.argv[0]), os.getpid())
    run_command("g.region", save=name, overwrite=True)
    os.environ['WIND_OVERRIDE'] = name
    # the lambda closes over name, so clean-up no longer reads WIND_OVERRIDE
    atexit.register(lambda: run_command("g.remove", flags='f', quiet=True, type='region', name=name))
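
The difference between the two clean-up styles can be seen in a small GRASS-free sketch. The `removed` list and the region names are illustrative only; instead of deleting anything, the callbacks record which name they would delete, and a plain loop stands in for atexit running the handlers.

```python
import os

removed = []  # records which region names the clean-up callbacks would delete


def cleanup_from_env():
    # Looks up the name only when called, so it sees later changes.
    removed.append(("env", os.environ.get("WIND_OVERRIDE")))


def make_cleanup(name):
    # Closure: remembers the name that was current at registration time.
    return lambda: removed.append(("closure", name))


os.environ["WIND_OVERRIDE"] = "tmp.first"
callbacks = [cleanup_from_env, make_cleanup(os.environ["WIND_OVERRIDE"])]

# Something later points WIND_OVERRIDE at another region.
os.environ["WIND_OVERRIDE"] = "tmp.second"

for callback in callbacks:  # stands in for atexit running the handlers
    callback()

os.environ.pop("WIND_OVERRIDE", None)  # leave the environment clean
```

Here the environment-based callback targets `tmp.second`, while the closure still targets `tmp.first`, the region it was registered for.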

--
Glynn Clements <glynn@gclements.plus.com>

Pietro2 · April 14, 2016, 12:39pm

> The main issue with use_temp_region() is that the clean-up function
> uses the current value of the environment variable to determine which
> region to delete.
>
> This could be fixed by passing a lambda to atexit.register()

Perhaps we could decorate the function with contextmanager:

{{{
import os
import sys
from contextlib import contextmanager

# region, parse_command and run_command are those of grass.script.core
@contextmanager
def use_temp_region(**reg):
    name = "tmp.%s.%d" % (os.path.basename(sys.argv[0]), os.getpid())
    original = region()
    try:
        reg = parse_command("g.region", save=name, overwrite=True, **reg)
        os.environ['WIND_OVERRIDE'] = name
        yield reg
    finally:
        # clean created variable and region
        os.environ.pop('WIND_OVERRIDE', None)
        run_command("g.remove", flags='f', quiet=True, type='region', name=name)
        # restore previous region
        for key in 'projection,zone,cells'.split(','):
            original.pop(key, None)
        run_command("g.region", **original)
}}}

and then use the function with:

{{{
run_command("g.region", flags='p')
print('=' * 30)

with use_temp_region(res=100) as tmp_region:
    run_command("g.region", flags='p')
    print('=' * 30)

run_command("g.region", flags='p')
print('=' * 30)
}}}
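
As a GRASS-free illustration of the context-manager idea, here is a minimal sketch that only handles the environment variable. `temp_environ` is a hypothetical helper, not part of any GRASS library. Unlike the version above, which removes WIND_OVERRIDE unconditionally on exit, this sketch restores whatever value was set before, so nested blocks behave as expected.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_environ(key, value):
    """Set os.environ[key] inside the block, then restore the old state."""
    missing = object()  # sentinel: distinguishes "unset" from any string
    previous = os.environ.get(key, missing)
    os.environ[key] = value
    try:
        yield value
    finally:
        # Runs on normal exit and on exceptions alike.
        if previous is missing:
            os.environ.pop(key, None)
        else:
            os.environ[key] = previous


seen = []
with temp_environ("WIND_OVERRIDE", "tmp.outer"):
    seen.append(os.environ["WIND_OVERRIDE"])
    with temp_environ("WIND_OVERRIDE", "tmp.inner"):
        seen.append(os.environ["WIND_OVERRIDE"])
    # Back in the outer block, the outer value is in effect again.
    seen.append(os.environ["WIND_OVERRIDE"])
```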

All the best

Pietro

Glynn_Clements1 · April 21, 2016, 12:32pm

Pietro wrote:

> Perhaps we could decorate the function with contextmanager:

That's an option, but it complicates matters in the (probably more
common) case where you just want to use a single temporary region for
the entire script.

--
Glynn Clements <glynn@gclements.plus.com>