[GRASS-dev] [GRASS GIS] #2230: 'g.region -p' writes new WIND file, causes race condition for parallel jobs

#2230: 'g.region -p' writes new WIND file, causes race condition for parallel jobs
----------------------+-----------------------------------------------------
Reporter: hamish | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 6.4.4
Component: Default | Version: svn-develbranch6
Keywords: g.region | Platform: Linux
      Cpu: x86-64 |
----------------------+-----------------------------------------------------
Hi,

In tracking down some strangeness I've just noticed that 'g.region -p' or
'g.region -g' writes out a new (unchanged) WIND file whenever it runs.

This is a problem when you have many jobs working in parallel and a bunch
of them starting at the same time all call 'eval `g.region -g`' at the
start of their scripts. About 5% of the time I'm hitting the file before
it is completely written and the unfortunate g.region G_fatal_error()s
with a missing projection parameter or similar.

one solution is to not open the WIND file for writing if just read-only
command line terms are given, another is to write to a .tmp file and move
it into place atomically.

Hamish

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2230&gt;
GRASS GIS <http://grass.osgeo.org>

#2230: 'g.region -p' writes new WIND file, causes race condition for parallel jobs
----------------------+-----------------------------------------------------
Reporter: hamish | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 6.4.4
Component: Default | Version: svn-develbranch6
Keywords: g.region | Platform: Linux
      Cpu: x86-64 |
----------------------+-----------------------------------------------------

Comment(by neteler):

Replying to [ticket:2230 hamish]:
> Hi,
>
> In tracking down some strangeness I've just noticed that 'g.region -p'
or
> 'g.region -g' writes out a new (unchanged) WIND file whenever it runs.

> one solution is to not open the WIND file for writing if just read-only
> command line terms are given

IMHO yes: read-only flags should behave like as such. Otherwise race
conditions
may occur.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2230#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#2230: 'g.region -p' writes new WIND file, causes race condition for parallel jobs
----------------------+-----------------------------------------------------
Reporter: hamish | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 6.4.4
Component: Default | Version: svn-develbranch6
Keywords: g.region | Platform: Linux
      Cpu: x86-64 |
----------------------+-----------------------------------------------------

Comment(by glynn):

Replying to [ticket:2230 hamish]:

> In tracking down some strangeness I've just noticed that 'g.region -p'
or 'g.region -g' writes out a new (unchanged) WIND file whenever it runs.

It does this whenever it runs without the -u flag.

One problem is that printing and setting are not mutually exclusive, i.e.
you can both modify and print the current region. For example, the command
{{{
g.region -p res=10
}}}
will set the resolution to 10 units, write the modified region to the WIND
file (or the region named by $WIND_OVERRIDE), and print the updated
region.

Other problems are:

  * The region is adjusted (by G_adjust_Cell_head3) before writing it, so
the WIND file may actually be changed by running g.region with no
arguments (or just with -p/-g).

  * The region may not even be read from the WIND file (or the region named
by $WIND_OVERRIDE). If GRASS_REGION is set, the region will be read from
that variable, but writes still go to WIND or $WIND_OVERRIDE.

In short, simply skipping the G_put_window() call based upon which
arguments are given (other than -u) would constitute a change to existing
behaviour. You can't assume that the updated region will be identical to
the previous region based solely upon which arguments are given.

I think that the issue is essentially that people don't realise that
g.region always sets the region unless -u is given. Even without any
options which explicitly modify the region, it uses $GRASS_REGION if that
is set, and adjusts the region before writing it. IOW, "g.region -p"
doesn't "print the region", it "sets and prints the region"; "g.region
-pu" prints (and only prints) the region.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2230#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#2230: 'g.region -p' writes new WIND file, causes race condition for parallel jobs
----------------------+-----------------------------------------------------
Reporter: hamish | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 6.4.4
Component: Default | Version: svn-develbranch6
Keywords: g.region | Platform: Linux
      Cpu: x86-64 |
----------------------+-----------------------------------------------------

Comment(by wenzeslaus):

So, looking at r59383, r59386, r59387 and r59391, what is the set of
`g.region` flags which should be used by default when scripting. If the
region (WIND file) cannot be changed by default, why `g.region` is
changing it? If the region (WIND file) needs to be changed by default, we
should put a big warning to documentation about scripting (which can be
always parallelized).

> I think that the issue is essentially that people don't realise that
g.region always sets the region unless -u is given.

Again, as in previous paragraph, this is a documentation issue. This is
not expected behavior and it is not properly documented.

Also I don't understand why it is necessary to write the WIND file in
`g.region` if the content was not changed.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2230#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#2230: 'g.region -p' writes new WIND file, causes race condition for parallel jobs
----------------------+-----------------------------------------------------
Reporter: hamish | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 6.4.4
Component: Default | Version: svn-develbranch6
Keywords: g.region | Platform: Linux
      Cpu: x86-64 |
----------------------+-----------------------------------------------------

Comment(by glynn):

Replying to [comment:3 wenzeslaus]:
> what is the set of `g.region` flags which should be used by default when
scripting.

If you only want to read the region, use the "-u" flag.

> This is not expected behavior and it is not properly documented.

Agreed; I overlooked it when writing the grass.script.core Python module.

> Also I don't understand why it is necessary to write the WIND file in
`g.region` if the content was not changed.

It's not simple to tell whether the content was changed, i.e. whether the
region which is printed will match the contents of the WIND file.

When the region is changed explicitly (via the -d flag or any option other
than save=), it's reasonable to assume that the updated region doesn't
match the WIND file. But the absence of those options doesn't necessarily
mean that writing to WIND (or $WIND_OVERRIDE) will be a no-op.

If the behaviour was changed so that WIND was only written when one of
those options was used, that would technically be a change to the long-
standing behaviour, but we would probably get away with it. Probably.
Maybe that's close enough; I have no particular opinion on that.

However: we'd need to add a -f (force) flag, otherwise there'd be no way
to copy the $GRASS_REGION value to WIND/$WIND_OVERRIDE, or to perform just
the adjustment (G_adjust_Cell_head3).

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2230#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>