[GRASS-dev] r65348: Mapset's tmp dir placement and broken element search

Moving the conversation from the ticket.

On Sat, Jun 6, 2015 at 9:16 AM, GRASS GIS <trac@osgeo.org> wrote:

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
--------------------------±------------------------------------------
Reporter: neteler | Owner: grass-dev@…
Type: enhancement | Status: new
Priority: critical | Milestone: 7.1.0
Component: Raster | Version: svn-trunk
Resolution: | Keywords: compression, r.compress, null
CPU: Unspecified | Platform: All
--------------------------±------------------------------------------

Comment (by martinl):

Replying to [comment:34 glynn]:

Replying to [comment:33 neteler]:

I added some support for the improved r.null/r.compress functionality to
lib/raster, but it was amongst the things broken by r65348 (the tempfile
changes), so this needs to wait until that gets fixed or reverted.

by “fix” you mean to ignore GRASS_TMPDIR_MAPSET when working with raster
data?

Hi, it seems to me that there is some confusion here. AFAIK r65348 has two issues.

First, it allows unorthodox behavior of GRASS but this behavior is (or at least should be) applied only when GRASS_TMPDIR_MAPSET is set to some specific value. I don’t see much problem with it as long as the documentation properly discuss all the potential issues and the default is the standard and safe way. This was discussed in [GRASS-dev] r65348 (GRASS_TMPDIR_MAPSET).

Second, it breaks reading of header files (and perhaps other files) in some cases (issue reported in #2687). I really don’t know how this is related to tmp files (seems unrelated to me), but tests are showing that it happened between r65346 (OK) and r65349 (broken). When I created #2687, I tried to compile different versions manually and r65348 was when it got broken. Also r65347 is backslash cosmetics in grass.py and r65349 is in grass-addons. In first comment for #2687, Glynn explains that the picking of a wrong Mapset by r.info “is caused by r65348, which constructs the path before the mapset has been resolved.” This renders trunk almost unusable.

I hope this overview helps,

Vaclav

r65346 https://trac.osgeo.org/grass/changeset/65346
r65347 https://trac.osgeo.org/grass/changeset/65347
r65348 https://trac.osgeo.org/grass/changeset/65348
r65349 https://trac.osgeo.org/grass/changeset/65349
#2687 https://trac.osgeo.org/grass/ticket/2687
#2687 https://trac.osgeo.org/grass/ticket/2687#comment:1
#2349 https://trac.osgeo.org/grass/ticket/2349#comment:34

[GRASS-dev] r65348 (GRASS_TMPDIR_MAPSET)
http://lists.osgeo.org/pipermail/grass-dev/2015-June/075220.html
http://osgeo-org.1560.x6.nabble.com/r65348-GRASS-TMPDIR-MAPSET-td5208936.html
http://comments.gmane.org/gmane.comp.gis.grass.devel/64556

Vaclav Petras wrote:

First, it allows unorthodox behavior of GRASS but this behavior is (or at
least should be) applied only when GRASS_TMPDIR_MAPSET is set to some
specific value. I don't see much problem with it as long as the
documentation properly discuss all the potential issues and the default is
the standard and safe way. This was discussed in [GRASS-dev] r65348
(GRASS_TMPDIR_MAPSET).

There is no justification for the current behaviour. The temporary
cell/fcell and null files should always be created in the mapset's
temporary directory.

--
Glynn Clements <glynn@gclements.plus.com>

Hi,

2015-06-10 17:41 GMT+02:00 Glynn Clements <glynn@gclements.plus.com>:

Vaclav Petras wrote:

First, it allows unorthodox behavior of GRASS but this behavior is (or at
least should be) applied only when GRASS_TMPDIR_MAPSET is set to some
specific value. I don't see much problem with it as long as the
documentation properly discuss all the potential issues and the default is
the standard and safe way. This was discussed in [GRASS-dev] r65348
(GRASS_TMPDIR_MAPSET).

There is no justification for the current behaviour. The temporary
cell/fcell and null files should always be created in the mapset's
temporary directory.

I am not sure why it must be so on (if files are not on the same
filesystem, they can be copied), anyway I changed it in r65436.
GRASS_TMPDIR_MAPSET is accepted only by vector library. Martin

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa

2015-06-10 21:05 GMT+02:00 Martin Landa <landa.martin@gmail.com>:

[...]

I am not sure why it must be so on (if files are not on the same
filesystem, they can be copied), anyway I changed it in r65436.
GRASS_TMPDIR_MAPSET is accepted only by vector library. Martin

the question is whether to rename this variable to
GRASS_VECTOR_TMPDIR_MAPSET. Any opinion? Martin

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa

On Wed, Jun 10, 2015 at 3:05 PM, Martin Landa <landa.martin@gmail.com> wrote:

2015-06-10 17:41 GMT+02:00 Glynn Clements <glynn@gclements.plus.com>:

Vaclav Petras wrote:

First, it allows unorthodox behavior of GRASS but this behavior is (or at
least should be) applied only when GRASS_TMPDIR_MAPSET is set to some
specific value. I don’t see much problem with it as long as the
documentation properly discuss all the potential issues and the default is
the standard and safe way. This was discussed in [GRASS-dev] r65348
(GRASS_TMPDIR_MAPSET).

There is no justification for the current behaviour. The temporary
cell/fcell and null files should always be created in the mapset’s
temporary directory.

I am not sure why it must be so on (if files are not on the same
filesystem, they can be copied), anyway I changed it in r65436.
GRASS_TMPDIR_MAPSET is accepted only by vector library. Martin

I don’t understand why it should be OK for vectors when it is not OK for rasters.

Anyway, I think that non-atomicity during copy (instead of move) when requested (by GRASS_TMPDIR_MAPSET) is not much worse than anything else what one must deal with when using GRASS on some advanced configuration. For example, if you are writing to several vector maps with attributes in parallel, it will take forever (with SQLite). However, if you use PostgreSQL (or probably also DBF) as attribute backend, you get great performance. Does this mean that you cannot write vectors with attributes in parallel and perhaps also rasters to be on the safe side? No, it just means that you need to configure your environment according to what you are doing.

To be precise, the parallel processing actually is not advanced configuration. The advanced configuration is required to make it work. This means that you need advanced configuration to make not so advanced thing work. That’s quite unfortunate but this only supports my point that you need to understand how GRASS works with different configurations to get good (or even acceptable) performance.

I would like to understand what are the possible issues and how much worse are they, for example from what is default with SQLite as attribute backend and parallel processing.

Martin Landa wrote:

> There is no justification for the current behaviour. The temporary
> cell/fcell and null files should always be created in the mapset's
> temporary directory.

I am not sure why it must be so on (if files are not on the same
filesystem, they can be copied),

A copy is very different from rename(). It's not just about
performance for performance' sake, but for minimising the interval
when maps are corrupt (i.e. files exist in an inconsistent state).

anyway I changed it in r65436. GRASS_TMPDIR_MAPSET is accepted only
by vector library. Martin

Martin Landa wrote:

the question is whether to rename this variable to
GRASS_VECTOR_TMPDIR_MAPSET. Any opinion? Martin

Allowing "normal" tempfiles to be placed elsewhere is reasonable
enough. But G_tempfile() originally existed specifically for the
tempfile-and-rename idiom, and some code requires that behaviour
(there are many more cases which ought to use that but don't).

Given that "misuse" of G_tempfile() (i.e. using it like tmpnam()) is
now more common than its intended use, it wouldn't be unreasonable to
make G_tempfile() use the new behaviour and use a different name for
the previous behaviour. The only requirement is that some function
continues to provide the previous behaviour

--
Glynn Clements <glynn@gclements.plus.com>

Vaclav Petras wrote:

> I am not sure why it must be so on (if files are not on the same
> filesystem, they can be copied), anyway I changed it in r65436.
> GRASS_TMPDIR_MAPSET is accepted only by vector library. Martin

I don't understand why it should be OK for vectors when it is not OK for
rasters.

For rasters, it's a regression. The cell/fcell and null files used the
tempfile-and-rename idiom from at least GRASS 4.3 (the oldest version
I have) until the last week.

There are plenty of other files which could ideally use the same
mechanism, but as they're much smaller (and faster to write generally,
as they tend to be simply dumps of existing data), it's less of an
issue.

Anyway, I think that non-atomicity during copy (instead of move) when
requested (by GRASS_TMPDIR_MAPSET) is not much worse than anything else
what one must deal with when using GRASS on some advanced configuration.
For example, if you are writing to several vector maps with attributes in
parallel, it will take forever (with SQLite). However, if you use
PostgreSQL (or probably also DBF) as attribute backend, you get great
performance. Does this mean that you cannot write vectors with attributes
in parallel and perhaps also rasters to be on the safe side? No, it just
means that you need to configure your environment according to what you are
doing.

Database updates should be handled using transactions where available.
I don't know whether this is already the case; I'm not that familiar
with the vector or DBMI libraries.

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, Jun 11, 2015 at 10:11 AM, Glynn Clements <glynn@gclements.plus.com> wrote:

Anyway, I think that non-atomicity during copy (instead of move) when
requested (by GRASS_TMPDIR_MAPSET) is not much worse than anything else
what one must deal with when using GRASS on some advanced configuration.
For example, if you are writing to several vector maps with attributes in
parallel, it will take forever (with SQLite). However, if you use
PostgreSQL (or probably also DBF) as attribute backend, you get great
performance. Does this mean that you cannot write vectors with attributes
in parallel and perhaps also rasters to be on the safe side? No, it just
means that you need to configure your environment according to what you are
doing.

Database updates should be handled using transactions where available.
I don’t know whether this is already the case; I’m not that familiar
with the vector or DBMI libraries.

I should have said that I mean parallel calls of modules from Python/Bash script, e.g. r.to.vect or v.what.rast. I think transactions are not applicable in this case (can a transaction work across processes?). In case of one process, I’m not sure how transactions relate to parallel processing but they should be used as much as possible, that’s for sure.

Anyway, my point is that parallel calls to r.to.vect are terribly slow with default configuration and one must know that the solution is to change the attribute table backend to Postgres (or perhaps DBF where there is one database file per vector map AFAIK).

Similarly, in case of setting changing the Mapset’s tmp dir to something one some other file systems for performance reasons, one must be aware of the raster and vector maps being in inconsistent state for some time which if fine as long as I’m not reading them. Thus, I don’t see an issue with the alternative location of Mapset’s tmp directory (GRASS_TMPDIR_MAPSET).

I wonder if I can get the behavior even without GRASS_TMPDIR_MAPSET. Can I create .tmp in my Mapset as link to directory on some other file system? Or can I mount there some other directory?

Thanks for discussing this now,

Vaclav

Hi,

2015-06-11 16:59 GMT+02:00 Vaclav Petras <wenzeslaus@gmail.com>:

Anyway, my point is that parallel calls to r.to.vect are terribly slow with
default configuration and one must know that the solution is to change the
attribute table backend to Postgres (or perhaps DBF where there is one
database file per vector map AFAIK).

GRASS_VECTOR_TEMPORARY=move together with [1] could also help. Martin

[1] https://trac.osgeo.org/grass/browser/grass/trunk/lib/vector/Vlib/local_proto.h#L11

--
Martin Landa
http://geo.fsv.cvut.cz/gwiki/Landa
http://gismentors.cz/mentors/landa