[GRASS-dev] G_tempfile() proposed changes [relevant to creating a new location]

Paul wrote:

I've had the attached changes (to G_tempfile(), g.tempfile and a couple of
other places) sitting around for a while, but Michael's comment about changes
to g.proj being forthcoming reminded me to hurry up and put them forward for
discussion. The changes amount to:
* Re-naming the current G_tempfile() to G_tempfile_in_mapset().
* Change the calls to G_tempfile() in lib/gis/opencell.c to use the new
  function G_tempfile_in_mapset()
* Write a new G_tempfile() that puts a tempfile in the directory pointed to
  by the TEMP environment variable if it exists (this will be somewhere
  writeable by the user on Windows) or P_tmpdir otherwise (/tmp on Unix or
  root of current drive on Windows).
* Add a new command-line flag to g.tempfile to allow it to create a tempfile
  in the current mapset if necessary.
* Change r.in.aster (as an example) to use g.tempfile in this way. There
  would be more examples I'm sure we could find that want to create a large
  tempfile somewhere there's more than likely to be enough space - the
  original purpose of G_tempfile, according to comments in the source.
  Although perhaps that's not all that relevant. Perhaps in the past /tmp
  might have been likely to be very small?

depends on how the user partitioned their drive. It's common to have /tmp in a
small partition to prevent out of disk space DoS attacks.

</gone>

Hi, & a happy 2007,

I have a few problems/questions with this.

Such a big change to the way things are done requires an extrodinary reason.
What's the need? [sorry not well set up to pour through the archives here]
Is is just for creating new locations? (a single case when $MAPSET doesn't
exist yet)

Instead of changing G_tempfile() to G_tempfile_in_mapset(), why not leave
G_tempfile() as-is and make a new G_tempfile_in_tmp() [or similar name] for the
few cases that need that? Patches which do not address fundamental flaws in the
system should follow the path of least damage. (patches which *do* address
funamental flaws are of course another matter.)

The reason I haven't applied the changes yet is that I'm slightly worried
about what the effect might be of the sudden change. In particular, that
tempfiles created by the current G_tempfile (i.e. in the current mapset) get
cleaned up at the end of the session and after the change they won't. Should
we just change it and fix things on a case-by-case basis? (Either change
them to create the tempfile in the mapset, or force them to delete it?) A
bit of a dilemma.

if in /tmp/ shouldn't they live in /tmp/grass-$USER-$$/, which gets removed
when grass exits? [nb the /tmp/grass-$USER rmdir currently buggy if creating
new locn or if you "Cancel" on the startup screen]

As noted in an earlier post, there are cases (debugging, d.text, d.graph) where
it is useful to leave small tmp files around for the rest of the session.

While we are on the subject, still missing: G_tempdir() and "g.tempfile -d"

Hamish

(note heavy lack of sympathy for any change that messes up the grass code to
accomodate broken/quirky {Windows|3rd party} compatibility software.)

ps - don't mess with "g.region -g"
pps- ps.map verbose command IS used in the wild, so don't break scripts by
removing it altogether. Note it is multi-leveled, so it should keep at least 3
levels of verbosity triggered by --q,<>,--v, i.e. standard messages should use
G_message(), but very verbose messages should hide G_message() in if(verbose >=
G_max_verbose()) or whatever. I am in favour of separating mapping commands
from program control, and letting the parser deal with the latter.

<gone>

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Hamish wrote:

Instead of changing G_tempfile() to G_tempfile_in_mapset(), why not leave
G_tempfile() as-is and make a new G_tempfile_in_tmp() [or similar name] for the
few cases that need that?

Because *most* usage of G_tempfile() doesn't need the file to reside
in the mapset directory. It's the existing behaviour which is a
special case.

Essentially, there's no reason for the two (mapset directory and
temporary directory) to be linked. Having them linked creates problems
if you need to create a temporary file (e.g. for G_asprintf()) when
you don't have a valid mapset, meaning that many libgis functions may
require a valid mapset for no good reason.

As noted in an earlier post, there are cases (debugging, d.text, d.graph) where
it is useful to leave small tmp files around for the rest of the session.

Right. However:

1. Such files don't need to persist between sessions.

2. If a module exits normally, we can assume that any temporary files
which it hasn't removed are meant to persist for the remainder of the
session.

3. If a module exits abnormally using G_fatal_error(), we may want to
automatically remove any temporary files which it created (the only
reason not to would be for debugging, and we can provide an
environment variable for this purpose).

For #3, it would help if we used a directory which was specific to
GRASS (e.g. /tmp/grass-<user>-<session-pid> rather than /tmp) so that
we don't have to worry about deleting something we shouldn't.

--
Glynn Clements <glynn@gclements.plus.com>

On Sun, 7 Jan 2007, Glynn Clements wrote:

Hamish wrote:

Instead of changing G_tempfile() to G_tempfile_in_mapset(), why not leave
G_tempfile() as-is and make a new G_tempfile_in_tmp() [or similar name] for the
few cases that need that?

Because *most* usage of G_tempfile() doesn't need the file to reside
in the mapset directory. It's the existing behaviour which is a
special case.

Essentially, there's no reason for the two (mapset directory and
temporary directory) to be linked. Having them linked creates problems
if you need to create a temporary file (e.g. for G_asprintf()) when
you don't have a valid mapset, meaning that many libgis functions may
require a valid mapset for no good reason.

Just to clarify, the special case here was running g.proj to create a new location when there aren't any locations existing. Any library functions called by g.proj that use G_tempfile will fail because there is no current mapset to put the tempfile into. But there are other issues that have come up from time to time, if you search for G_tempfile on Google. It is kind of a recurring nagging problem, more a usage problem than a problem with the function itself I think. The documentation for G_tempfile says it is supposed to be used for large temporary files like maps, where the system temp directory might not have enough space. But in practice it's come to be used whenever a temporary file is needed for anything, and it's not always suitable.

I feel if we don't make a wholesale change now it might lead to hard-to-detect little problems in years to come when everyone forgets about this and again assumes G_tempfile creates a "normal" temp file - It reminds me of the G_getl issue. I almost think it would have been better if G_getl had been re-written to support stripping Mac/DOS new line sequences as well as Unix ones instead of adding a new G_getl2 and changing case-by-case in modules. The newline problem causes loads of subtle problems on native Windows and it is often really hard to detect where, deep within a library that a call to G_getl should be replaced with G_getl2 to get round the problem.

As noted in an earlier post, there are cases (debugging, d.text, d.graph) where
it is useful to leave small tmp files around for the rest of the session.

Right. However:

1. Such files don't need to persist between sessions.

2. If a module exits normally, we can assume that any temporary files
which it hasn't removed are meant to persist for the remainder of the
session.

3. If a module exits abnormally using G_fatal_error(), we may want to
automatically remove any temporary files which it created (the only
reason not to would be for debugging, and we can provide an
environment variable for this purpose).

For #3, it would help if we used a directory which was specific to
GRASS (e.g. /tmp/grass-<user>-<session-pid> rather than /tmp) so that
we don't have to worry about deleting something we shouldn't.

The issue here I think is (IIRC) that session directory is only created when you start a GRASS session from Init.sh, right? So if some external program is calling GRASS modules, e.g. QGIS or something (not 100% sure how it works) it would have been OK up to now with just a few environment variables set - if we make this change there will have to be an official GRASS "session" for G_tempfile to work. Unless I've missed something.
But yes I can't see any other way of easily deleting the temp files without accidentally deleting something else.

Paul Kelly wrote:

> For #3, it would help if we used a directory which was specific to
> GRASS (e.g. /tmp/grass-<user>-<session-pid> rather than /tmp) so that
> we don't have to worry about deleting something we shouldn't.

The issue here I think is (IIRC) that session directory is only created
when you start a GRASS session from Init.sh, right? So if some external
program is calling GRASS modules, e.g. QGIS or something (not 100% sure
how it works) it would have been OK up to now with just a few environment
variables set - if we make this change there will have to be an official
GRASS "session" for G_tempfile to work. Unless I've missed something.

G_tempfile() could always create the directory itself if it doesn't
already exist. $GIS_LOCK would still need to be set in order to have a
directory which is specific to a particular session.

But yes I can't see any other way of easily deleting the temp files
without accidentally deleting something else.

It has since occurred to me that we could still run into problems due
to PID re-use. Temporary files whose names contain the PID of the
current process could have been created by a previous process with the
same PID with the intent that they persist. The only 100% robust
solution is to store "persistent temporary files" (is that an
oxymoron?) in a different directory to "transient" temporary files.

--
Glynn Clements <glynn@gclements.plus.com>