[GRASS-dev] GRASS_BATCH_JOB and GISRCRC

I would like to submit the following change to facilitate the use
of GRASS on clusters/in parallel jobs:

Index: lib/init/init.sh

--- lib/init/init.sh (revision 33097)
+++ lib/init/init.sh (working copy)
@@ -159,7 +159,11 @@
export GIS_LOCK

# Set the global grassrc file
-GISRCRC="$HOME/.grassrc6"
+if [ -n "$GRASS_BATCH_JOB" ] ; then
+ GISRCRC="$HOME/.grassrc6.`uname -n`"
+else
+ GISRCRC="$HOME/.grassrc6"
+fi

# Set the session grassrc file
if [ "$MINGW" ] ; then

This change will render GISRCRC individual if running GRASS
in parallel on a series of machines.
For a "normal" user the behavior is as before.

Any objections?

Markus

Markus Neteler wrote:

> I would like to submit the following change to facilitate the use
> of GRASS on clusters/in parallel jobs:
>
> Index: lib/init/init.sh
>
> --- lib/init/init.sh (revision 33097)
> +++ lib/init/init.sh (working copy)
> @@ -159,7 +159,11 @@
> export GIS_LOCK
>
> # Set the global grassrc file
> -GISRCRC="$HOME/.grassrc6"
> +if [ -n "$GRASS_BATCH_JOB" ] ; then
> + GISRCRC="$HOME/.grassrc6.`uname -n`"
> +else
> + GISRCRC="$HOME/.grassrc6"
> +fi
>
> # Set the session grassrc file
> if [ "$MINGW" ] ; then
>
> This change will render GISRCRC individual if running GRASS
> in parallel on a series of machines.
> For a "normal" user the behavior is as before.
>
> Any objections?

It needs a fallback to use the normal ~/.grassrc6 if a host-specific
version doesn't exist.

Actually, I really think that any "advanced" use (and I think that
includes running batch jobs on a cluster) should just bypass Init.sh
altogether and set the environment variables itself.

Contrary to what might be assumed from the insane complexity of
Init.sh, manually configuring the GRASS environment can be done with
as little as:

  export GISBASE=/opt/grass-7.0.svn
  export GRASS_GNUPLOT='gnuplot -persist'
  export GRASS_WIDTH=640
  export GRASS_HEIGHT=480
  export GRASS_HTML_BROWSER=firefox
  export GRASS_PAGER=cat
  export GRASS_WISH=wish
  export GRASS_PYTHON=python
  export GRASS_MESSAGE_FORMAT=silent
  export GRASS_TRUECOLOR=TRUE
  export GRASS_TRANSPARENT=TRUE
  export GRASS_PNG_AUTO_WRITE=TRUE
  
  export PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
  export LD_LIBRARY_PATH="$GISBASE/lib"
  export GRASS_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
  export PYTHONPATH="$GISBASE/etc/python:$PYTHONPATH"
  
  export GIS_LOCK=$$
  export GRASS_VERSION="7.0.svn"
  
  tmp=/tmp/grass6-"`whoami`"-$GIS_LOCK
  export GISRC="$tmp/gisrc"
  mkdir "$tmp"
  cp ~/.grassrc6 "$GISRC"

[That is in a script sourced from my ~/.bash_profile; the only time
Init.sh ever gets run here is if I'm testing changes to Init.sh.]

And much of that is unnecessary.

These are only needed for display commands:

  export GRASS_WIDTH=640
  export GRASS_HEIGHT=480
  export GRASS_TRUECOLOR=TRUE
  export GRASS_TRANSPARENT=TRUE
  export GRASS_PNG_AUTO_WRITE=TRUE

The first three are now the default, and GRASS_PNG_AUTO_WRITE isn't
really useful for direct rendering.

GRASS_WISH and GRASS_HTML_BROWSER are only necessary for GUI
applications (not relevant to batch jobs).

GRASS_PYTHON is only used by Init.sh to run gis_set.py (g.gui just
uses "python", while scripts use "#!/usr/bin/env python").

GRASS_GNUPLOT is only used by i.spectral.

GRASS_PAGER is hardly used, and then mostly by programs which would
only be used interactively.

GRASS_MESSAGE_FORMAT is mostly used when writing to a TTY.

GRASS_VERSION is only used by GEM, AFAICT (g.version reports the
version information from config.h, recorded at compile-time).

GRASS_LD_LIBRARY_PATH is only needed for running commands via an
xterm, so not applicable to batch jobs or 7.x.

So, for batch jobs, you shouldn't need anything beyond e.g.:

  export GISBASE=/opt/grass-7.0.svn
  
  export PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
  export LD_LIBRARY_PATH="$GISBASE/lib"
  export PYTHONPATH="$GISBASE/etc/python:$PYTHONPATH"
  
  export GIS_LOCK=$$
  
  tmp=/tmp/grass6-"`whoami`"-$GIS_LOCK
  export GISRC="$tmp/gisrc"
  mkdir "$tmp"
  cp ~/.grassrc6 "$GISRC"
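
A slightly more defensive variant of the minimal setup above can be sketched as a shell function. This is only an illustration: the function name `grass_batch_env`, its two arguments, and the use of `mktemp -d` for the session directory are assumptions, not something Init.sh does.

```shell
#!/bin/sh
# Sketch only: set up a minimal GRASS batch environment without Init.sh.
# grass_batch_env is a hypothetical helper; the paths passed in are assumptions.
grass_batch_env() {
    gisbase=$1      # GRASS installation directory, e.g. /opt/grass-7.0.svn
    rc_file=$2      # persistent settings file, e.g. "$HOME/.grassrc6"

    # Refuse to continue if there is no settings file to copy.
    [ -f "$rc_file" ] || { echo "No such rc file: $rc_file" >&2; return 1; }

    GISBASE=$gisbase
    PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
    LD_LIBRARY_PATH="$GISBASE/lib"
    PYTHONPATH="$GISBASE/etc/python${PYTHONPATH:+:$PYTHONPATH}"
    GIS_LOCK=$$
    export GISBASE PATH LD_LIBRARY_PATH PYTHONPATH GIS_LOCK

    # Private per-session directory; mktemp -d fails instead of reusing
    # a stale directory left behind by an earlier process.
    session_dir=$(mktemp -d "${TMPDIR:-/tmp}/grass-batch-$GIS_LOCK-XXXXXX") || return 1
    GISRC="$session_dir/gisrc"
    cp "$rc_file" "$GISRC" || return 1
    export GISRC
}
```

A job script could call `grass_batch_env /opt/grass-7.0.svn "$HOME/.grassrc6"` and then install `trap 'rm -rf "$session_dir"' EXIT` so that the session directory is removed when the job ends.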

If it wasn't for supporting multiple sessions, you could just put the
various environment settings into the global /etc/profile, along with:

  GISRC=$HOME/.grassrc7

And GRASS commands can then be used like any other command (i.e. not
restricted to a GRASS session).

--
Glynn Clements <glynn@gclements.plus.com>

On Wed, Aug 27, 2008 at 7:48 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

> Markus Neteler wrote:
>
>> I would like to submit the following change to facilitate the use
>> of GRASS on clusters/in parallel jobs:
>>
>> Index: lib/init/init.sh
>>
>> --- lib/init/init.sh (revision 33097)
>> +++ lib/init/init.sh (working copy)
>> @@ -159,7 +159,11 @@
>> export GIS_LOCK
>>
>> # Set the global grassrc file
>> -GISRCRC="$HOME/.grassrc6"
>> +if [ -n "$GRASS_BATCH_JOB" ] ; then
>> + GISRCRC="$HOME/.grassrc6.`uname -n`"
>> +else
>> + GISRCRC="$HOME/.grassrc6"
>> +fi
>>
>> # Set the session grassrc file
>> if [ "$MINGW" ] ; then
>>
>> This change will render GISRCRC individual if running GRASS
>> in parallel on a series of machines.
>> For a "normal" user the behavior is as before.
>>
>> Any objections?
>
> It needs a fallback to use the normal ~/.grassrc6 if a host-specific
> version doesn't exist.

Your comment is not entirely clear to me.
You mean in case that `uname -n` doesn't return
a useful string?
Then $$ (PID) would do the job, too.

> Actually, I really think that any "advanced" use (and I think that
> includes running batch jobs on a cluster) should just bypass Init.sh
> altogether and set the environment variables itself.

I generally agree but I am right now too lazy/overworked to change
all my tested scripts...

> Contrary to what might be assumed from the insane complexity of
> Init.sh, manually configuring the GRASS environment can be done with
> as little as:

...

>   tmp=/tmp/grass6-"`whoami`"-$GIS_LOCK

...

>   cp ~/.grassrc6 "$GISRC"

Both should be grass7 I guess. (I made those changes some
time ago).

...
[ useful comments omitted ]
...

> GRASS_LD_LIBRARY_PATH is only needed for running commands via an
> xterm, so not applicable to batch jobs or 7.x.

Would it matter for GDAL-GRASS plugin and such?

> So, for batch jobs, you shouldn't need anything beyond e.g.:
>
>   export GISBASE=/opt/grass-7.0.svn
>
>   export PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
>   export LD_LIBRARY_PATH="$GISBASE/lib"
>   export PYTHONPATH="$GISBASE/etc/python:$PYTHONPATH"
>
>   export GIS_LOCK=$$
>
>   tmp=/tmp/grass6-"`whoami`"-$GIS_LOCK

In this case:
  tmp=/tmp/grass7-"`whoami`"-$GIS_LOCK

>   export GISRC="$tmp/gisrc"
>   mkdir "$tmp"
>   cp ~/.grassrc6 "$GISRC"

In this case:
  cp ~/.grassrc7 "$GISRC"

> If it wasn't for supporting multiple sessions, you could just put the
> various environment settings into the global /etc/profile, along with:
>
>   GISRC=$HOME/.grassrc7
>
> And GRASS commands can then be used like any other command (i.e. not
> restricted to a GRASS session).

Right.

Note: On the cluster I am using, I have my HOME, and from there the
jobs migrate to the various nodes (blades, whatever). The results are
then copied back to my HOME (in this case each job has its own MAPSET).
A final batch job then copies everything into a common MAPSET to have
all results in one place. So I do need multiple sessions, I think.
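
The per-job mapset arrangement described here could be set up by generating one session file per job rather than copying a shared one. A minimal sketch, assuming the usual `KEY: value` layout of a GRASS 6 rc file; the function name `write_job_gisrc` and any database, location, and mapset names passed to it are hypothetical:

```shell
#!/bin/sh
# Sketch only: write a per-job GISRC so that each cluster job works in
# its own mapset. write_job_gisrc and all names passed to it are
# illustrative assumptions.
write_job_gisrc() {
    gisdbase=$1   # GRASS database directory
    location=$2   # location name
    mapset=$3     # mapset reserved for this job
    out_file=$4   # where to write the session file

    cat > "$out_file" <<EOF
GISDBASE: $gisdbase
LOCATION_NAME: $location
MAPSET: $mapset
EOF
}
```

For example, `write_job_gisrc "$HOME/grassdata" mylocation "job_$(uname -n)" "$tmp/gisrc"` followed by `export GISRC="$tmp/gisrc"` would point a job at a host-specific mapset. The mapset directory itself still has to exist inside the location before GRASS commands are run.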

Markus

Markus Neteler wrote:

>>> I would like to submit the following change to facilitate the use
>>> of GRASS on clusters/in parallel jobs:
>>>
>>>
>>> Index: lib/init/init.sh
>>> ===================================================================
>>> --- lib/init/init.sh (revision 33097)
>>> +++ lib/init/init.sh (working copy)
>>> @@ -159,7 +159,11 @@
>>> export GIS_LOCK
>>>
>>> # Set the global grassrc file
>>> -GISRCRC="$HOME/.grassrc6"
>>> +if [ -n "$GRASS_BATCH_JOB" ] ; then
>>> + GISRCRC="$HOME/.grassrc6.`uname -n`"
>>> +else
>>> + GISRCRC="$HOME/.grassrc6"
>>> +fi
>>>
>>> # Set the session grassrc file
>>> if [ "$MINGW" ] ; then
>>>
>>>
>>> This change will render GISRCRC individual if running GRASS
>>> in parallel on a series of machines.
>>> For a "normal" user the behavior is as before.
>>>
>>> Any objections?
>>
>> It needs a fallback to use the normal ~/.grassrc6 if a host-specific
>> version doesn't exist.
>
> Your comment is not entirely clear to me.
> You mean in case that `uname -n` doesn't return
> a useful string?

No, I mean in case you only have a ~/.grassrc6 file, and not a
~/.grassrc6.<hostname> file.

> Then $$ (PID) would do the job, too.

If you used $$, you would need to be able to predict the PID which
grass64/Init.sh would get before you run it, so that you could name
the file appropriately.

Hmm. Are you confusing $GISRCRC and $GISRC? The former is the
persistent ~/.grassrc6 (etc) file, the latter is the per-session copy
which Init.sh puts into /tmp/grass6-<user>-pid and which is actually
used by the programs.

>> Contrary to what might be assumed from the insane complexity of
>> Init.sh, manually configuring the GRASS environment can be done with
>> as little as:
>>
> ...
>> tmp=/tmp/grass6-"`whoami`"-$GIS_LOCK
> ...
>> cp ~/.grassrc6 "$GISRC"
>
> Both should be grass7 I guess. (I made those changes some
> time ago).

Ultimately it doesn't actually matter. GRASS only cares that $GISRC
refers to a file which contains the necessary settings.

>> GRASS_LD_LIBRARY_PATH is only needed for running commands via an
>> xterm, so not applicable to batch jobs or 7.x.
>
> Would it matter for GDAL-GRASS plugin and such?

Nope. The only thing that uses GRASS_LD_LIBRARY_PATH is the
grass-run.sh script, which uses it to restore LD_LIBRARY_PATH (or
whatever it's called on particular system), in case it has been reset
due to running a setuid/setgid binary (which may include xterm).

grass-run.sh no longer exists in 7.x, as it doesn't use xterm.

--
Glynn Clements <glynn@gclements.plus.com>

On Thu, Aug 28, 2008 at 1:10 AM, Glynn Clements
<glynn@gclements.plus.com> wrote:

> Markus Neteler wrote:

>> I would like to submit the following change to facilitate the use
>> of GRASS on clusters/in parallel jobs:
>>
>>
>> Index: lib/init/init.sh
>> ===================================================================
>> --- lib/init/init.sh (revision 33097)
>> +++ lib/init/init.sh (working copy)
>> @@ -159,7 +159,11 @@
>> export GIS_LOCK
>>
>> # Set the global grassrc file
>> -GISRCRC="$HOME/.grassrc6"
>> +if [ -n "$GRASS_BATCH_JOB" ] ; then
>> + GISRCRC="$HOME/.grassrc6.`uname -n`"
>> +else
>> + GISRCRC="$HOME/.grassrc6"
>> +fi
>>
>> # Set the session grassrc file
>> if [ "$MINGW" ] ; then
>>

...

> Hmm. Are you confusing $GISRCRC and $GISRC? The former is the
> persistent ~/.grassrc6 (etc) file, the latter is the per-session copy
> which Init.sh puts into /tmp/grass6-<user>-pid and which is actually
> used by the programs.

Yes. I meant ~/.grassrc6. Possibly I am lagging behind for some years
as I used GRASS 6.3.old there :). Now I am switching to GRASS 6.4 for my
cluster calculations and have to solve this problem.

So indeed I should modify
## use TMPDIR if it exists, otherwise /tmp
#tmp=${TMPDIR-/tmp}
#tmp="$tmp/grass6-$USER-$GIS_LOCK"
tmp=/tmp/grass6-$USER-$GIS_LOCK
(umask 077 && mkdir "$tmp") || {
    echo "Cannot create temporary directory! Exiting." 1>&2
    exit 1
}
GISRC="$tmp/gisrc"
export GISRC

Or no modifications needed at all? Mhh, probably I am really confusing things
and everything just works out of the box. Will test in the next few days.

Markus

Markus Neteler wrote:

>> >> I would like to submit the following change to facilitate the use
>> >> of GRASS on clusters/in parallel jobs:
>> >>
>> >>
>> >> Index: lib/init/init.sh
>> >> ===================================================================
>> >> --- lib/init/init.sh (revision 33097)
>> >> +++ lib/init/init.sh (working copy)
>> >> @@ -159,7 +159,11 @@
>> >> export GIS_LOCK
>> >>
>> >> # Set the global grassrc file
>> >> -GISRCRC="$HOME/.grassrc6"
>> >> +if [ -n "$GRASS_BATCH_JOB" ] ; then
>> >> + GISRCRC="$HOME/.grassrc6.`uname -n`"
>> >> +else
>> >> + GISRCRC="$HOME/.grassrc6"
>> >> +fi
>> >>
>> >> # Set the session grassrc file
>> >> if [ "$MINGW" ] ; then
>> >>
...
>> Hmm. Are you confusing $GISRCRC and $GISRC? The former is the
>> persistent ~/.grassrc6 (etc) file, the latter is the per-session copy
>> which Init.sh puts into /tmp/grass6-<user>-pid and which is actually
>> used by the programs.
>
> Yes. I meant ~/.grassrc6. Possibly I am lagging behind for some years
> as I used GRASS 6.3.old there :). Now I am switching to GRASS 6.4 for my
> cluster calculations and have to solve this problem.
>
> So indeed I should modify
> ## use TMPDIR if it exists, otherwise /tmp
> #tmp=${TMPDIR-/tmp}
> #tmp="$tmp/grass6-$USER-$GIS_LOCK"
> tmp=/tmp/grass6-$USER-$GIS_LOCK
> (umask 077 && mkdir "$tmp") || {
>     echo "Cannot create temporary directory! Exiting." 1>&2
>     exit 1
> }
> GISRC="$tmp/gisrc"
> export GISRC
>
> Or no modifications needed at all? Mhh, probably I am really confusing things
> and everything just works out of the box. Will test in the next few days.

If the idea is to manually set up ~/.grassrc6.<hostname> files for
each host, each with a different mapset, then your original idea is
about right, but you need a fallback so that GRASS_BATCH_JOB continues
to work for everyone who only has a ~/.grassrc6 file, not multiple
~/.grassrc6.<hostname>. E.g.:

if [ -n "$GRASS_BATCH_JOB" ] ; then
  GISRCRC="$HOME/.grassrc6.`uname -n`"
  if [ ! -f "$GISRCRC" ] ; then
    GISRCRC="$HOME/.grassrc6"
  fi
else
  GISRCRC="$HOME/.grassrc6"
fi

--
Glynn Clements <glynn@gclements.plus.com>

Glynn:

> Actually, I really think that any "advanced" use (and I think that
> includes running batch jobs on a cluster) should just bypass Init.sh
> altogether and set the environment variables itself.

FWIW "grass64 -c" and GRASS_BATCH_JOB work nicely, the overhead is pretty
small. I am fine with people bypassing init.sh if they like to, but
personally I don't see it as something to spend much time worrying about.

If we had full parser support available for init.sh it might be less weird
to add "batch=/path/to/script.sh" on the command line instead of asking
the user to set a GRASS_BATCH_JOB env variable and ensure the executable
bit is set on that file.
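
The two requirements mentioned here (the environment variable and the executable bit) can be checked up front by a small wrapper. A sketch under assumptions: `run_grass_batch` is a hypothetical helper, not part of GRASS; `grass64` as the default startup command is taken from the message above; the mapset path is passed through unchanged.

```shell
#!/bin/sh
# Sketch only: launch a script through GRASS_BATCH_JOB after basic checks.
# run_grass_batch is a hypothetical wrapper, not part of GRASS.
run_grass_batch() {
    job_script=$1            # script to run inside the GRASS session
    mapset_path=$2           # e.g. /path/to/grassdata/location/mapset
    grass_cmd=${3:-grass64}  # startup command; grass64 is assumed here

    if [ ! -x "$job_script" ]; then
        echo "Batch job is missing or not executable: $job_script" >&2
        return 1
    fi

    # Set the variable for this invocation only, so that later
    # interactive sessions started from the same shell are not affected.
    GRASS_BATCH_JOB="$job_script" "$grass_cmd" "$mapset_path"
}
```

With this, `run_grass_batch "$HOME/my_job.sh" "$HOME/grassdata/mylocation/mymapset"` fails early with a message if the script lacks the executable bit, instead of leaving the user to diagnose it from the GRASS startup output. The script and mapset paths in that call are placeholders.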

> Contrary to what might be assumed from the insane complexity of
> Init.sh, manually configuring the GRASS environment can be done with
> as little as:
>
>   export GISBASE=/opt/grass-7.0.svn
>   export GRASS_GNUPLOT='gnuplot -persist'
>   export GRASS_WIDTH=640
>   export GRASS_HEIGHT=480
>   export GRASS_HTML_BROWSER=firefox
>   export GRASS_PAGER=cat
>   export GRASS_WISH=wish
>   export GRASS_PYTHON=python
>   export GRASS_MESSAGE_FORMAT=silent
>   export GRASS_TRUECOLOR=TRUE
>   export GRASS_TRANSPARENT=TRUE
>   export GRASS_PNG_AUTO_WRITE=TRUE
>
>   export PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
>   export LD_LIBRARY_PATH="$GISBASE/lib"
>   export GRASS_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
>   export PYTHONPATH="$GISBASE/etc/python:$PYTHONPATH"
>
>   export GIS_LOCK=$$
>   export GRASS_VERSION="7.0.svn"
>
>   tmp=/tmp/grass6-"`whoami`"-$GIS_LOCK
>   export GISRC="$tmp/gisrc"
>   mkdir "$tmp"
>   cp ~/.grassrc6 "$GISRC"
>
> [That is in a script sourced from my ~/.bash_profile; the only time
> Init.sh ever gets run here is if I'm testing changes to Init.sh.]

Don't forget to run $GISBASE/etc/clean_temp every now and then to clear
out left over files from processes that die or are terminated before they
get to their final cleanup stage. Considerable cruft can accumulate
during testing & debugging of modules.

Hamish

Hamish wrote:

>> [That is in a script sourced from my ~/.bash_profile; the only time
>> Init.sh ever gets run here is if I'm testing changes to Init.sh.]
>
> Don't forget to run $GISBASE/etc/clean_temp every now and then to clear
> out left over files from processes that die or are terminated before they
> get to their final cleanup stage. Considerable cruft can accumulate
> during testing & debugging of modules.

Good point. Perhaps it should be a user-visible module, e.g.
g.cleanup.

One of the potential problems with clean-up was that certain d.*
commands which read from stdin copied the data to a temp file, so that
redraw via D_add_to_list(G_recreate_command()) worked. With monitors
and D_add_to_list() gone in 7.0, that's no longer an issue. AFAICT,
there shouldn't be any need for temp files to persist between commands
now.

--
Glynn Clements <glynn@gclements.plus.com>