[GRASS-dev] File based GRASS batch processing?

Hi,

I often have to process large data sets on remote
machines which takes more than a day. Since I
cannot keep the ssh connection open for so long,
it would be very convenient to have the possibility
to pass a file (containing GRASS commands) as
parameter to the startup script, say something like:

nohup grass63 -text -batch myjobs.sh ~/grassdata/spearfish60/neteler/ &
                    ^^^^^^^^^^^^^^^^

I assume that this would require changes in lib/init/init.*
(maybe for now only lib/init/init.sh).

Before wasting time on this, any suggestions for a best
implementation? Maybe I am overlooking an easy solution.

Markus

Hello Markus,

Markus Neteler <neteler@itc.it> wrote at Mon, 22 Jan 2007 17:09:16
+0100:

Hi,

I often have to process large data sets on remote
machines which takes more than a day. Since I
cannot keep the ssh connection open for so long,
it would be very convenient to have the possibility
to pass a file (containing GRASS commands) as
parameter to the startup script, say something like:

nohup grass63 -text -batch myjobs.sh ~/grassdata/spearfish60/neteler/ &
                    ^^^^^^^^^^^^^^^^

I assume that this would require changes in lib/init/init.*
(maybe for now only lib/init/init.sh).

Before wasting time on this, any suggestions for a best
implementation? Maybe I am overlooking an easy solution.

Have you tried using screen[1] for this purpose? this can pick up remote
sessions.

I *assume* you could do this using screen as well?!
Check the manpage for details.

Best

  Stephan

[1] http://www.debian-administration.org/articles/34

--
Stephan Holl: www.intevation.de/~stephan| GISpatcher: www.gispatcher.de
Intevation GmbH: www.intevation.de | GAV e.V.: www.grass-verein.de
Georgstr.4: 49074 Osnabrück | Telefon: +49(0)541. 3350832

On Jan 22 [17:09], Markus Neteler wrote:

Before wasting time on this, any suggestions for a best
implementation? Maybe I am overlooking an easy solution.

I use VNC as a workaround: Start vncserver with -localhost option,
Login with ssh -X and start vncviewer. Then I do the grass job on the
vnc desktop, which isn't affected by my logging off.

Is this an option for you?

However, the batch processing mode would be good for at/cron jobs, too.

\f

--
Florian Kindl
Institute of Geography
University of Innsbruck

On Mon, Jan 22, 2007 at 05:24:11PM +0100, Florian Kindl wrote:

On Jan 22 [17:09], Markus Neteler wrote:
>
> Before wasting time on this, any suggestions for a best
> implementation? Maybe I am overlooking an easy solution.
>

> I use VNC as a workaround: Start vncserver with -localhost option,
> Login with ssh -X and start vncviewer. Then I do the grass job on the
> vnc desktop, which isn't affected by my logging off.
>
> Is this an option for you?

No, any VNC connection is eaten by our firewall.

> However, the batch processing mode would be good for at/cron jobs, too.

Right. That's why I would prefer a totally decoupled solution.

Markus

On Mon, Jan 22, 2007 at 05:22:09PM +0100, Stephan Holl wrote:

Hello Markus,

Markus Neteler <neteler@itc.it> wrote at Mon, 22 Jan 2007 17:09:16
+0100:

> Hi,
>
> I often have to process large data sets on remote
> machines which takes more than a day. Since I
> cannot keep the ssh connection open for so long,
> it would be very convenient to have the possibility
> to pass a file (containing GRASS commands) as
> parameter to the startup script, say something like:
>
> nohup grass63 -text -batch myjobs.sh ~/grassdata/spearfish60/neteler/ &
>                     ^^^^^^^^^^^^^^^^
>
> I assume that this would require changes in lib/init/init.*
> (maybe for now only lib/init/init.sh).
>
> Before wasting time on this, any suggestions for a best
> implementation? Maybe I am overlooking an easy solution.

Have you tried using screen[1] for this purpose? this can pick up remote
sessions.

I *assume* you could do this using screen as well?!
Check the manpage for details.

Best

  Stephan

[1] http://www.debian-administration.org/articles/34

Hi Stephan,

thanks for the idea. If for any reason file based batch
processing is impossible, I'll try to understand "screen".
So far it looks like overkill for my scope (say: I want
to import a fat vector map which takes one day to build
the topology, so "ps -aef" would be sufficient to see
if it still runs).

anyway, thanks,

Markus

On Jan 22 [17:26], Markus Neteler wrote:

> I use VNC as a workaround: Start vncserver with -localhost option,
> Login with ssh -X and start vncviewer. Then I do the grass job on the
> vnc desktop, which isn't affected by my logging off.
>
> Is this an option for you?

No, any VNC connection is eaten by our firewall.

That's why I tunnel it through ssh. Is that port open?

> However, the batch processing mode would be good for at/cron jobs, too.

Right. That's why I would prefer a totally decoupled solution.

+1
Is certainly preferable :-)

\f

--
Florian Kindl
Institute of Geography
University of Innsbruck

On Monday 22 January 2007 08:09, Markus Neteler wrote:

Hi,

I often have to process large data sets on remote
machines which takes more than a day. Since I
cannot keep the ssh connection open for so long,
it would be very convenient to have the possibility
to pass a file (containing GRASS commands) as
parameter to the startup script, say something like:

nohup grass63 -text -batch myjobs.sh ~/grassdata/spearfish60/neteler/ &
                    ^^^^^^^^^^^^^^^^

I assume that this would require changes in lib/init/init.*
(maybe for now only lib/init/init.sh).

Before wasting time on this, any suggestions for a best
implementation? Maybe I am overlooking an easy solution.

Markus

Hi Markus,

It has been mentioned once already by Stephan, but I too would highly
recommend screen. I use this routinely for large GRASS jobs on remote
machines via ssh. The most important commands can be summarized in a couple
sentences:

ssh remote_machine
screen
grass63 location/mapset
v.surf.rst in=bigfile
<ctrl> + <a> (pause and let up on the previous keys) <d>

(now you are back to the shell on the remote machine)
exit

to reconnect:
ssh remote_machine
screen -r

if you have more than one screen session going, use the session number to
reconnect.
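
The same workflow can also be scripted so that the job starts already detached. This is only a minimal sketch, assuming GNU screen is installed on the remote machine; the function names and the session name used below are made up for illustration.

```shell
#!/bin/sh
# start_job NAME COMMAND [ARGS...]
# Run COMMAND in a new screen session that starts detached (-d -m)
# and is named NAME (-S), so it keeps running after the ssh logout.
start_job() {
    name=$1
    shift
    screen -dmS "$name" "$@"
}

# resume_job NAME
# Reattach to the named session; -d first detaches it from any
# terminal that still holds it (e.g. after a dropped connection).
resume_job() {
    screen -d -r "$1"
}
```

For example, run `start_job bigimport sh ./myjobs.sh` before logging out, then `screen -ls` and `resume_job bigimport` after the next login. A session started this way ends as soon as its command finishes.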

Cheers,

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

On Mon, Jan 22, 2007 at 05:28:25PM +0100, Markus Neteler wrote:

On Mon, Jan 22, 2007 at 05:22:09PM +0100, Stephan Holl wrote:
> Hello Markus,
>
> Markus Neteler <neteler@itc.it> wrote at Mon, 22 Jan 2007 17:09:16
> +0100:
>
> > Hi,
> >
> > I often have to process large data sets on remote
> > machines which takes more than a day. Since I
> > cannot keep the ssh connection open for so long,
> > it would be very convenient to have the possibility
> > to pass a file (containing GRASS commands) as
> > parameter to the startup script, say something like:
> >
> > nohup grass63 -text -batch myjobs.sh ~/grassdata/spearfish60/neteler/ &
> >                     ^^^^^^^^^^^^^^^^
> >
> > I assume that this would require changes in lib/init/init.*
> > (maybe for now only lib/init/init.sh).
> >
> > Before wasting time on this, any suggestions for a best
> > implementation? Maybe I am overlooking an easy solution.
>
> Have you tried using screen[1] for this purpose? this can pick up remote
> sessions.
>
> I *assume* you could do this using screen as well?!
> Check the manpage for details.
>
> Best
>
> Stephan
>
> [1] http://www.debian-administration.org/articles/34

Hi Stephan,

thanks for the idea. If for any reason file based batch
processing is impossible, I'll try to understand "screen".
So far it looks like overkill for my scope (say: I want
to import a fat vector map which takes one day to build
the topology, so "ps -aef" would be sufficient to see
if it still runs).

I forgot to mention that I cannot directly ssh to that
machine but access it only through another ssh gate, then
with a second ssh I get into it:

     +--- ssh -+ ssh
me --+-> gate -+--> officebox ---> GRASS machine

I guess that "screen" doesn't work for this case.

I was just told that KerGIS has such batch job option,
I'll take a look there (eventually I would like to have
it in GRASS, too).

Markus

Markus Neteler wrote:

I often have to process large data sets on remote
machines which takes more than a day. Since I
cannot keep the ssh connection open for so long,
it would be very convenient to have the possibility
to pass a file (containing GRASS commands) as
parameter to the startup script, say something like:

nohup grass63 -text -batch myjobs.sh ~/grassdata/spearfish60/neteler/ &
                    ^^^^^^^^^^^^^^^^

I assume that this would require changes in lib/init/init.*
(maybe for now only lib/init/init.sh).

Before wasting time on this, any suggestions for a best
implementation? Maybe I am overlooking an easy solution.

As a quick hack, you could try setting SHELL to ./myjobs.sh, so that
Init.sh runs the script instead of an interactive shell. However,
myjobs.sh should probably set SHELL back to e.g. /bin/sh in case
anything which is run from it uses $SHELL.
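
A minimal sketch of this hack, not tested against a real GRASS installation. The GRASS commands inside the job script are placeholders, and the log file name is an assumption.

```shell
#!/bin/sh
# Write a hypothetical job script into the current directory.
JOB="$PWD/myjobs.sh"
cat > "$JOB" <<'EOF'
#!/bin/sh
# Init.sh was started with SHELL pointing at this file, so restore a
# real shell first in case anything run from here uses $SHELL.
SHELL=/bin/sh
export SHELL

g.region -p          # placeholder GRASS commands
g.list type=vect
EOF
chmod u+x "$JOB"

# Launch (needs a GRASS installation, so it is left commented out here):
# SHELL="$JOB" nohup grass63 -text ~/grassdata/spearfish60/neteler/ > myjobs.log 2>&1 &
```

Passing SHELL only on the grass63 command line keeps the login shell's own SHELL setting untouched.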

But adding a -batch switch to Init.sh wouldn't be that hard.

FWIW, I just set up a GRASS environment in my ~/.bash_profile so that
GRASS commands work in every shell:

  export GISBASE=/opt/grass-6.3.cvs
  export GRASS_GNUPLOT='gnuplot -persist'
  export GRASS_WIDTH=640
  export GRASS_HEIGHT=480
  export GRASS_HTML_BROWSER=firefox
  export GRASS_PAGER=cat
  export GRASS_PERL=perl
  export GRASS_TCLSH=tclsh
  export GRASS_WISH=wish
  
  export PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
  export LD_LIBRARY_PATH="$GISBASE/lib"
  export GRASS_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
  
  export GIS_LOCK=$$
  export GRASS_VERSION="6.1.cvs"
  
  tmp=/tmp/grass6-"`whoami`"-$GIS_LOCK
  export GISRC="$tmp/gisrc"
  mkdir "$tmp"
  cp ~/.grassrc6 "$GISRC"

Also, my ~/.xsession script sources ~/.bash_profile, so all X
applications are part of a GRASS session, so I can run GRASS commands
using M-! in XEmacs.

The only time I actually start a separate GRASS session is via
bin.<arch>/grass63, when I want to run a just-compiled version without
installing it.

[The only downside is that I have to remember to manually delete all
of the /tmp/grass6-glynn-* and /opt/grass-data/*/*/.tmp/* directories
occasionally.]
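
The manual cleanup could be scripted roughly as follows. This is a sketch under assumptions: the directory naming follows the `tmp=` line above (grass6-USER-PID), the function name is hypothetical, and `-maxdepth` requires GNU or BSD find. The age threshold is only a heuristic for not touching a session that is still in use.

```shell
#!/bin/sh
# clean_grass_tmp DIR USER DAYS
# Remove session directories named grass6-USER-* directly under DIR
# whose modification time is more than DAYS days old.
clean_grass_tmp() {
    find "$1" -maxdepth 1 -type d -name "grass6-$2-*" -mtime "+$3" \
        -exec rm -rf {} +
}
```

A call such as `clean_grass_tmp /tmp "$(whoami)" 7` would remove that user's session directories older than a week and leave newer ones alone.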

--
Glynn Clements <glynn@gclements.plus.com>

On Mon, Jan 22, 2007 at 10:34:59PM +0000, Glynn Clements wrote:

Markus Neteler wrote:

> I often have to process large data sets on remote
> machines which takes more than a day. Since I

...

As a quick hack, you could try setting SHELL to ./myjobs.sh, so that
Init.sh runs the script instead of an interactive shell. However,
myjobs.sh should probably set SHELL back to e.g. /bin/sh in case
anything which is run from it uses $SHELL.

This seems to work.

But adding a -batch switch to Init.sh wouldn't be that hard.

Attached a first sort of working hack "diff". Could be
certainly nicer.

[.. useful hints omitted.. ]

It works like this:

export GRASS_BATCH_JOB=/tmp/myjob.sh
grass63 -text ~/grassdata/spearfish60/neteler/

Markus

(attachments)

diff (1.38 KB)

On Tue, Jan 23, 2007 at 12:10:20AM +0100, Markus Neteler wrote:

Attached a first sort of working hack "diff". Could be
certainly nicer.

Sorry, this one I wanted to send.

Markus

(attachments)

diff2 (1.22 KB)

Markus Neteler wrote:

I forgot to mention that I cannot directly ssh to that
machine but access it only through another ssh gate, then
with a second ssh I get into it:

     +--- ssh -+ ssh
me --+-> gate -+--> officebox ---> GRASS machine

I guess that "screen" doesn't work for this case.

It shouldn't make any difference how you get into the system. Once you
start screen (on the "GRASS machine"), you should be able to reconnect
to the session whenever (and however) you subsequently get in (ssh,
telnet, dial-up login, console login, etc).
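
For the chain in the diagram, reconnecting can be done in one step. A minimal sketch, assuming the hypothetical host names gate, officebox and grassbox, and a screen session that was started earlier on the last machine:

```shell
#!/bin/sh
# resume_remote SESSION
# Hop through the ssh chain and reattach to SESSION on the last host.
# -t forces a terminal on every hop, which screen needs.
resume_remote() {
    ssh -t gate ssh -t officebox ssh -t grassbox screen -d -r "$1"
}
```

Detaching from screen (or closing the connection anywhere along the chain) leaves the session running on the last machine.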

--
Glynn Clements <glynn@gclements.plus.com>

Hi,

2007/1/23, Glynn Clements <glynn@gclements.plus.com>:

Markus Neteler wrote:

> I forgot to mention that I cannot directly ssh to that
> machine but access it only through another ssh gate, then
> with a second ssh I get into it:
>
>      +--- ssh -+ ssh
> me --+-> gate -+--> officebox ---> GRASS machine
>
> I guess that "screen" doesn't work for this case.

It shouldn't make any difference how you get into the system. Once you
start screen (on the "GRASS machine"), you should be able to reconnect
to the session whenever (and however) you subsequently get in (ssh,
telnet, dial-up login, console login, etc).

And the best thing: if you get disconnected by accident (e.g. a timeout, or
your cell phone goes out of range), you can reconnect later and continue
working where you left off. Screen also supports multiple windows
per screen session; it's like having multiple consoles open in a single
work session.
Gentoo users should know screen very well, as it's the best tool for
compiling software on remote machines, or for leaving a compilation
running while logging out of X.

If you do NOT need X11 to run GRASS, screen is the right tool for
long-running or remote sessions.

Maris.

Markus Neteler wrote on 01/23/2007 12:13 AM:

On Tue, Jan 23, 2007 at 12:10:20AM +0100, Markus Neteler wrote:
  

Attached a first sort of working hack "diff". Could be
certainly nicer.
    
Sorry, this one I wanted to send.
  

I have submitted an improved version to CVS.

Usage:
Just set the environment variable GRASS_BATCH_JOB to
the shell script file containing GRASS (or whatever) commands, preferably
with its full path. Then launch GRASS and the script will be executed. It is
best to launch GRASS in -text mode and to provide GISDBASE/location/mapset
as parameters. The job script needs executable file permissions (chmod).

Example:

chmod u+x $HOME/my_grassjob.sh
export GRASS_BATCH_JOB=$HOME/my_grassjob.sh
grass63 ~/grassdata/spearfish60/neteler/

The grass63 command starts GRASS, executes the contents of the job file
and leaves GRASS. Since the normal startup/closure is used, all tmp files
are properly removed.

To deactivate the batch job mode, run (bash example):
unset GRASS_BATCH_JOB
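
Put together with nohup, a complete unattended run might look like the sketch below. The job contents, the shapefile path and the log file name are hypothetical; only GRASS_BATCH_JOB and the grass63 call come from the usage described above.

```shell
#!/bin/sh
# Hypothetical job file: import a large vector map, then print its summary.
JOB="$PWD/my_grassjob.sh"
cat > "$JOB" <<'EOF'
#!/bin/sh
g.region -p
v.in.ogr dsn="$HOME/data/bigmap.shp" output=bigmap
v.info map=bigmap
EOF
chmod u+x "$JOB"

# Point GRASS at the job file, using its full path.
GRASS_BATCH_JOB="$JOB"
export GRASS_BATCH_JOB

# Launch detached from the terminal (needs GRASS, so commented out here):
# nohup grass63 -text ~/grassdata/spearfish60/neteler/ > "$HOME/my_grassjob.log" 2>&1 &
```

After logging back in, `ps -aef | grep grass` shows whether the job is still running, and the log file holds whatever the commands printed.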

Markus

PS: Next I learn "screen" :-)

suggested improvements attached:

+ force text mode (don't launch gis.m).
  (probably it should exit with error if MAPSET is not given on command line
   & GRASS_BATCH_JOB is defined, too. so avoid "$ETC/set_data". Or create new
   GRASS_GUI value "none", set if BATCH is used?)

+ replace final "exit 0" with "exit $EXIT_VAL" so the whole grass-batch
command exits with the return value of the batch script, not hardcoded
value "0".
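
The second point can be illustrated in isolation. This is a sketch of the idea only, not the actual init.sh code; `run_batch` and the temporary directory are stand-ins for the real startup and cleanup steps.

```shell
#!/bin/sh
# run_batch JOBFILE
# Run the job, clean up, and hand the job's own exit status back to
# the caller instead of a hardcoded 0.
run_batch() {
    job=$1
    tmp=$(mktemp -d) || return 1   # stand-in for the session's tmp dir
    "$job"
    EXIT_VAL=$?                    # capture before cleanup overwrites $?
    rm -rf "$tmp"
    return "$EXIT_VAL"
}
```

With this, a calling script or cron job can tell from the exit status whether the batch script itself failed.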

bug:

- your hack fails in Cygwin due to init.sh "if [ "$CYGWIN" ] ; then
  ... export SHELL=" hack earlier in the script

simplify $GRASS_BATCH_JOB to $GRASS_BATCH?
JOB implies ^Z, fg, bg, PID, ...

Hamish

(attachments)

initsh_batch_ret.diff (837 Bytes)

Hi,
As a Gentoo user, I'm also a big screen fan ;-)
More - inline.

2007/1/22, Dylan Beaudette <debeaudette@ucdavis.edu>:

Hi Markus,

It has been mentioned once already by Stephan, but I too would highly
recommend screen. I use this routinely for large GRASS jobs on remote
machines via ssh. The most important commands can be summarized in a couple
sentences:

ssh remote_machine
screen
grass63 location/mapset
v.surf.rst in=bigfile
<ctrl> + <a> (pause and let up on the previous keys) <d>

(now you are back to the shell on the remote machine)
exit

to reconnect:
ssh remote_machine
screen -r

if you have more than one screen session going, use the session number to
reconnect.

I use one screen process with multiple windows. Inside the screen process press:
<ctrl>+<a> <c> - create a new screen window
<ctrl>+<a> <n> - switch to the next screen window (if you have more than one
screen window, just keep pressing this key sequence until you get
the right screen window)

This allows you, within a single ssh session, to use GRASS, manage the server
and run a console-based IRC client to hang around in grass :-)

If you have been disconnected from the remote machine while screen was
open, use "screen -d -r" to resume: it detaches the previous screen
session and then resumes it, so you can continue working as if nothing
had happened.

Hope somebody will find this helpful,
Maris.