[GRASS-dev] GRASS init.sh

It seems to me that the GRASS initialization script is a lot more
complicated than necessary. If I understand correctly, the script
needs to do seven things:

1. Set the system environment variables (GISBASE, PATH, LD_LIBRARY_PATH, etc.)
2. Set the user environment variables (GISDBASE, LOCATION_NAME, MAPSET, etc.)
3. Create a temporary directory for processing
4. Copy .grassrc6 into the temporary directory for session use
5. Launch a Shell with the GRASS environment variables set
6. Copy the session grassrc file from temporary directory back to .grassrc6
7. Delete the temporary directory
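
As a rough illustration of the seven steps, here is a minimal Python
sketch. It is not what Init.sh does: the function name run_session, the
"gisrc" file name inside the temporary directory, and the assumption
that executables live in $GISBASE/bin are all made up for the example.

```python
import os
import shutil
import subprocess
import tempfile


def run_session(rc_path, gisbase, command):
    """Run `command` inside a throwaway session directory.

    rc_path, gisbase and command are supplied by the caller; none of
    the names used here are taken from the real Init.sh.
    """
    session_dir = tempfile.mkdtemp(prefix="grass-session-")    # step 3
    session_rc = os.path.join(session_dir, "gisrc")
    try:
        shutil.copyfile(rc_path, session_rc)                    # step 4
        # Steps 1-2: environment for the child process. The user
        # settings travel in the copied rc file, located via GISRC.
        env = dict(os.environ)
        env["GISBASE"] = gisbase
        env["GISRC"] = session_rc
        env["PATH"] = os.pathsep.join(
            [os.path.join(gisbase, "bin"), env.get("PATH", "")]
        )
        status = subprocess.call(command, env=env)              # step 5
        shutil.copyfile(session_rc, rc_path)                    # step 6
        return status
    finally:
        shutil.rmtree(session_dir, ignore_errors=True)          # step 7
```

Passing an interactive shell as `command` would give a session similar
in spirit to the current one; any other program works the same way.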

Seems straightforward until you take a look at the scripts involved.
A few issues:

1. system-wide environment variables are embedded in two different
scripts (grass61, $GISBASE/etc/init.sh) instead of in a (single)
system-wide configuration file.

2. Many system-wide environment variables are determined at run-time
by init.sh. This means that there is a lot of platform specific code
looking for libraries and executables. Routines are different for each
shell and each major platform. Surely there is a better way to do
this? Maybe a configuration wizard that helps the user write or update
their system-wide config file?

3. user environment variables are stored in two places $HOME/.grassrc6
and $HOME/.grass.$SHELL

4. Again, there is a lot of code in init.sh that tries to discover
these user variables and if necessary query from the user (including a
crude, text-based gui in the script itself).

5. Of course, this won't run at all on Windows without a full Unix
emulation layer.

6. It's written in sh, and it looks like line noise.

I see all of this mainly as an issue for making the code portable
across platforms and execution environments (such as my IPython
experiment). It would be easier to put configuration options in a
configuration file and then have the various shells pick up the data
using their available tools. I think that this would be much more
portable and much simpler as well.

--
David Finlayson

> 3. user environment variables are stored in two places $HOME/.grassrc6
> and $HOME/.grass.$SHELL

note the difference between shell variables and GIS variables. Each of
these files handles one set.

(e.g. MAPSET is not a shell variable [any more])

http://grass.ibiblio.org/grass61/manuals/html61_user/variables.html
http://grass.ibiblio.org/grass61/manuals/html61_user/g.gisenv.html

GRASS will update .grassrc6 [and the user shouldn't].
A program should never update a user config file (.grass.bashrc).

Hamish

David Finlayson wrote:

> It seems to me that the GRASS initialization script is a lot more
> complicated than necessary. If I understand correctly, the script
> needs to do seven things:
>
> 1. Set the system environment variables (GISBASE, PATH, LD_LIBRARY_PATH, etc.)
> 2. Set the user environment variables (GISDBASE, LOCATION_NAME, MAPSET, etc.)

To be precise, these aren't environment variables. Within Init.sh,
they are just shell variables (they aren't exported). Init.sh writes
the values to $GISRC, and all GRASS modules obtain the settings from
there.
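
To make the distinction concrete, here is a small Python sketch of
reading and writing such a state file. It assumes the file holds one
"NAME: value" pair per line; the function names read_gisrc and
write_gisrc are invented for the example and are not part of GRASS.

```python
def read_gisrc(text):
    """Parse "NAME: value" lines into a dict, skipping other lines."""
    settings = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        name, value = line.split(":", 1)
        settings[name.strip()] = value.strip()
    return settings


def write_gisrc(settings):
    """Serialise a dict back into "NAME: value" lines."""
    return "".join(
        "%s: %s\n" % (name, value) for name, value in settings.items()
    )
```

Because the values live in a file rather than in the environment, any
process that knows $GISRC can read or update them.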

[Historically, Init.sh used to export them so that they could be
accessed easily from within shell scripts. However, that has the
drawback of making g.mapset (and similar) impossible to implement, as
a command can't change the environment of the shell.]
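
The limitation is easy to demonstrate. In this Python sketch (the
values "PERMANENT" and "user1" are only sample data), a child process
changes MAPSET in its own environment, and the parent never sees it:

```python
import os
import subprocess
import sys

os.environ["MAPSET"] = "PERMANENT"

# The child changes MAPSET in its own copy of the environment and exits.
child_code = "import os; os.environ['MAPSET'] = 'user1'"
subprocess.check_call([sys.executable, "-c", child_code])

# The parent's value is untouched, which is why a g.mapset-style command
# cannot work if the setting lives only in the environment.
parent_value = os.environ["MAPSET"]
print(parent_value)
```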

> 3. Create a temporary directory for processing
> 4. Copy .grassrc6 into the temporary directory for session use
> 5. Launch a Shell with the GRASS environment variables set
> 6. Copy the session grassrc file from temporary directory back to .grassrc6
> 7. Delete the temporary directory
>
> Seems straightforward until you take a look at the scripts involved.
> A few issues:
>
> 1. system-wide environment variables are embedded in two different
> scripts (grass61, $GISBASE/etc/init.sh) instead of in a (single)
> system-wide configuration file.

The only environment variable set in grass61 is GISBASE. That has to
be set there because the rest of GRASS (including Init.sh) is
referenced relative to $GISBASE.

> 2. Many system-wide environment variables are determined at run-time
> by init.sh. This means that there is a lot of platform specific code
> looking for libraries and executables. Routines are different for each
> shell and each major platform. Surely there is a better way to do
> this? Maybe a configuration wizard that helps the user write or update
> their system-wide config file?

Different users may have different values for $SHELL and $PATH.
Init.sh needs to work for all users, not one specific configuration.

> 3. user environment variables are stored in two places $HOME/.grassrc6
> and $HOME/.grass.$SHELL

~/.grassrc6 contains GRASS variables, not environment variables, while
~/.grass.$SHELL contains environment variables.

> 4. Again, there is a lot of code in init.sh that tries to discover
> these user variables and if necessary query from the user (including a
> crude, text-based gui in the script itself).

Nothing is set from user input; the "read ans" commands are only used
to wait for a keypress; there are no references to $ans in the script.

> 5. Of course, this won't run at all on Windows without a full Unix
> emulation layer
>
> 6. It's written in sh, it looks like line noise.

Those last two are valid points. The question is, what language /will/
work on all common platforms?

Regarding point 5, there is still a /lot/ of stuff in GRASS which
assumes a Unix shell and common tools. Look[1] for calls to system(),
and the commands which are being run. Until there has been substantial
progress in fixing those issues, there isn't much point in worrying
about Init.sh. In the worst case, we can just provide an equivalent
Init.bat script for Windows users.

[1] The obj_imp table in the database created by tools/sql.sh is the
easiest way to locate calls to specific functions.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

> [1] The obj_imp table in the database created by tools/sql.sh is the
> easiest way to locate calls to specific functions.

It would be nice if that script had a short header comment explaining
what it was for. "Usage: sql.sh <source directory>" doesn't tell me much.

SQL is mostly Greek to me.

thanks,
Hamish

Hamish wrote:

> > 3. user environment variables are stored in two places $HOME/.grassrc6
> > and $HOME/.grass.$SHELL
>
> note the difference between shell variables and GIS variables.

... as well as the difference between shell variables and environment
variables (the difference being the "export" command).

> Each of
> these files handles one set.
>
> (e.g. MAPSET is not a shell variable [any more])

It's a shell variable within Init.sh; it isn't an environment variable
any more.
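
The effect of "export" can be checked with a short experiment. This
Python sketch drives a POSIX sh (assumed to be on the PATH); DEMO_VAR
and the helper child_sees are hypothetical names used only here.

```python
import os
import subprocess


def child_sees(script_prefix):
    """Run script_prefix in sh, then let a child sh print $DEMO_VAR."""
    script = script_prefix + "; sh -c 'printf \"%s\" \"$DEMO_VAR\"'"
    env = dict(os.environ)
    env.pop("DEMO_VAR", None)      # make sure we start without it
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


# A plain assignment creates a shell variable: the child sees nothing.
plain = child_sees("DEMO_VAR=PERMANENT")
# After export it is an environment variable: the child inherits it.
exported = child_sees("DEMO_VAR=PERMANENT; export DEMO_VAR")
```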

--
Glynn Clements <glynn@gclements.plus.com>

Hamish wrote:

> > [1] The obj_imp table in the database created by tools/sql.sh is the
> > easiest way to locate calls to specific functions.
>
> It would be nice if that script had a short header comment explaining
> what it was for. "Usage: sql.sh <source directory>" doesn't tell me much.

Usage: after having compiled GRASS, run "tools/sql.sh `pwd`" from the
top of the GRASS source tree.

Essentially, the script runs "nm" on every object file, library and
executable it finds (and "ldd" on all of the executables), processes
the output with egrep/sed/awk, then imports the results into a
PostgreSQL database.

"nm" lists the symbol table of an object file, library, or executable,
indicating whether each symbol is imported into, defined in and/or
exported from the file.

Most of the database tables record which symbols are imported into or
exported from which object files, libraries or executables. E.g. the
"obj_imp" table lists which symbols are imported into which object
files.

You can then use simple queries such as:

  grass=> SELECT object FROM obj_imp WHERE symbol = 'I_get_target' ;
                                 object
  --------------------------------------------------------------------
   imagery/i.ortho.photo/photo.2image/OBJ.i686-pc-linux-gnu/target.o
   imagery/i.ortho.photo/photo.2target/OBJ.i686-pc-linux-gnu/target.o
   imagery/i.ortho.photo/photo.elev/OBJ.i686-pc-linux-gnu/main.o
   imagery/i.ortho.photo/photo.rectify/OBJ.i686-pc-linux-gnu/target.o
   imagery/i.ortho.photo/photo.target/OBJ.i686-pc-linux-gnu/main.o
   imagery/i.points/OBJ.i686-pc-linux-gnu/target.o
   imagery/i.rectify/OBJ.i686-pc-linux-gnu/target.o
   imagery/i.vpoints/OBJ.i686-pc-linux-gnu/target.o

to discover which files import a given symbol, or more complex queries
such as:

  grass=> SELECT DISTINCT b.object FROM lib_exp a, obj_imp b
  grass-> WHERE a.library = 'libgrass_form.6.1.cvs.so' AND a.symbol = b.symbol ;
                            object
  -----------------------------------------------------------
   display/d.what.vect/OBJ.i686-pc-linux-gnu/what.o
   vector/v.digit/OBJ.i686-pc-linux-gnu/attr.o
   vector/v.digit/OBJ.i686-pc-linux-gnu/line.o
   vector/v.what/OBJ.i686-pc-linux-gnu/what.o
   visualization/nviz/src/OBJ.i686-pc-linux-gnu/query_vect.o
  (5 rows)

to discover which files import any symbol defined in a specific
library. And so on.

For simple "which files use this function" queries, a database lookup
is quicker and more reliable than grep-ing the source tree.
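
For anyone who wants to see the idea without setting up PostgreSQL,
here is a much-reduced Python sketch. It is not a port of sql.sh: it
uses an in-memory SQLite database as a stand-in, handles only the
default "U name" lines that nm prints for undefined symbols, and
assumes obj_imp has just the two columns used in the queries above.
The sample nm output and file names are invented.

```python
import sqlite3


def imported_symbols(nm_output):
    """Return the undefined ("U") symbols from text printed by nm."""
    symbols = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have no address column: "U name".
        if len(fields) == 2 and fields[0] == "U":
            symbols.append(fields[1])
    return symbols


def build_obj_imp(nm_by_object):
    """Load {object file: nm output} into an in-memory obj_imp table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE obj_imp (object TEXT, symbol TEXT)")
    for obj, output in nm_by_object.items():
        db.executemany(
            "INSERT INTO obj_imp VALUES (?, ?)",
            [(obj, sym) for sym in imported_symbols(output)],
        )
    return db


# Invented sample data standing in for real nm output.
sample = {
    "a/target.o": "         U I_get_target\n00000000 T get_target\n",
    "b/main.o": "00000000 T main\n         U printf\n",
}
db = build_obj_imp(sample)
rows = db.execute(
    "SELECT object FROM obj_imp WHERE symbol = ?", ("I_get_target",)
).fetchall()
```

With the sample data, `rows` contains only the entry for a/target.o,
the one object that imports I_get_target.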

Assuming that the sql.sh script runs successfully (some of it is
Linux-specific, other bits are PostgreSQL-specific, but the changes
required for a different OS or RDBMS should be quite minor), the
easiest way to figure out what is in a given table (apart from looking
at the name) is to just sample it, e.g.:

  grass=> SELECT * FROM stlib_exp LIMIT 5 ;
        library      |   object   |    symbol
  -------------------+------------+---------------
   libgrass_manage.a | add_elem.o | add_element
   libgrass_manage.a | ask.o      | ask_in_mapset
   libgrass_manage.a | ask.o      | ask_new
   libgrass_manage.a | ask.o      | ask_old
   libgrass_manage.a | copyfile.o | copyfile
  (5 rows)

--
Glynn Clements <glynn@gclements.plus.com>

On 6/12/06, Glynn Clements <glynn@gclements.plus.com> wrote:

> The only environment variable set in grass61 is GISBASE. That has to
> be set there because the rest of GRASS (including Init.sh) is
> referenced relative to $GISBASE.
>
> > 2. Many system-wide environment variables are determined at run-time
> > by init.sh. This means that there is a lot of platform specific code
> > looking for libraries and executables. Routines are different for each
> > shell and each major platform. Surely there is a better way to do
> > this? Maybe a configuration wizard that helps the user write or update
> > their system-wide config file?
>
> Different users may have different values for $SHELL and $PATH.
> Init.sh needs to work for all users, not one specific configuration.

I guess my point is this:

It was fine to put configuration information in the init script when
there was effectively only 1 type of shell. Now that we are
considering alternative environments for GRASS such as Java or Python
or R or DOS, etc. it would be nice to get the configuration
information into a proper config file. That way the configuration
(static info about GRASS) is separated from the initialization process
(unique to each environment/shell).

David Finlayson wrote:

> > The only environment variable set in grass61 is GISBASE. That has to
> > be set there because the rest of GRASS (including Init.sh) is
> > referenced relative to $GISBASE.
> >
> > > 2. Many system-wide environment variables are determined at run-time
> > > by init.sh. This means that there is a lot of platform specific code
> > > looking for libraries and executables. Routines are different for each
> > > shell and each major platform. Surely there is a better way to do
> > > this? Maybe a configuration wizard that helps the user write or update
> > > their system-wide config file?
> >
> > Different users may have different values for $SHELL and $PATH.
> > Init.sh needs to work for all users, not one specific configuration.
>
> I guess my point is this:
>
> It was fine to put configuration information in the init script when
> there was effectively only 1 type of shell. Now that we are
> considering alternative environments for GRASS such as Java or Python
> or R or DOS, etc. it would be nice to get the configuration
> information into a proper config file. That way the configuration
> (static info about GRASS) is separated from the initialization process
> (unique to each environment/shell).

What type of configuration information are you referring to?

GRASS doesn't have any configuration files. Everything which can be
configured is done so through the environment.

--
Glynn Clements <glynn@gclements.plus.com>

> GRASS doesn't have any configuration files. Everything which can be
> configured is done so through the environment.

...and set dynamically mostly at run time, that is my point. In the
future (GRASS 7) it would be nice to move the configuration
information currently stored in 34 environment variables (see:
http://grass.itc.it/grass61/manuals/html61_user/variables.html) to a
configuration file. That way new shells could get the info without
needing to recreate the do-it-all initialization process that init.sh
goes through in GRASS 6.X.

I am pretty close to having most of GRASS running entirely in a Python
shell. The only thing Python needs to know are what environment
variables to set up. It does this the same way on Mac, Windows and
Unix, so I don't need all of the platform specific code that is
currently in init.sh and I MUST export it somehow so that Python can
read it. I would like the user to be able to run the Python version or
traditional grass using a common configuration file so that the user
only needs to change it in one place when updating things.
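
To show the kind of thing meant here, a minimal Python sketch follows.
No such file exists in GRASS 6; the "NAME=value" layout, the "#"
comment convention and the function names are all hypothetical.

```python
import os


def load_config(text):
    """Parse NAME=value lines; lines starting with '#' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def apply_config(settings, environ=None):
    """Copy every setting into environ (os.environ by default)."""
    if environ is None:
        environ = os.environ
    environ.update(settings)
    return environ
```

A file in this layout could also be read by a Bourne-style shell, which
is the point of keeping the format this simple.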

Once I've got grass working from the interpreter (no help from bash),
I can embed the whole interpreter in a GUI framework like wxPython
(such as pyshell for example). And we will be well on our way to a
(optional!) MATLAB-style interface for grass.

--
David Finlayson

David Finlayson wrote:

> > GRASS doesn't have any configuration files. Everything which can be
> > configured is done so through the environment.
>
> ...and set dynamically mostly at run time, that is my point.

Many of the ones set by Init.sh are only set if they aren't already
set on entry. IOW, the settings in Init.sh are fall-back settings if
the user hasn't made an explicit choice.
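
In Python terms, the fall-back behaviour amounts to something like the
sketch below. The helper name and the sample values ("more",
"firefox") are made up; they are not the defaults Init.sh uses.

```python
def apply_fallbacks(environ, fallbacks):
    """Fill in only the variables the user has not already set."""
    for name, value in fallbacks.items():
        # setdefault leaves an existing value alone, even an empty one.
        environ.setdefault(name, value)
    return environ


env = {"GRASS_PAGER": "less"}          # an explicit user choice
apply_fallbacks(
    env, {"GRASS_PAGER": "more", "GRASS_HTML_BROWSER": "firefox"}
)
```

After the call, GRASS_PAGER is still "less" and only the missing
GRASS_HTML_BROWSER has been filled in.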

> In the
> future (GRASS 7) it would be nice to move the configuration
> information currently stored in 34 environment variables (see:
> http://grass.itc.it/grass61/manuals/html61_user/variables.html) to a
> configuration file.

Environment variables have the advantage that each process has a
separate environment, so you can change the settings on a per-process
basis.

Many of the variables listed on the page which you reference are only
intended to be used that way. I.e. they are not meant to be set in the
environment of the session shell, but set locally for individual
commands.
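
A per-command setting looks like this from Python. This is only a
sketch: EXAMPLE_SETTING is a placeholder name, not a real GRASS
variable, and run_with is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_with(overrides, command):
    """Run command with extra variables; the caller's own environment
    is left unchanged."""
    env = dict(os.environ)
    env.update(overrides)
    return subprocess.check_output(
        command, env=env, universal_newlines=True
    )


show = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('EXAMPLE_SETTING'))",
]
# Two commands, two different settings, no change to the session.
first = run_with({"EXAMPLE_SETTING": "a"}, show)
second = run_with({"EXAMPLE_SETTING": "b"}, show)
```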

> That way new shells could get the info without
> needing to recreate the do-it-all initialization process that init.sh
> goes through in GRASS 6.X.

Very little of Init.sh is shell-specific; note that the default case
only sets PS1.

Most of the shell-specific stuff amounts to hacks which someone
thought would be a good idea but aren't entirely necessary (e.g.
setting $HOME to the mapset directory then making the shell's startup
script set it back, so that the history gets written to the mapset
directory rather than to the default history file).

> I am pretty close to having most of GRASS running entirely in a Python
> shell. The only thing Python needs to know are what environment
> variables to set up. It does this the same way on Mac, Windows and
> Unix, so I don't need all of the platform specific code that is
> currently in init.sh and I MUST export it somehow so that Python can
> read it. I would like the user to be able to run the Python version or
> traditional grass using a common configuration file so that the user
> only needs to change it in one place when updating things.

The environment variables which matter are:

PATH
LD_LIBRARY_PATH [*]
GRASS_LD_LIBRARY_PATH
GISBASE
GISRC
GIS_LOCK
GRASS_HTML_BROWSER
GRASS_PAGER
GRASS_PERL
GRASS_TCLSH
GRASS_WISH
GRASS_VERSION
TCLTKGRASSBASE

Obviously, PATH needs to be set relative to $GISBASE so that GRASS
commands and scripts can be executed without having to provide an
absolute path.

LD_LIBRARY_PATH is actually a platform-specific variable; see the
rules for $(ETC)/Init.sh and $(ETC)/grass-run.sh in lib/init/Makefile.

The value for LD_LIBRARY_PATH_VAR is set by the SC_CONFIG_CFLAGS
autoconf macro in aclocal.m4. You will need to get this into your
Python script (unless Python provides a portable interface for this).

GRASS_LD_LIBRARY_PATH has to be set to the same value; this is used by
grass-run.sh to restore the library search path in case it is cleared
by the loader (which happens if you run a setuid/setgid program such
as xterm).
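
A Python script would need its own way of picking the variable name.
The sketch below is a guess at a starting point, not a replacement for
the value computed by configure: it covers only three common cases, and
other systems may use other names. Both function names are invented.

```python
import os
import sys


def library_path_var(platform=sys.platform):
    """Guess the loader's search-path variable for a platform string."""
    if platform.startswith("darwin"):
        return "DYLD_LIBRARY_PATH"
    if platform.startswith("win") or platform.startswith("cygwin"):
        return "PATH"              # DLLs are found through PATH
    return "LD_LIBRARY_PATH"       # assumed default for other Unixes


def add_library_dir(environ, libdir, platform=sys.platform):
    """Prepend libdir to the search path and keep an identical copy
    in GRASS_LD_LIBRARY_PATH."""
    name = library_path_var(platform)
    old = environ.get(name, "")
    environ[name] = libdir + (os.pathsep + old if old else "")
    environ["GRASS_LD_LIBRARY_PATH"] = environ[name]
    return environ
```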

There needs to be a script installed into the system binary directory
(e.g. /usr/local/bin) which does nothing more than set GISBASE to a
specific value then call the main initialisation script from within
$GISBASE. Nothing under $GISBASE has the installation location
hard-coded in, so if you want to install a new version without
removing or disabling the old version, you can just move or rename the
installation directory and modify the grass61 script to match.

GISRC needs to contain the pathname of a writable file which contains
valid settings for GISDBASE, LOCATION_NAME, MAPSET and GRASS_GUI. How
you create this file and where you store it is up to you, but you need
to allow for multiple GRASS sessions, each with a separate state file
(Init.sh puts it in a directory whose name includes the PID).

GIS_LOCK needs to contain the PID of the Init.sh script (at least, it
needs to contain the PID of one of the processes comprising the
session). The FIFO transport for the display drivers creates lock files
which contain the PID (so that they know which session created the
lock file).
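
Putting the GISBASE, PATH, GISRC and GIS_LOCK requirements together, a
Python start-up routine might look roughly like this. It is a sketch
under assumptions: the "gisrc" file name, the temporary-directory
prefix, the $GISBASE/bin and $GISBASE/scripts layout, and the function
name are illustrative, and the rc file is assumed to hold one
"NAME: value" pair per line.

```python
import os
import tempfile


def session_environment(gisbase, settings, base_env=None):
    """Return (env, rc_path) for one session; settings go into the
    per-session state file that GISRC points to."""
    env = dict(os.environ if base_env is None else base_env)
    pid = os.getpid()
    # The PID in the directory name keeps concurrent sessions apart.
    session_dir = tempfile.mkdtemp(prefix="grass-%d-" % pid)
    rc_path = os.path.join(session_dir, "gisrc")
    with open(rc_path, "w") as rc:
        for name, value in settings.items():
            rc.write("%s: %s\n" % (name, value))
    env["GISBASE"] = gisbase
    env["GISRC"] = rc_path
    env["GIS_LOCK"] = str(pid)     # PID of a process in the session
    env["PATH"] = os.pathsep.join(
        [
            os.path.join(gisbase, "bin"),
            os.path.join(gisbase, "scripts"),
            env.get("PATH", ""),
        ]
    )
    return env, rc_path
```

The caller remains responsible for removing the session directory when
the session ends.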

The various GRASS_* variables are used to specify particular programs,
except for GRASS_VERSION which is (apparently) used by the R
interface.

TCLTKGRASSBASE is probably a misfeature; Tcl code which uses it could
just use $GISBASE/etc instead.

Of the others used in the Init.sh script, GISRCRC shouldn't be
exported, PAGER doesn't need to be mentioned in the script (it will
get passed on to the shell automatically), and GRASS_GNUPLOT is no
longer used (the modules which used it aren't present in 6.x).

--
Glynn Clements <glynn@gclements.plus.com>

> GRASS_GNUPLOT is no longer used (the modules which used it aren't
> present in 6.x).

Actually, scripts/i.spectral uses GRASS_GNUPLOT. Could this be replaced
with d.linegraph [+ d.graph]? Or could it just use "gnuplot" without
worrying about GRASS_GNUPLOT?

Hamish