[GRASS5] Portability issues

I've been looking into portability issues, including an analysis of
direct calls to non-ANSI functions from GRASS programs and libraries.
For programs, the statistics are:

symbol       | count
-------------+------
system       |   134
unlink       |    97
access       |    77
close        |    71
sleep        |    57
read         |    54
pclose       |    50
popen        |    50
isatty       |    48
open         |    46
write        |    40
fork         |    35
wait         |    34
snprintf     |    30
_exit        |    29
dup          |    26
pipe         |    26
creat        |    25
lseek        |    24
execl        |    23
execlp       |    19
fileno       |    19
opendir      |    18
closedir     |    17
fdopen       |    16
getpid       |    16
readdir      |    16
execvp       |    14
kill         |    14
stat         |    10
umask        |     8
mkdir        |     3
alarm        |     2
chdir        |     2
drand48      |     2
getpwuid     |     2
gettimeofday |     2
getuid       |     2
ioctl        |     2
lrand48      |     2
srand48      |     2
strcasecmp   |     2
strdup       |     2
swab         |     2
chmod        |     1
getopt       |     1
index        |     1
putenv       |     1
select       |     1
sigaction    |     1
sigemptyset  |     1
strncasecmp  |     1
sync         |     1
truncate     |     1
ttyname      |     1
tzset        |     1
usleep       |     1
waitpid      |     1

Some of these could simply be replaced with ANSI functions, while
others suggest that new functions should be added to e.g. libgis to
improve portability.

Some comments on specific functions:

+ open close creat read write lseek truncate

Many of these could probably be replaced with the ANSI stdio
equivalents.

+ mkdir chdir opendir closedir readdir

The ANSI libraries don't deal with directories. However, any system on
which GRASS runs will have equivalent functionality; we just need to
provide a portable interface.
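
As a rough illustration of what such an interface could look like, the
sketch below wraps directory creation and changing directory. The names
port_mkdir and port_chdir are hypothetical (they are not existing libgis
functions), and the Win32 branch assumes the _mkdir/_chdir functions
from <direct.h>.

```c
#ifdef _WIN32
#include <direct.h>     /* _mkdir, _chdir */
#else
#include <sys/types.h>
#include <sys/stat.h>   /* mkdir */
#include <unistd.h>     /* chdir, rmdir */
#endif

/* Hypothetical wrapper: create a directory with default permissions.
 * Returns 0 on success, non-zero on failure. */
int port_mkdir(const char *path)
{
#ifdef _WIN32
    return _mkdir(path);
#else
    return mkdir(path, 0777);   /* the process umask still applies */
#endif
}

/* Hypothetical wrapper: change the current working directory.
 * Returns 0 on success, non-zero on failure. */
int port_chdir(const char *path)
{
#ifdef _WIN32
    return _chdir(path);
#else
    return chdir(path);
#endif
}
```

Callers would then use port_mkdir(path) everywhere, and only this one
file would need platform-specific conditionals.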

+ drand48 lrand48 srand48

These can be replaced with rand/srand. Presently, all programs which
use them (s.random, r.random, r.mapcalc) can fall back to rand/srand,
but [rs].random attempt to guess whether the *rand48 functions are
available based upon platform macros rather than HAVE_DRAND48.
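
A minimal sketch of the fallback keyed on HAVE_DRAND48 rather than on
platform macros. It assumes HAVE_DRAND48 is defined by the configure
script when the *rand48 functions exist; port_drand and port_srand are
hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical helper: uniform double in the range [0, 1). */
double port_drand(void)
{
#ifdef HAVE_DRAND48
    return drand48();
#else
    /* Dividing by RAND_MAX + 1.0 keeps the result strictly below 1. */
    return rand() / (RAND_MAX + 1.0);
#endif
}

/* Hypothetical helper: seed whichever generator is in use. */
void port_srand(long seed)
{
#ifdef HAVE_DRAND48
    srand48(seed);
#else
    srand((unsigned int) seed);
#endif
}
```

Note that rand() typically has a shorter period and fewer random bits
than drand48(), so results will differ between the two branches.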

+ strcasecmp strncasecmp strdup swab index

Simple string processing functions which could easily be replaced with
generic versions. Actually, libgis already provides G_store and
G_strcasecmp, although the latter is a hand-coded implementation which
only works for ASCII characters (I don't know whether this is
intentional; there are valid arguments both for and against honouring
the locale settings).
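
For illustration, a generic case-insensitive comparison can be written
in a few lines of ANSI C. This is only a sketch, not the existing
G_strcasecmp; the name port_strcasecmp is hypothetical. Because it uses
tolower(), it follows the current C locale, which is one of the two
possible choices mentioned above.

```c
#include <ctype.h>

/* Hypothetical generic replacement for strcasecmp(). Returns a value
 * less than, equal to, or greater than zero, like strcmp(). */
int port_strcasecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        /* Cast to unsigned char: tolower() is undefined for negative
         * values other than EOF. */
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return ca - cb;
        if (ca == '\0')
            return 0;
    }
}
```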

+ snprintf

C9X defines this, so in a couple of decades it won't be a problem. For
now, a wide variety of solutions are possible, all with their own
advantages and disadvantages.

+ unlink

This can just be changed to remove(), which is ANSI.

+ sleep usleep

Suitable functionality should be available on all platforms; we just
need a portable interface.
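
One possible shape for such an interface is a single millisecond sleep
function. The sketch below assumes nanosleep() on POSIX systems and
Sleep() on Win32; port_sleep_ms is a hypothetical name.

```c
#ifdef _WIN32
#include <windows.h>
#else
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* request the nanosleep() prototype */
#endif
#include <time.h>
#endif

/* Hypothetical wrapper: sleep for the given number of milliseconds.
 * Returns 0 on success, -1 if the sleep failed or was interrupted. */
int port_sleep_ms(unsigned long ms)
{
#ifdef _WIN32
    Sleep((DWORD) ms);
    return 0;
#else
    struct timespec ts;

    ts.tv_sec = (time_t) (ms / 1000);
    ts.tv_nsec = (long) (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
#endif
}
```

Existing sleep(n) calls would become port_sleep_ms(n * 1000), and
usleep(n) calls would become port_sleep_ms(n / 1000).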

+ sigaction sigemptyset

Only used by r.mapcalc. signal() can be used instead, although
signal() has problems of its own (BSD-vs-SysV signal semantics,
general lack of flexibility).

+ access stat umask chmod

Closely related to the Unix permission model, although many of the
callers of access() and stat() only use information for which portable
interfaces could be provided (e.g. whether a file exists, or its
size).
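
As a sketch of how the two common cases could be covered with ANSI
stdio alone (the function names are hypothetical). Two caveats: opening
a file tests readability rather than mere existence, and ANSI does not
strictly guarantee that a binary stream supports seeking to its end.

```c
#include <stdio.h>

/* Hypothetical helper: non-zero if the file can be opened for reading. */
int port_file_readable(const char *path)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Hypothetical helper: file size in bytes, or -1 on error.
 * The result is limited to the range of long. */
long port_file_size(const char *path)
{
    long size = -1;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return size;
}
```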

+ dup pipe select fileno fdopen

Specific to the core Unix I/O API. These would need to be analysed on
a case-by-case basis. However, a significant number of these calls
come from db.*, *.db or p.* programs, which suggests that much of the
usage may be localised in libdbmi and paint/Interface/applib (these are
static libraries, so their dependencies become their clients'
dependencies as far as "nm" is concerned).

+ isatty ioctl ttyname

Unix terminal I/O.

The ioctl() calls are all terminal-related, and not widely used.

ttyname() is only used by mon.start, and is probably no longer
relevant. The Tek4105 driver isn't present in GRASS5, and even if it
were resurrected, it's unlikely to be used on non-Unix systems.

That just leaves isatty(); Cygwin manages to implement this, so there
must be some way to determine if input is being read from a console.

+ fork execl execlp execvp getpid kill wait waitpid

Unix process management. Providing a portable interface for spawning
processes would be quite involved, but also quite useful, particularly
in conjunction with the next point.

+ system popen pclose

These suffer from the same issues as the previous point, with the
additional complication that the command is passed to the shell, i.e.
to whichever shell happens to be /bin/sh on the system in question (the
original AT&T Bourne shell, ash, bash v1, bash v2 and zsh are all
plausible).

Many of the problems with spaces in filenames can be attributed to the
use of these functions. While using single quotes should solve those, it
won't prevent a web interface from being abused to execute commands on
the server.
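
Naive single quoting breaks as soon as the argument itself contains a
single quote, which is one way a crafted value can escape into the
command line. The sketch below shows quoting which also rewrites each
embedded quote, assuming a Bourne-style /bin/sh; shell_quote is a
hypothetical name. Avoiding the shell altogether (passing an argument
vector to exec) remains the more robust option.

```c
#include <stddef.h>

/* Hypothetical helper: wrap src in single quotes for a Bourne-style
 * shell, rewriting each embedded ' as '\'' (close quote, escaped
 * quote, reopen quote). Returns 0 on success, or -1 if dst (of size
 * dstlen) is too small. */
int shell_quote(char *dst, size_t dstlen, const char *src)
{
    size_t n = 0;

    if (dstlen < 3)     /* need at least the two quotes and the NUL */
        return -1;
    dst[n++] = '\'';
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\'') ? 4 : 1;

        /* Keep room for the closing quote and the terminating NUL. */
        if (n + need + 2 > dstlen)
            return -1;
        if (*src == '\'') {
            dst[n++] = '\'';
            dst[n++] = '\\';
            dst[n++] = '\'';
            dst[n++] = '\'';
        } else {
            dst[n++] = *src;
        }
    }
    dst[n++] = '\'';
    dst[n] = '\0';
    return 0;
}
```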

+ _exit

Most of the programs which use this are the same programs which use
pipe() and dup(), which points to libdbmi and paint/Interface/applib.
I don't know whether it's really necessary to use _exit() rather than
exit().

+ alarm

Used by i.class and v.digit, presumably to implement a timeout. In the
worst case, we could just provide a stub function which does nothing
(i.e. no timeout).

+ getopt

Used by s.sweep; could use G_parser(), or could just parse argv
manually.

+ sync

Used by v.apply.census; almost certainly gratuitous.

+ tzset

Used by r.spread; gratuitous.

+ gettimeofday

Used by XDRIVER and NVIZ; equivalent functionality is likely to exist
elsewhere.

+ getuid

Used by g.help and clean_temp. g.help only uses it (in conjunction
with getpwuid(); see below) to determine the user's home directory; it
should use getenv("HOME") instead.
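
A sketch of the getenv() approach, with a caller-supplied fallback for
the case where HOME is unset; port_home_dir is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: the user's home directory taken from the
 * environment, or the given fallback if HOME is unset or empty. */
const char *port_home_dir(const char *fallback)
{
    const char *home = getenv("HOME");

    return (home != NULL && home[0] != '\0') ? home : fallback;
}
```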

+ getpwuid

Used by g.help and set_data. set_data uses it to determine the
username of the owner of the GISDBASE directory for printing a
diagnostic message; removing it wouldn't be a great loss.

+ putenv

Used by XDRIVER; it probably doesn't need to be portable.

In addition to the functions which are called directly from programs,
the following non-ANSI functions are called from the libraries:

+ setpriority setreuid setuid geteuid

Used by src/libes/gis/set_prior.c. Brief examination of the code
suggests that this is a hack which could be readily replaced by stubs.

geteuid() is also used by G__mapset_permissions(), which is mostly
ill-conceived anyhow.

+ socket bind listen accept connect

Specific to unix_socks.c, which should only be used by the monitor
interface. While Win32 provides the above functions, the Unix
functions return a file descriptor which can be used with other Unix
API functions, while Win32's SOCKET type is specific to the WinSock
API.

+ cuserid

Used by libdbmi; I'm not sure whether there's a reason it uses
cuserid() instead of getlogin().

+ gethostname

Used by G__machine_name(), which is sensible enough. Presumably
equivalent functionality is available on any networked system (and, on
non-networked systems, you don't really need a per-machine
identifier).

+ getlogin

Used by G_done_msg().

+ link

Used by close_new (G_{close,unopen}_cell) to rename the temporary
file; it should probably use rename() instead.
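
A sketch of the rename()-based approach (this is not the actual
close_new code, and port_move_into_place is a hypothetical name). ANSI
leaves the behaviour implementation-defined when the destination
already exists, so the sketch removes any old copy first, at the cost
of the replacement no longer being atomic.

```c
#include <stdio.h>

/* Hypothetical helper: move a finished temporary file into place.
 * Returns 0 on success, non-zero on failure. */
int port_move_into_place(const char *tmppath, const char *newpath)
{
    remove(newpath);    /* failure is ignored: the file may not exist */
    return rename(tmppath, newpath);
}
```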

+ rewinddir

Used by libdbmi; should be dealt with in conjunction with
opendir/closedir/readdir above.

+ setpgrp

Used by G_fork(), and the main() function from the driver library;
should be dealt with in conjunction with other process spawning
functions (fork, execl etc).

+ tempnam

Used by the gmath library, but not directly. It appears to come from
libg2c (the gcc F77-C interface), so not an issue.

--
Glynn Clements <glynn.clements@virgin.net>

On Thu, Oct 03, 2002 at 04:51:37PM +0100, Glynn Clements wrote:
[snip]

> Some of these could simply be replaced with ANSI functions, while
> others suggest that new functions should be added to e.g. libgis to
> improve portability.
>
> Some comments on specific functions:
>
> + open close creat read write lseek truncate
>
> Many of these could probably be replaced with the ANSI stdio
> equivalents.

libgis uses file descriptors heavily, so POSIX functions are
useful/mandatory. ANSI doesn't have a truncate() or ftruncate().

> + mkdir chdir opendir closedir readdir
>
> The ANSI libraries don't deal with directories. However, any system on
> which GRASS runs will have equivalent functionality; we just need to
> provide a portable interface.

Not a bad idea to have a system interface (though Winders supposedly is
supporting POSIX these days...).

> + drand48 lrand48 srand48
>
> These can be replaced with rand/srand. Presently, all programs which
> use them (s.random, r.random, r.mapcalc) can fall back to rand/srand,
> but [rs].random attempt to guess whether the *rand48 functions are
> available based upon platform macros rather than HAVE_DRAND48.

Might be good to include a couple random number generators in libgis.
Sources for a number of good ones are freely available..

> + snprintf
>
> C9X defines this, so in a couple of decades it won't be a problem. For
> now, a wide variety of solutions are possible, all with their own
> advantages and disadvantages.

Couple of *decades*! ;^) I think mostly we'd rather have an asprintf()
available...

> + sigaction sigemptyset
>
> Only used by r.mapcalc. signal() can be used instead, although
> signal() has problems of its own (BSD-vs-SysV signal semantics,
> general lack of flexibility).

sigaction is POSIX, so should be fairly portable by now. ANSI only
minimally describes signals, hence the BSD vs. SysV, etc...

[snip]

> + putenv
>
> Used by XDRIVER; it probably doesn't need to be portable.

Why not setenv()?

[snip]

> + gethostname
>
> Used by G__machine_name(), which is sensible enough. Presumably
> equivalent functionality is available on any networked system (and, on
> non-networked systems, you don't really need a per-machine
> identifier).

That's mostly for the tmpfile thing these days, no? Probably could
think about getting rid of it if the whole mapset permissions thing
is redone.. (Hmm, there's also that email interface...).

--
Eric G. Miller <egm2@jps.net>

Eric G. Miller wrote:

> > Some of these could simply be replaced with ANSI functions, while
> > others suggest that new functions should be added to e.g. libgis to
> > improve portability.
> >
> > Some comments on specific functions:
> >
> > + open close creat read write lseek truncate
> >
> > Many of these could probably be replaced with the ANSI stdio
> > equivalents.
>
> libgis uses file descriptors heavily, so POSIX functions are
> useful/mandatory. ANSI doesn't have a truncate() or ftruncate().

I'm not really concerned about libgis. Implementing multiple versions
of core functions is feasible; doing the same for all the individual
modules isn't.

truncate() is only used by d.labels, and its use there appears to be
unnecessary.

Basically, I'm suggesting that modules which simply need to read/write
files should prefer the ANSI stdio mechanisms over the POSIX ones.
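
As an illustration of the kind of module code this applies to, the
sketch below copies a file using only ANSI stdio calls in place of
open/read/write/close. The function name copy_file is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: copy src to dst using only ANSI stdio.
 * Returns 0 on success, -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    size_t n;
    int status = 0;
    FILE *in, *out;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;    /* short write */
            break;
        }
    }
    if (ferror(in))
        status = -1;
    fclose(in);
    if (fclose(out) != 0)   /* buffered data may fail to flush here */
        status = -1;
    return status;
}
```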

> > + mkdir chdir opendir closedir readdir
> >
> > The ANSI libraries don't deal with directories. However, any system on
> > which GRASS runs will have equivalent functionality; we just need to
> > provide a portable interface.
>
> Not a bad idea to have a system interface (though Winders supposedly is
> supporting POSIX these days...).

NT has a POSIX subsystem, but it's "bare" POSIX.1 (e.g. no sockets),
and it's completely detached from the rest of Windows (i.e. a POSIX
program can't use the Win32 API).

> > + snprintf
> >
> > C9X defines this, so in a couple of decades it won't be a problem. For
> > now, a wide variety of solutions are possible, all with their own
> > advantages and disadvantages.
>
> Couple of *decades*! ;^) I think mostly we'd rather have an asprintf()
> available...

Yeah, but asprintf() isn't standard either, and requires non-trivial
code changes.

The simplest workaround is an snprintf() look-alike which simply
ignores the length argument, and passes the rest to vsprintf(). In most
of the cases which I examined, the caller doesn't actually check
whether the buffer was too short, so you would probably still get
erroneous behaviour.
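
A sketch of such a look-alike, for illustration only; compat_snprintf
is a hypothetical name. Because the size is ignored, it provides no
protection against overruns and merely lets existing callers compile.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for snprintf() on systems which lack it.
 * WARNING: the size argument is ignored, so the caller's buffer must
 * already be large enough for the formatted output. */
int compat_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    (void) size;    /* deliberately unused */
    va_start(ap, fmt);
    n = vsprintf(buf, fmt, ap);
    va_end(ap);
    return n;
}
```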

> > + sigaction sigemptyset
> >
> > Only used by r.mapcalc. signal() can be used instead, although
> > signal() has problems of its own (BSD-vs-SysV signal semantics,
> > general lack of flexibility).
>
> sigaction is POSIX, so should be fairly portable by now. ANSI only
> minimally describes signals, hence the BSD vs. SysV, etc...

r.mapcalc only actually uses it to set a flag if SIGFPE occurs, so
signal() would probably suffice here.
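
A minimal sketch of the flag-setting pattern with ANSI signal(). This
is not the r.mapcalc code, and the names are hypothetical. Under SysV
semantics the handler is reset after the first delivery, and returning
from a handler for a genuine arithmetic fault (as opposed to one sent
with raise()) is undefined behaviour in ANSI C.

```c
#include <signal.h>

/* Set by the handler. sig_atomic_t is the only object type which ANSI
 * guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t fpe_seen = 0;

static void on_fpe(int sig)
{
    (void) sig;
    fpe_seen = 1;
}

/* Hypothetical helper: install the handler. Returns 0 on success. */
int install_fpe_flag(void)
{
    return (signal(SIGFPE, on_fpe) == SIG_ERR) ? -1 : 0;
}

/* Hypothetical helper: non-zero once SIGFPE has been delivered. */
int fpe_occurred(void)
{
    return fpe_seen != 0;
}
```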

> > + putenv
> >
> > Used by XDRIVER; it probably doesn't need to be portable.
>
> Why not setenv()?

setenv() is BSD 4.3. putenv() is POSIX, BSD 4.3, SVID 3. Neither is
ANSI. The C99 description of getenv() says:

  The set of environment names and the method for altering the
  environment list are implementation-defined.

> > + gethostname
> >
> > Used by G__machine_name(), which is sensible enough. Presumably
> > equivalent functionality is available on any networked system (and, on
> > non-networked systems, you don't really need a per-machine
> > identifier).
>
> That's mostly for the tmpfile thing these days, no? Probably could
> think about getting rid of it if the whole mapset permissions thing
> is redone.. (Hmm, there's also that email interface...).

The only caller of G__machine_name() is G__temp_element(), presumably
to handle the case where GISDBASE is on a network share.

--
Glynn Clements <glynn.clements@virgin.net>