[GRASS-dev] superfluous G_legal_filename etc calls

I am running r.sun (6.4) on a massive grid and have some troubles with
the libgis libraries:
I backported the differences locally from 7 (use G_find_cell2()
instead of G_find_cell() but
the problems remain the same:

Note: I additionally locally added this debug output:
Index: lib/gis/find_file.c

--- lib/gis/find_file.c (revision 35659)
+++ lib/gis/find_file.c (working copy)
@@ -48,6 +48,7 @@
     /*
      * reject illegal names and mapsets
      */
+ G_debug(0, "G_legal_filename(%s)", pname);
     if (G_legal_filename(pname) == -1)
        return NULL;

A) Observation one:
G_legal_filename (and underlying functions) are called
a zillion times. IMHO one time per map would be sufficient:

# run:
r.sun -s --o pat_dtm_5m day=297 linkein=linke_turbidity10
numpartitions=64 horizonstep=30 horizon=pat5m_horangle
insol_time=pat_insol_time297 beam_rad=pat_beam_rad297
diff_rad=pat_diff_rad297 refl_rad=pat_refl_rad297
         glob_rad=pat_glob_rad297
Mode 2: integrated daily irradiation
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(MASK)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(MASK)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(MASK)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(MASK)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
D0/0: G_legal_filename(pat5m_horangle_02)
... (so on for minutes)

B) Observation two:
When running the same job (the r.sun call plus a few more lines
are indeed in a shell script) in Grid Engine on a cluster, this
happens:

...
Mode 2: integrated daily irradiation
   0%D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL' <- !!
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL' <- !!
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL'
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
...

Interestingly "Illegal filename" appears (perhaps taking '.' -
current directory - into account?) I wasted quite some time
until I realized that the message apparently doesn't harm. :frowning:

I suspect that r.sun (and maybe other modules) could be faster
if these superfluous calls aren't done.

Any ideas where they come from?
Markus

On Tue, Jan 27, 2009 at 10:45 PM, Markus Neteler <neteler@osgeo.org> wrote:

I am running r.sun (6.4) on a massive grid and have some troubles with
the libgis libraries:

Apparently I have even more (one of many Grid Engine job outputs):

cat launch_SGE_grassjob_rsun_energy.sh.e33397
doy: 160 (month: 06)
Processing day 160...
Mode 2: integrated daily irradiation
0..Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
/home/neteler/binaries/bin/rsun_photoperiod_PAT_utm32.sh: line 46:
7878 Killed r.sun -s --o $DEM day=$d
linkein=linke_turbidity$month numpartitions=64 horizonstep=30
horizon=pat5m_horangle insol_time=pat_insol_time$D
beam_rad=pat_beam_rad$D diff_rad=pat_diff_rad$D
refl_rad=pat_refl_rad$D glob_rad=pat_glob_rad$D
ERROR: Raster map <pat_insol_time160> not found

Is there a possibility to find out why the job was killed?
In /var/log/messages there is no trace (no OOM). I had declared that
each job gets 11GB RAM in Grid Engine. Other jobs continue (same
script, just another day of the year). GE is supposed to send me an email
in case of error but nothing happens.

Mysterious.

Markus

Markus Neteler wrote:

I am running r.sun (6.4) on a massive grid and have some troubles with
the libgis libraries:
I backported the differences locally from 7 (use G_find_cell2()
instead of G_find_cell() but
the problems remain the same:

A) Observation one:
G_legal_filename (and underlying functions) are called
a zillion times. IMHO one time per map would be sufficient:

It's getting called once per element (cellhd, cell, fcell, cats, ...).

Either G_find_file* call G_legal_filename() (in which case, you'll get
one call for each element, multiple calls per map), or they don't (in
which case, you get zero calls per map).

AFAICT, there is no reason for G_find_file* to call
G_legal_filename(), as they are only used for existing files. If the
name isn't legal, it won't exist.

OTOH, the time taken by G_legal_filename() is likely to be trivial
compared to the system calls performed by G_find_file* and any
subsequent open() call.

B) Observation two:
When running the same job (the r.sun call plus a few more lines
are indeed in a shell script) in Grid Engine on a cluster, this
happens:

...
Mode 2: integrated daily irradiation
   0%D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(pat_dtm_5m)
D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL' <- !!
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL' <- !!
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL'
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
...

Interestingly "Illegal filename" appears (perhaps taking '.' -
current directory - into account?) I wasted quite some time
until I realized that the message apparently doesn't harm. :frowning:

I suspect that r.sun (and maybe other modules) could be faster
if these superfluous calls aren't done.

Any ideas where they come from?

Nope.

Try adding a G_debug() statement to G_legal_filename(), so that you
catch all of the G_legal_filename() calls rather than just those
originating from G_find_file().

--
Glynn Clements <glynn@gclements.plus.com>

Markus Neteler wrote:

Apparently I have even more (one of many Grid Engine job outputs):

cat launch_SGE_grassjob_rsun_energy.sh.e33397
doy: 160 (month: 06)
Processing day 160...
Mode 2: integrated daily irradiation
0..Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'
Illegal filename. Cannot be '.' or 'NULL'

...

Illegal filename. Cannot be '.' or 'NULL'
/home/neteler/binaries/bin/rsun_photoperiod_PAT_utm32.sh: line 46:
7878 Killed r.sun -s --o $DEM day=$d
linkein=linke_turbidity$month numpartitions=64 horizonstep=30
horizon=pat5m_horangle insol_time=pat_insol_time$D
beam_rad=pat_beam_rad$D diff_rad=pat_diff_rad$D
refl_rad=pat_refl_rad$D glob_rad=pat_glob_rad$D
ERROR: Raster map <pat_insol_time160> not found

Is there a possibility to find out why the job was killed?

"Killed" means SIGKILL. This can come from exceeding resource limits,
e.g. stack size (ulimit -s) or CPU time (ulimit -t), but most other
resource limits either can't be exceeded (e.g. RSS, ulimit -m), or
simply result in the relevant system call failing (e.g. data segment
size, ulimit -d).

Exceeding total virtual memory (ulimit -v) can result in SIGKILL if
the limit is exceeded due to stack extension rather than heap
extension (malloc).

And the OOM killer uses SIGKILL.

In /var/log/messages there is no trace (no OOM).

Is that the appropriate log file? On my system, kernel messages never
go to that file:

  *.=info;*.=notice;*.=warn;\
          auth,authpriv.none;\
          cron,daemon.none;\
          mail,news.none;\
          kern.none -/var/log/messages

kern.info and above go to /var/log/kern.log, while kern.debug is
ignored.

But there won't normally be anything in the logs if the process is
killed due to exceeding its own resource limits.

I had declared that
each job gets 11GB RAM in Grid Engine. Other jobs continue (same
script, just another day of the year).

GE is supposed to send me an email in case of error but nothing
happens.

I suspect that "error" would mean that the script itself returned
non-zero or terminated due to a signal.

--
Glynn Clements <glynn@gclements.plus.com>

On Wed, Jan 28, 2009 at 2:47 AM, Glynn Clements
<glynn@gclements.plus.com> wrote:

Markus Neteler wrote:

I am running r.sun (6.4) on a massive grid and have some troubles with
the libgis libraries:
I backported the differences locally from 7 (use G_find_cell2()
instead of G_find_cell() but
the problems remain the same:

A) Observation one:
G_legal_filename (and underlying functions) are called
a zillion times. IMHO one time per map would be sufficient:

It's getting called once per element (cellhd, cell, fcell, cats, ...).

Either G_find_file* call G_legal_filename() (in which case, you'll get
one call for each element, multiple calls per map), or they don't (in
which case, you get zero calls per map).

AFAICT, there is no reason for G_find_file* to call
G_legal_filename(), as they are only used for existing files. If the
name isn't legal, it won't exist.

OTOH, the time taken by G_legal_filename() is likely to be trivial
compared to the system calls performed by G_find_file* and any
subsequent open() call.

OK, so we ignore it.

B) Observation two:
When running the same job (the r.sun call plus a few more lines
are indeed in a shell script) in Grid Engine on a cluster, this
happens:

...
Mode 2: integrated daily irradiation
   0%D0/0: G_legal_filename(pat_dtm_5m)

...

D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL' <- !!
D0/0: G_legal_filename(linke_turbidity10)
D0/0: G_legal_filename(linke_turbidity10)

...

D0/0: G_legal_filename(pat5m_horangle_00)
D0/0: G_legal_filename(MASK)
Illegal filename. Cannot be '.' or 'NULL'
D0/0: G_legal_filename(pat5m_horangle_01)
D0/0: G_legal_filename(pat5m_horangle_01)
...

Interestingly "Illegal filename" appears (perhaps taking '.' -
current directory - into account?) I wasted quite some time
until I realized that the message apparently doesn't harm. :frowning:

...

Nope.

Try adding a G_debug() statement to G_legal_filename(), so that you
catch all of the G_legal_filename() calls rather than just those
originating from G_find_file().

Good suggestion: My bad.

The launch script contained:

# use day + jobid as MAPSET to avoid GRASS lock
MYMAPSET=$day.$JOB_ID

but day wasn't set. So GRASS created .mapset names which isn't
allowed.

Markus