[GRASS-dev] r.series map names buffer overflow

Hi,

we try to calculate the average of 1460 maps (4 observations per day) from MODIS
but r.series seems to overflow somewhere:

YEAR=2002
LIST=`g.mlist type=rast pat="*_lst1km${YEAR}*"
mapset="modisLSTinterpolationPAT" sep=","`
r.series -n $LIST out=lst_${YEAR}_avg method=average --o
...
Reading raster map <terra_lst1km20021215.LST_Night_1km.rst>...
Reading raster map <terra_lst1km20021216.LST_Day_1km.rst>...
Reading raster map <terra_lst1km20021216.LST_Night_1km.rst>...
Reading raster map <terra_lst1km20021217.LST_Day_1km.rst>...
Reading raster map <terra_lst1km20021217.LST_Night_1km.rst>...
Reading raster map <terra_lst1km20021218.LST_Day_1km.rst>...
Reading raster map <terra_lst1km20021218.LST_Night_1km.rst>...
Reading raster map <terra_lst1km20021219.LST_Day_1km.rst>...
Reading raster map <terra_lst1km20021219.LST_Night_1km.rst>...
Reading raster map <terra_lst1km20021220.LST_Day_1km.rst>...
WARNING: Unable to open
         '/hardmnt/eden0/castellani/grassdata/patGB1/modisLSTinterpolationPAT/cell_misc/terra_lst1km20021220.LST_Day_1km.rst/f_format'
WARNING: quantization file [terra_lst1km20021220.LST_Day_1km.rst] in mapset
         [modisLSTinterpolationPAT] missing
Reading raster map <terra_lst1km20021220.LST_Night_1km.rst>...
WARNING: Unable to open raster map
         <terra_lst1km20021220.LST_Night_1km.rst@modisLSTinterpolationPAT>
ERROR: Unable to open raster map <terra_lst1km20021220.LST_Night_1km.rst>
       in mapset <modisLSTinterpolationPAT>

The map is there and ok.

I am unsure how to track this down (maybe there is a fixed buffer in libgis?).

Markus

Markus Neteler wrote:

> we try to calculate the average of 1460 maps (4 observations per day) from MODIS
> but r.series seems to overflow somewhere:
> [...]
> WARNING: Unable to open raster map
>          <terra_lst1km20021220.LST_Night_1km.rst@modisLSTinterpolationPAT>
> ERROR: Unable to open raster map <terra_lst1km20021220.LST_Night_1km.rst>
>        in mapset <modisLSTinterpolationPAT>
>
> The map is there and ok.
>
> I am unsure how to track this down (maybe there is a fixed buffer in libgis?).

What does "ulimit -n" say? That's the OS-imposed limit on the number
of open file descriptors per process.

On my system, both the hard and soft limits are 1024. The soft limit
can be changed with e.g. "ulimit -n 1500", but only up to the hard
limit. The hard limit can only be changed by root.

Limits are typically set at login by the pam_limits.so module, which
is configured via /etc/security/limits.conf. Limits are inherited by
child processes; any changes to limits.conf won't take effect until
you log in again.
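For reference, the corresponding entries in limits.conf might look like this (illustrative sketch; the username and values are placeholders, and the change only applies at the next login via pam_limits):

```
# /etc/security/limits.conf -- illustrative entries
# <domain>  <type>  <item>   <value>
markus      soft    nofile   4096
markus      hard    nofile   8192
```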

If you have root access via sudo, you can spawn a root shell, increase
the hard limit, then spawn a child shell under your normal account
which will inherit the increased limit. E.g.:

  $ sudo bash
  # ulimit -Hn 1500
  # sudo -u markus bash
  $ ulimit -n 1500
  $

--
Glynn Clements <glynn@gclements.plus.com>

On Fri, Nov 14, 2008 at 1:25 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

> Markus Neteler wrote:
>
>> we try to calculate the average of 1460 maps (4 observations per day) from MODIS
>> but r.series seems to overflow somewhere:
>> [...]
>> ERROR: Unable to open raster map <terra_lst1km20021220.LST_Night_1km.rst>
>>        in mapset <modisLSTinterpolationPAT>
>>
>> The map is there and ok.
>>
>> I am unsure how to track this down (maybe there is a fixed buffer in libgis?).

> What does "ulimit -n" say? That's the OS-imposed limit on the number
> of open file descriptors per process.

Bingo:
ulimit -n
1024

> On my system, both the hard and soft limits are 1024. The soft limit
> can be changed with e.g. "ulimit -n 1500", but only up to the hard
> limit. The hard limit can only be changed by root.

I forgot about this limitation.
This is somewhat dangerous; could it be trapped? If r.series
gets more input files than "ulimit -n" (or its C equivalent) allows, could
it emit an error (with the manual then suggesting to split into smaller
jobs)?

> Limits are typically set at login by the pam_limits.so module, which
> is configured via /etc/security/limits.conf. Limits are inherited by
> child processes; any changes to limits.conf won't take effect until
> you log in again.
>
> If you have root access via sudo, you can spawn a root shell, increase
> the hard limit, then spawn a child shell under your normal account
> which will inherit the increased limit. E.g.:
>
>        $ sudo bash
>        # ulimit -Hn 1500
>        # sudo -u markus bash
>        $ ulimit -n 1500
>        $

Thanks for your detailed explanations. In this case, this would do the
trick.

Markus

Markus Neteler wrote:

>>> I am unsure how to track this down (maybe there is a fixed buffer in libgis?).
>>
>> What does "ulimit -n" say? That's the OS-imposed limit on the number
>> of open file descriptors per process.
>
> Bingo:
> ulimit -n
> 1024
>
>> On my system, both the hard and soft limits are 1024. The soft limit
>> can be changed with e.g. "ulimit -n 1500", but only up to the hard
>> limit. The hard limit can only be changed by root.
>
> I forgot about this limitation.
> This is somewhat dangerous; could it be trapped? If r.series
> gets more input files than "ulimit -n" (or its C equivalent) allows, could
> it emit an error (with the manual then suggesting to split into smaller
> jobs)?

It's possible to detect that this has occurred, but only in the lowest
levels of libgis, i.e. in G__open(). open() should return EMFILE if it
fails due to exceeding the per-process resource limit (or ENFILE for
the system-wide limit, but that's rather unlikely).

It isn't feasible to accurately predict that it will occur before the
fact. Apart from the descriptors for the [f]cell files, which are held
open throughout the process, other descriptors will already be open on
entry (at least stdin, stdout and stderr will be open, and often a few
others inherited from the caller), and additional descriptors will be
opened temporarily throughout the life of the process (but it's hard
to know how many, e.g. some libc functions will read configuration or
data files upon first use).

OTOH, it would be straightforward to print a warning if the number of
maps exceeds e.g. limit * 0.95:

  #ifndef __MINGW32__
  #include <sys/resource.h>

  struct rlimit rlim;

  if (getrlimit(RLIMIT_NOFILE, &rlim) < 0)
      G_warning("unable to determine resource limit (shouldn't happen)");
  else if (nmaps > rlim.rlim_max * 0.95)
      G_warning("may exceed hard limit on number of files; consult your sysadmin in event of errors");
  else if (nmaps > rlim.rlim_cur * 0.95)
      /* ulimit is a Bourne-shell command; csh uses `limit' and `unlimit' */
      G_warning("may exceed soft limit on number of files; use `ulimit -n' in event of errors");

  #endif /* !__MINGW32__ */

BTW, now that the G__.fileinfo array is allocated dynamically, I have
been thinking about making libgis keep open the descriptor for the
null bitmap. Re-opening the file every few rows can have a significant
performance impact for modules which are I/O-bound.

However, this would mean that you need twice as many descriptors (or
will hit the limit with half the number of maps). AFAICT, this was
(part of) the original reason for not keeping the null bitmap open.

But that was when Linux had a system-wide limit of (by default) 1024
open files, set at compile time. Nowadays, typical defaults are 1024
files per process and ~200k files system-wide, both of which can be
changed at run time (with ulimit -n and /proc/sys/fs/file-max
respectively).
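To check the relevant values on a given box (quick sketch; the /proc path is Linux-specific):

```shell
# Per-process limits on open file descriptors
ulimit -Sn                      # soft limit
ulimit -Hn                      # hard limit

# System-wide limit on open files (Linux only)
cat /proc/sys/fs/file-max 2>/dev/null
```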

Ultimately, if you want to be able to use r.series (or other modules
which process several maps concurrently) with large numbers of maps,
you (or your sysadmin) need to ensure that resource limits are set
accordingly.

--
Glynn Clements <glynn@gclements.plus.com>

On Fri, Nov 14, 2008 at 5:44 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:

>>> On my system, both the hard and soft limits are 1024. The soft limit
>>> can be changed with e.g. "ulimit -n 1500", but only up to the hard
>>> limit. The hard limit can only be changed by root.

ulimit -n 1500
bash: ulimit: open files: cannot modify limit: Operation not permitted

On my box I can only *reduce* the limit as normal user (1023 works).

>> I forgot about this limitation.
>> This is somewhat dangerous; could it be trapped? If r.series
>> gets more input files than "ulimit -n" (or its C equivalent) allows, could
>> it emit an error (with the manual then suggesting to split into smaller
>> jobs)?

> It's possible to detect that this has occurred, but only in the lowest
> levels of libgis, i.e. in G__open(). open() should return EMFILE if it
> fails due to exceeding the per-process resource limit (or ENFILE for
> the system-wide limit, but that's rather unlikely).

Is just counting the number of input maps given to the parser a no-op?
If the user gives more than rlim.rlim_max * 0.95 input files then bail out.

> It isn't feasible to accurately predict that it will occur before the
> fact. Apart from the descriptors for the [f]cell files, which are held
> open throughout the process, other descriptors will already be open on
> entry (at least stdin, stdout and stderr will be open, and often a few
> others inherited from the caller), and additional descriptors will be
> opened temporarily throughout the life of the process (but it's hard
> to know how many, e.g. some libc functions will read configuration or
> data files upon first use).

I see.

> OTOH, it would be straightforward to print a warning if the number of
> maps exceeds e.g. limit * 0.95:
> [...]

> BTW, now that the G__.fileinfo array is allocated dynamically, I have
> been thinking about making libgis keep open the descriptor for the
> null bitmap. Re-opening the file every few rows can have a significant
> performance impact for modules which are I/O-bound.
>
> However, this would mean that you need twice as many descriptors (or
> will hit the limit with half the number of maps). AFAICT, this was
> (part of) the original reason for not keeping the null bitmap open.

This would definitely be a showstopper for me since I regularly work
with (multi year) time series.

> But that was when Linux had a system-wide limit of (by default) 1024
> open files, set at compile time. Nowadays, typical defaults are 1024
> files per process and ~200k files system-wide, both of which can be
> changed at run time (with ulimit -n and /proc/sys/fs/file-max
> respectively).

A pity that the per-process limit is apparently still set at the historic level.
I have:
cat /proc/sys/fs/file-max
306995

> Ultimately, if you want to be able to use r.series (or other modules
> which process several maps concurrently) with large numbers of maps,
> you (or your sysadmin) need to ensure that resource limits are set
> accordingly.

Right, thanks for the pointers.

Markus

Markus Neteler wrote:

>>>> On my system, both the hard and soft limits are 1024. The soft limit
>>>> can be changed with e.g. "ulimit -n 1500", but only up to the hard
>>>> limit. The hard limit can only be changed by root.

> ulimit -n 1500
> bash: ulimit: open files: cannot modify limit: Operation not permitted
>
> On my box I can only *reduce* the limit as normal user (1023 works).

Yep. You can't increase the soft limit above the hard limit, and you
can't increase the hard limit (-H flag) unless you're root (or have
the CAP_SYS_RESOURCE capability on systems with capabilities).

The soft limit protects against a runaway process; the hard limit
protects against a user willfully hogging resources.
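One consequence worth noting: a process may raise its own soft limit up to (but not beyond) the hard limit without any privileges. A minimal sketch (assumes Linux; `raise_nofile_to_hard` is an invented helper name, not a GRASS function):

```c
/* Sketch: raise the soft open-files limit to the hard limit.
 * This needs no root; exceeding rlim_max would require root or
 * CAP_SYS_RESOURCE. */
#include <sys/resource.h>

static int raise_nofile_to_hard(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft := hard */
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```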

>>> I forgot about this limitation.
>>> This is somewhat dangerous; could it be trapped? If r.series
>>> gets more input files than "ulimit -n" (or its C equivalent) allows, could
>>> it emit an error (with the manual then suggesting to split into smaller
>>> jobs)?
>>
>> It's possible to detect that this has occurred, but only in the lowest
>> levels of libgis, i.e. in G__open(). open() should return EMFILE if it
>> fails due to exceeding the per-process resource limit (or ENFILE for
>> the system-wide limit, but that's rather unlikely).
>
> Is just counting the number of input maps given to the parser a no-op?
> If the user gives more than rlim.rlim_max * 0.95 input files then bail out.

A fatal error is possibly overkill. That will happen anyhow if the
process actually exceeds the limit.

The only reason I can see for making it a fatal error is if the peak
usage occurs at the end. In that situation, it might perform all of
the work then fail when it starts closing the maps. But that seems
unlikely; particularly if the loop which closes the input maps is
moved before the output maps are closed and their history written.

>> BTW, now that the G__.fileinfo array is allocated dynamically, I have
>> been thinking about making libgis keep open the descriptor for the
>> null bitmap. Re-opening the file every few rows can have a significant
>> performance impact for modules which are I/O-bound.
>>
>> However, this would mean that you need twice as many descriptors (or
>> will hit the limit with half the number of maps). AFAICT, this was
>> (part of) the original reason for not keeping the null bitmap open.
>
> This would definitely be a showstopper for me since I regularly work
> with (multi year) time series.

Well, it isn't a problem if you have root access (or an accommodating
sysadmin).

If a user wants to tie up resources, they can do a pretty good job
with the defaults:

  $ ulimit -a
  ...
  open files (-n) 1024
  ...
  max user processes (-u) 15863

1024 * 15863 = 16243712, which will easily exceed the system-wide
limit.

[The biggest problem with Unix resource limits is the inability to set
cumulative limits per user. Apart from -u, the limits are for each
individual process.]

--
Glynn Clements <glynn@gclements.plus.com>