[GRASS5] r.series, max files open

I am trying to get r.series to produce time series stats for 365 raster
files.

It dies after opening about 254 files. (Sorry, I don't have the
exact error message on hand.)

I assume it is running up against a max-files-open limit, so I
haven't looked too closely; it could be something else (memory, for
example, or the fact that inputfiles=raster1,raster2,...,raster365
makes the command line pretty long), or a dumb mistake on my part.

I'm running Linux 2.4.19 with r.series from CVS.

If it is a max-file limit, r.series should be able to work around
that: open 200, dump them to memory, fclose all, read another 200,
etc., until all are loaded, and then do the math. I would think a
time series program should be able to handle at least 8760 records
(the number of hours in a year); 365 at minimum, anyway.
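
Purely as an illustration of the batching idea (load_map(), names[]
and data[] are hypothetical placeholders, not libgis calls):

  /* read the series in batches so that no more than BATCH
     descriptors are ever open at once; assumes names[] and data[]
     are set up by the caller, error handling omitted */
  #define BATCH 200
  int start, end, i;
  for (start = 0; start < nfiles; start += BATCH) {
      end = start + BATCH;
      if (end > nfiles)
          end = nfiles;
      for (i = start; i < end; i++) {
          FILE *fp = fopen(names[i], "rb");
          load_map(fp, data[i]);  /* copy the whole map into RAM */
          fclose(fp);             /* give the descriptor back    */
      }
  }
  /* ...then compute the per-cell statistics from data[][] */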

any ideas?

see also:
http://article.gmane.org/gmane.comp.gis.grass.user/427

thanks,
Hamish

H Bowman wrote:

> I am trying to get r.series to produce time series stats for 365
> raster files.
>
> It dies after opening about 254 files. (Sorry, I don't have the
> exact error message on hand.)
>
> I assume it is running up against a max-files-open limit,

Yes. Specifically, libgis has a fixed limit of 256 open raster maps,
set at the top of src/libes/gis/G.h:

  #define MAXFILES 256

> so I haven't looked too closely; it could be something else (memory,
> for example, or the fact that inputfiles=raster1,raster2,...,raster365
> makes the command line pretty long),

That could also be an issue, but it isn't the problem here (if the
command line was too long, you would get an error from the shell,
before r.series was run).

> or a dumb mistake on my part.
>
> I'm running Linux 2.4.19 with r.series from CVS.

> If it is a max-file limit, r.series should be able to work around
> that: open 200, dump them to memory, fclose all, read another 200,
> etc., until all are loaded, and then do the math.

Store the entire series in memory?

For large files, that would just replace an out-of-descriptors error
with an out-of-memory error.

> I would think a time series program should be able to handle at
> least 8760 records (the number of hours in a year); 365 at minimum,
> anyway.

Unless these are *really* small files, reading 8760 of them into
memory is probably out of the question. OTOH, having 8760 files open
simultaneously is likely to be equally problematic.

We could just increase the MAXFILES value. However, each slot uses 552
bytes on x86, so memory consumption could be an issue (bearing in mind
that it affects every process which uses libgis). Also, there's no
point increasing it beyond the OS limit (so 8760 files may not be
possible, even if you can afford an extra 4.6 MB per process).

The 552-byte figure could be reduced a bit by more sensible memory
management. E.g. each slot includes a "struct Reclass", which
statically allocates 100 bytes for the name and mapset of the base
map, whereas two pointers would only use 8 bytes.
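
Purely as a before/after sketch (the real struct Reclass has more
fields, and the 50/50 split of the 100 bytes is an assumption):

  /* current layout: 100 bytes reserved in every slot, used or not */
  struct Reclass {
      char name[50];      /* name of the base map   */
      char mapset[50];    /* mapset of the base map */
      /* ... */
  };

  /* leaner layout: two pointers (8 bytes total on 32-bit x86), with
     the strings allocated only for slots that hold a reclass map */
  struct Reclass {
      char *name;
      char *mapset;
      /* ... */
  };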

But primarily, we would want to allocate the array of slots
dynamically, so that only processes which actually used thousands of
slots would allocate the memory for them.
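
A hedged sketch of what that might look like (the names here are
made up for illustration, not the actual libgis internals):

  #include <stdlib.h>
  #include <string.h>

  struct fileinfo { /* ...per-map state... */ int open_mode; };

  static struct fileinfo *FCB = NULL;   /* grows on demand        */
  static int fcb_alloc = 0;             /* slots allocated so far */

  static struct fileinfo *get_slot(int fd)
  {
      if (fd >= fcb_alloc) {
          int n = fcb_alloc ? fcb_alloc * 2 : 32;
          while (n <= fd)
              n *= 2;
          FCB = realloc(FCB, n * sizeof *FCB);  /* error check omitted */
          memset(FCB + fcb_alloc, 0, (n - fcb_alloc) * sizeof *FCB);
          fcb_alloc = n;
      }
      return &FCB[fd];
  }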

The main problem there is that the code in question is critical to
GRASS. Any errors could result in large chunks of GRASS being
unusable. The other problem is that most of the code which uses that
structure is an illegible mess.

--
Glynn Clements <glynn.clements@virgin.net>

On Tue, May 06, 2003 at 12:49:30PM +1200, H Bowman wrote:

> I am trying to get r.series to produce time series stats for 365
> raster files.
>
> It dies after opening about 254 files. (Sorry, I don't have the
> exact error message on hand.)
>
> I assume it is running up against a max-files-open limit, so I
> haven't looked too closely; it could be something else (memory, for
> example, or the fact that inputfiles=raster1,raster2,...,raster365
> makes the command line pretty long), or a dumb mistake on my part.

... no idea, but a suggestion:

r.out.mpeg supports wildcards, which might be an interesting addition
for r.series as well:

r.out.mpeg view1="rain[1-9]","rain1[0-2]" view2="temp*"

Markus

> > I am trying to get r.series to produce time series stats for 365
> > raster files.
> >
> > It dies after opening about 254 files. (Sorry, I don't have the
> > exact error message on hand.)
> >
> > I assume it is running up against a max-files-open limit,

> Yes. Specifically, libgis has a fixed limit of 256 open raster maps,
> set at the top of src/libes/gis/G.h:
>
>   #define MAXFILES 256

Changing MAXFILES to 384 fixes the problem and r.series now runs with
365 input files. Thanks, that solves my immediate problem.

> > If it is a max-file limit, r.series should be able to work around
> > that: open 200, dump them to memory, fclose all, read another 200,
> > etc., until all are loaded, and then do the math.

> Store the entire series in memory?
>
> For large files, that would just replace an out-of-descriptors error
> with an out-of-memory error.

Maybe only load the rasters beyond MAXFILES into memory, until that
runs out too. I admit that doesn't seem like a very good solution
either. Maybe just put a note in the man page and leave it at that?

> We could just increase the MAXFILES value. However, each slot uses 552
> bytes on x86, so memory consumption could be an issue (bearing in mind
> that it affects every process which uses libgis). Also, there's no
> point increasing it beyond the OS limit (so 8760 files may not be
> possible, even if you can afford an extra 4.6 MB per process).

Just curious: does anyone know what the operating-system open file
limits are for Linux/Irix/Solaris/MacOSX/Win98/WinNT?

What other modules besides r.patch and r.series would benefit from
increasing the MAXFILES value?

Markus wrote:

> r.out.mpeg supports wildcards, which might be an interesting addition
> for r.series as well:
>
> r.out.mpeg view1="rain[1-9]","rain1[0-2]" view2="temp*"

Yes, that would be very useful and the code looks like it would copy
over well.
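
However the matching ends up being implemented, the core of it is
small; purely as a sketch, using POSIX fnmatch() to filter a list of
map names (pattern, maps[], nmaps and matched[] are assumed inputs):

  #include <fnmatch.h>

  int i, nmatch = 0;
  for (i = 0; i < nmaps; i++)
      if (fnmatch(pattern, maps[i], 0) == 0)   /* "rain[1-9]" etc. */
          matched[nmatch++] = maps[i];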

Hamish

H Bowman wrote:

> > We could just increase the MAXFILES value. However, each slot uses
> > 552 bytes on x86, so memory consumption could be an issue (bearing
> > in mind that it affects every process which uses libgis). Also,
> > there's no point increasing it beyond the OS limit (so 8760 files
> > may not be possible, even if you can afford an extra 4.6 MB per
> > process).
>
> Just curious: does anyone know what the operating-system open file
> limits are for Linux/Irix/Solaris/MacOSX/Win98/WinNT?

IIRC, for Linux, it was 256 in 2.0, 1024 in 2.2, and run-time
configurable in 2.4.
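
You can check the per-process limit at run time with the standard
getrlimit() interface (or with "ulimit -n" from the shell); a minimal
example:

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
      struct rlimit rl;

      /* RLIMIT_NOFILE: max number of open file descriptors */
      if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
          printf("soft limit: %ld, hard limit: %ld\n",
                 (long) rl.rlim_cur, (long) rl.rlim_max);
      return 0;
  }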

However, there is also the issue that select() is limited to the
number of bits in an fd_set (1024 in GNU libc 2.1). Although select()
isn't used on files, if a program uses up the first 1024 descriptors
for files, it won't be able to use select() on any other descriptors
(e.g. sockets, the terminal, devices).
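
The constraint is that FD_SET() indexes a fixed-size bit array, so a
descriptor at or above FD_SETSIZE can't safely be registered (fd is
an assumed input here):

  #include <sys/select.h>

  fd_set readfds;
  FD_ZERO(&readfds);
  if (fd < FD_SETSIZE)        /* 1024 with GNU libc 2.1 */
      FD_SET(fd, &readfds);   /* safe */
  /* FD_SET with fd >= FD_SETSIZE writes past the end of readfds */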

In general, it's best to avoid solutions which rely upon having vast
numbers of files open simultaneously. OTOH, it's also best to avoid
solutions which rely upon storing vast numbers of files in memory
simultaneously.

For processing a time series of 8760 files, the most viable solutions
would be:

a) Open/close each map for each row read, rather than keeping them
open throughout (sketched after the comparison below).

b) Generate an intermediate file in BIL order (map0/row0, ...,
mapN/row0, map0/row1, ...) using random access on the result file,
then process the intermediate file.

The second approach would be faster (it avoids opening and closing
each map once per row), but it would require a significant amount of
disk space (though that's still better than requiring the same amount
of virtual memory).
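
A minimal sketch of approach (a) against the GRASS 5 libgis calls
(names[], mapsets[] and nfiles are assumed inputs; error handling
omitted):

  #include "gis.h"

  CELL *buf = G_allocate_cell_buf();
  int nrows = G_window_rows();
  int row, i, fd;

  for (row = 0; row < nrows; row++) {
      for (i = 0; i < nfiles; i++) {
          fd = G_open_cell_old(names[i], mapsets[i]);
          G_get_map_row(fd, buf, row);
          /* accumulate per-cell statistics from buf here */
          G_close_cell(fd);
      }
      /* emit one output row of the statistics */
  }

For approach (b), row r of map m would live at offset
((r * nfiles) + m) * row_size in the intermediate file, so each map's
rows can be written to their slots with lseek() as the map is read.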

--
Glynn Clements <glynn.clements@virgin.net>