[GRASS5] activated Large File Support LFS (default now)

Hi,

I have submitted changes to 6.1-CVS:
A couple of tests have been added to configure and friends
for Large File Support (default now). LFS is needed
to read/write/manage large raster files. The magic compiler
flag is activated automatically now in lib/gis/Makefile.

./configure ...
should report
  Large File Support (LFS): yes
if your system supports it. Otherwise it will be
automatically disabled.

Of course there is a flag:
./configure --help 2>&1 |grep large
  --disable-largefile omit support for large files

The changes are based on cdrtools-2.01 (GPL). Thanks
to Joerg Schilling <schilling fokus fraunhofer de> and
colleagues for developing the macros.

Markus

Markus Neteler wrote:

> I have submitted changes to 6.1-CVS:
> A couple of tests have been added to configure and friends
> for Large File Support (default now). LFS is needed
> to read/write/manage large raster files. The magic compiler
> flag is activated automatically now in lib/gis/Makefile.
>
> ./configure ...
> should report
>   Large File Support (LFS): yes
> if your system supports it. Otherwise it will be
> automatically disabled.
>
> Of course there is a flag:
> ./configure --help 2>&1 |grep large
>   --disable-largefile omit support for large files

I don't know why I didn't comment on this at the time, but the sense
is wrong. Given that most of GRASS can't handle large files, it should
default to disabled.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, Apr 09, 2005 at 04:23:23AM +0100, Glynn Clements wrote:

> Markus Neteler wrote:
>
> > I have submitted changes to 6.1-CVS:
> > A couple of tests have been added to configure and friends
> > for Large File Support (default now). LFS is needed
> > to read/write/manage large raster files. The magic compiler
> > flag is activated automatically now in lib/gis/Makefile.
> >
> > ./configure ...
> > should report
> > Large File Support (LFS): yes
> > if your system supports it. Otherwise it will be
> > automatically disabled.
> >
> > Of course there is a flag:
> > ./configure --help 2>&1 |grep large
> > --disable-largefile omit support for large files
>
> I don't know why I didn't comment on this at the time, but the sense
> is wrong. Given that most of GRASS can't handle large files, it should
> default to disabled.

OK for me.

But:
- no problems reported so far (AFAIK)
- if disabled by default we'll hardly discover where further
  fixes are missing (as nobody will use it)

Markus

Markus Neteler wrote:

> > > I have submitted changes to 6.1-CVS:
> > > A couple of tests have been added to configure and friends
> > > for Large File Support (default now). LFS is needed
> > > to read/write/manage large raster files. The magic compiler
> > > flag is activated automatically now in lib/gis/Makefile.
> > >
> > > ./configure ...
> > > should report
> > > Large File Support (LFS): yes
> > > if your system supports it. Otherwise it will be
> > > automatically disabled.
> > >
> > > Of course there is a flag:
> > > ./configure --help 2>&1 |grep large
> > > --disable-largefile omit support for large files
> >
> > I don't know why I didn't comment on this at the time, but the sense
> > is wrong. Given that most of GRASS can't handle large files, it should
> > default to disabled.
>
> OK for me.
>
> But:
> - no problems reported so far (AFAIK)

Very few people have tried to use such large maps. Of those who have,
I would only expect a small number of modules to have been tried.

> - if disabled by default we'll hardly discover where further
>   fixes are missing (as nobody will use it)

The nature of the problems caused by enabling large files in code
which can't handle them is such that it will typically take some
investigation to determine that the problem is due to large file
support.

In the worst case, the user will silently get bad data without being
aware that anything has gone wrong.

To get an idea of which files will have problems with large files,
locate anything which uses lseek, fseek or ftell (either with grep, or
use tools/sql.sh and query the import tables for those symbols).

Also, one consequence of enabling large file support is that a raster
may have more than 2^31 cells in total. Code which counts cells (e.g.
r.statistics) will need to use "long long int" to handle that case.

--
Glynn Clements <glynn@gclements.plus.com>

On Sat, Apr 09, 2005 at 11:30:22PM +0100, Glynn Clements wrote:

> Markus Neteler wrote:
>
> > > > I have submitted changes to 6.1-CVS:
> > > > A couple of tests have been added to configure and friends
> > > > for Large File Support (default now). LFS is needed
> > > > to read/write/manage large raster files. The magic compiler
> > > > flag is activated automatically now in lib/gis/Makefile.
> > > >
> > > > ./configure ...
> > > > should report
> > > > Large File Support (LFS): yes
> > > > if your system supports it. Otherwise it will be
> > > > automatically disabled.
> > > >
> > > > Of course there is a flag:
> > > > ./configure --help 2>&1 |grep large
> > > > --disable-largefile omit support for large files
> > >
> > > I don't know why I didn't comment on this at the time, but the sense
> > > is wrong. Given that most of GRASS can't handle large files, it should
> > > default to disabled.
> >
> > OK for me.
> >
> > But:
> > - no problems reported so far (AFAIK)
>
> Very few people have tried to use such large maps. Of those who have,
> I would only expect a small number of modules to have been tried.
>
> > - if disabled by default we'll hardly discover where further
> > fixes are missing (as nobody will use it)
>
> The nature of the problems caused by enabling large files in code
> which can't handle them is such that it will typically take some
> investigation to determine that the problem is due to large file
> support.
>
> In the worst case, the user will silently get bad data without being
> aware that anything has gone wrong.

OK (I thought that it always prints an error message).

> To get an idea of which files will have problems with large files,
> locate anything which uses lseek, fseek or ftell (either with grep, or
> use tools/sql.sh and query the import tables for those symbols).
>
> Also, one consequence of enabling large file support is that a raster
> may have more than 2^31 cells in total. Code which counts cells (e.g.
> r.statistics) will need to use "long long int" to handle that case.

In the long run it would be nice to have GRASS code polished.

For now I have inverted the flag. To enable LFS, run

./configure --enable-largefile ...

Markus

Markus Neteler wrote:

> > The nature of the problems caused by enabling large files in code
> > which can't handle them is such that it will typically take some
> > investigation to determine that the problem is due to large file
> > support.
> >
> > In the worst case, the user will silently get bad data without being
> > aware that anything has gone wrong.
>
> OK (I thought that it always prints an error message).

No. If you *don't* compile with -D_FILE_OFFSET_BITS=64, calls to
open() will fail with EOVERFLOW if the file is larger than 2GiB.

Defining that macro causes open() to be an alias for open64(), which
won't complain about large files. It also defines off_t to be an alias
for off64_t, and lseek() an alias for lseek64().

However, it won't magically convert arbitrary calculations involving
int/long to 64 bits.

So, if you have some code like:

  long row, row_bytes;
  ...
  lseek(fd, row * row_bytes, SEEK_SET);

the calculation will be done in 32 bits (where "long" is 32 bits, as on Linux/x86), i.e. it will behave like:

  lseek(fd, (row * row_bytes) & 0xFFFFFFFF, SEEK_SET);

To fix this, you have to force the compiler to perform the
calculations in 64 bits, by casting values to off_t where necessary,
e.g.:

  lseek(fd, row * (off_t) row_bytes, SEEK_SET);

This needs to be done wherever file offsets are used.

Note: C determines the type of an expression by the type of its widest
operand, so you need to cast at least one of the values before the
calculation, as above. Using:

  (off_t) (row * row_bytes)

won't work; the calculation will be performed in 32 bits, then the
truncated value will be expanded to 64 bits (by which time, it's too
late).

> > To get an idea of which files will have problems with large files,
> > locate anything which uses lseek, fseek or ftell (either with grep, or
> > use tools/sql.sh and query the import tables for those symbols).
> >
> > Also, one consequence of enabling large file support is that a raster
> > may have more than 2^31 cells in total. Code which counts cells (e.g.
> > r.statistics) will need to use "long long int" to handle that case.
>
> In the long run it would be nice to have GRASS code polished.

Sure, but there are a lot of places where these issues apply. Fixing
them is simple enough; it's finding them all that's awkward.

The file offsets can be dealt with by locating all modules which use
lseek() etc, fixing them, then defining the macro locally, e.g.

  #include "config.h"

  ...

  /* must be defined before any system header is included */
  #ifdef HAVE_LARGEFILE
  #define _FILE_OFFSET_BITS 64
  #endif

[Assuming that the configure script defines HAVE_LARGEFILE in
config.h when --enable-largefile is used.]

The cases where cell counts might wrap are harder to identify.

[Sooner or later, someone will release a version of Linux/x86 where
"long" is 64 bits, then many of the problems will vanish. But not all
of them; I've encountered code where file offsets are computed as
"int"s.]

--
Glynn Clements <glynn@gclements.plus.com>