In some recent enhancements to the iostream library, include/iostream/ami_stream.h had a call to fseek() changed to fseeko(). This now doesn't compile on Windows; a sample error is:
sh-2.04$ make
make OBJ.i686-pc-mingw32
make[1]: Entering directory `/c/grass/grass7/lib/iostream'
make[1]: `OBJ.i686-pc-mingw32' is up to date.
make[1]: Leaving directory `/c/grass/grass7/lib/iostream'
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/ami_stream.o -c ami_stream.cc
In file included from ami_stream.cc:47:
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h: In member function `AMI_err AMI_STREAM<T>::seek(off_t)':
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: there are no arguments to `fseeko' that depend on a template parameter, so a declaration of `fseeko' must be available
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: (if you use `-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
make: *** [OBJ.i686-pc-mingw32/ami_stream.o] Error 1
sh-2.04$
Am I right in thinking that if large file support is configured properly there should be no need to use fseeko() to seek in large files, and fseek() should be fine? I looked in r.in.xyz which only uses fseek() and ISTR discussion about it handling huge files recently, although perhaps that involved reading from stdin.
In some recent enhancements to the iostream library, include/iostream/ami_stream.h had a call to fseek() changed to fseeko(). This now doesn't compile on Windows; a sample error is:
sh-2.04$ make
make OBJ.i686-pc-mingw32
make[1]: Entering directory `/c/grass/grass7/lib/iostream'
make[1]: `OBJ.i686-pc-mingw32' is up to date.
make[1]: Leaving directory `/c/grass/grass7/lib/iostream'
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/ami_stream.o -c ami_stream.cc
In file included from ami_stream.cc:47:
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h: In member function `AMI_err AMI_STREAM<T>::seek(off_t)':
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: there are no arguments to `fseeko' that depend on a template parameter, so a declaration of `fseeko' must be available
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: (if you use `-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
make: *** [OBJ.i686-pc-mingw32/ami_stream.o] Error 1
sh-2.04$
Am I right in thinking that if large file support is configured properly there should be no need to use fseeko() to seek in large files, and fseek() should be fine? I looked in r.in.xyz which only uses fseek() and ISTR discussion about it handling huge files recently, although perhaps that involved reading from stdin.
Paul
fseek is not the same as fseeko.
int fseek(FILE *stream, long offset, int whence);
int fseeko(FILE *stream, off_t offset, int whence);
If large file support is enabled, fseeko will convert off_t to a 64 bit type even on a 32-bit OS. I'm not sure if long will be 64 bit on a 64-bit OS for fseek.
r.in.xyz may work with large files using plain old fseek as long as r.in.xyz doesn't make long seeks. ami_stream.h could potentially make long seeks (e.g., skip the first 8GB of a 30GB file), though in practice, I'm not sure that code that uses ami_stream (r.terraflow) actually uses this kind of fseek. Still the interface is designed to potentially use long seeks and I would be reluctant to simply replace fseeko with fseek in ami_stream.h.
you may to add a #define to check the compiler type and use fseek only if fseeko does not exist.
int fseeko(FILE *stream, off_t offset, int whence);
If large file support is enabled, fseeko will convert off_t to a 64 bit type even on a 32-bit OS. I'm not sure if long will be 64 bit on a 64-bit OS for fseek.
r.in.xyz may work with large files using plain old fseek as long as r.in.xyz doesn't make long seeks. ami_stream.h could potentially make long seeks (e.g., skip the first 8GB of a 30GB file), though in practice, I'm not sure that code that uses ami_stream (r.terraflow) actually uses this kind of fseek. Still the interface is designed to potentially use long seeks and I would be reluctant to simply replace fseeko with fseek in ami_stream.h.
you may to add a #define to check the compiler type and use fseek only if fseeko does not exist.
Right - I was vaguely aware of some configure checks for fseeko() but as ami_stream.h was the only file in GRASS using fseeko() I wasn't sure if it had been made redundant by something else. But looking closely at include/config.h, I see that we can conditionalise fseeko() on #ifdef HAVE_LARGEFILES. After doing that, on Windows fseeko() is redefined to fseeko64() and all then works well.
A couple of further Windows compile errors in lib/iostream though:
sh-2.04$ make
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/mm_utils.o -c mm_utils.cc
mm_utils.cc:38:22: sys/mman.h: No such file or directory
make: *** [OBJ.i686-pc-mingw32/mm_utils.o] Error 1
sh-2.04$ make
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/mm_utils.o -c mm_utils.cc
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/rtimer.o -c rtimer.cc
rtimer.cc:38:26: sys/resource.h: No such file or directory
make: *** [OBJ.i686-pc-mingw32/rtimer.o] Error 1
In the two respective cases I commented out the #includes for sys/mman.h and sys/resource.h and everything compiled fine. r.terraflow compiles fine too. So I'll simply remove those two #includes unless there's any objection?
I looked in r.in.xyz which only uses fseek() and
ISTR discussion about it handling huge files recently,
although perhaps that involved reading from stdin.
fwiw the fseek() there is just to grab the filesize so G_percent() can
know where 100% is. If the fseek() returns garbage no harm is done and
G_percent() is skipped. It doesn't touch the processing aspect of the
module, just the cosmetic user message end of things.
I looked in r.in.xyz which only uses fseek() and
ISTR discussion about it handling huge files recently,
although perhaps that involved reading from stdin.
fwiw the fseek() there is just to grab the filesize so G_percent() can
know where 100% is. If the fseek() returns garbage no harm is done and
G_percent() is skipped. It doesn't touch the processing aspect of the
module, just the cosmetic user message end of things.
OK, interesting. From what I've seen then the patch below might help things for very large files. G_percent() still takes a long argument, so the number of lines can't be greater than can be represented in a long, but with the patch below, the actual filesize can be.
Are there any really huge files anybody can test it with?
On Tue, Oct 7, 2008 at 11:53 AM, Paul Kelly
<paul-grass@stjohnspoint.co.uk> wrote:
Are there any really huge files anybody can test it with?
For easily create a huge DEM file, do this:
Actually, thinking clearly, it's a huge input text points file it needs. So it was easy to create a random one to test it with. And it turns out as well as the conditionalised ftello(), it also needs #define USE_LARGEFILES 1
right at the start of the file to "activate" the Largefile support in config.h. But after that r.in.xyz works fine and reports the percentage progress correctly. I'm not sure if this is by design or by accident that it works like this but it's quite neat really.
I'm not sure if this is by design or by accident that it works like this but it's quite neat really.
Sorry that was a confusing statement - I meant that the fact that the large file support in config.h can be activated by #define USE_LARGEFILES 1 is confusing, as there is also a Makefile variable called USE_LARGEFILES that is set in the configure script.
Can we add library functions, G_fseek() and G_ftell(), which use an
off_t. The functions would use fseeko/ftello if available, fseek/ftell
otherwise, but would always accept and return off_t.
IMHO, this is preferable to having to add explicit LFS to individual
modules.
Can we add library functions, G_fseek() and G_ftell(), which use an
off_t. The functions would use fseeko/ftello if available, fseek/ftell
otherwise, but would always accept and return off_t.
But isn't off_t not always the same size - IIUC it depends on whether _FILE_OFFSET_BITS is defined and what it is set to? Should there be a typedef FILE_OFFSET which is equivalent to the "correct" off_t, or something similar?
> I'm not sure if this is by design or by accident that it
> works like this but it's quite neat really.
Sorry that was a confusing statement - I meant that the fact that the large
file support in config.h can be activated by #define USE_LARGEFILES 1 is
confusing, as there is also a Makefile variable called USE_LARGEFILES that
is set in the configure script.
The configure script doesn't define USE_LARGEFILES in config.h because
this would enable LFS globally, but not all modules are capable of
using it (i.e. some code may store offsets in an int or long, or
calculate offsets using "int"s, or use fseek/ftell.
If code cannot handle large files, it's better if it simply refuses to
open large files rather than opening them then silently failing (in
unpredictable and possibly destructive ways) when the 2GiB mark is
reached.
Code which is known to be LFS-capable normally has the following in
its Makefile:
Or we could just make a concerted effort to ensure that all code is
LFS-capable, and enable LFS globally. It's basically a matter of
identifying all uses of lseek(), fseek() and ftell(), and either
confirming that the code is already safe or fixing it.
>> +#ifdef HAVE_LARGEFILES
>> + filesize = ftello(in_fp);
>> +#else
>> filesize = ftell(in_fp);
>> +#endif
>
> Can we add library functions, G_fseek() and G_ftell(), which use an
> off_t. The functions would use fseeko/ftello if available, fseek/ftell
> otherwise, but would always accept and return off_t.
But isn't off_t not always the same size - IIUC it depends on whether
_FILE_OFFSET_BITS is defined and what it is set to? Should there be a
typedef FILE_OFFSET which is equivalent to the "correct" off_t, or
something similar?
Right. In order for this to work, we would need to enable LFS
unconditionally, so that there's no chance of module code getting a
different off_t to libgis.
Actually, we would need to do it via CFLAGS. Relying upon config.h is
unsafe, as stdio.h typically gets included before config.h.
OTOH, enabling LFS unconditionally all but requires the above
functions, as plain fseek/ftell are incompatible with LFS, and
fseeko/ftello aren't guaranteed to be available.