[GRASS-dev] fseeko() in libiostream

In some recent enhancements to the iostream library, include/iostream/ami_stream.h had a call to fseek() changed to fseeko(). This now doesn't compile on Windows; a sample error is:

sh-2.04$ make
make OBJ.i686-pc-mingw32
make[1]: Entering directory `/c/grass/grass7/lib/iostream'
make[1]: `OBJ.i686-pc-mingw32' is up to date.
make[1]: Leaving directory `/c/grass/grass7/lib/iostream'
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/ami_stream.o -c ami_stream.cc
In file included from ami_stream.cc:47:
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h: In member function `AMI_err AMI_STREAM<T>::seek(off_t)':
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: there are no arguments to `fseeko' that depend on a template parameter, so a declaration of `fseeko' must be available
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: (if you use `-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
make: *** [OBJ.i686-pc-mingw32/ami_stream.o] Error 1
sh-2.04$

Am I right in thinking that if large file support is configured properly there should be no need to use fseeko() to seek in large files, and fseek() should be fine? I looked in r.in.xyz which only uses fseek() and ISTR discussion about it handling huge files recently, although perhaps that involved reading from stdin.

Paul

Paul Kelly wrote:

In some recent enhancements to the iostream library, include/iostream/ami_stream.h had a call to fseek() changed to fseeko(). This now doesn't compile on Windows; a sample error is:

sh-2.04$ make
make OBJ.i686-pc-mingw32
make[1]: Entering directory `/c/grass/grass7/lib/iostream'
make[1]: `OBJ.i686-pc-mingw32' is up to date.
make[1]: Leaving directory `/c/grass/grass7/lib/iostream'
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/ami_stream.o -c ami_stream.cc
In file included from ami_stream.cc:47:
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h: In member function `AMI_err AMI_STREAM<T>::seek(off_t)':
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: there are no arguments to `fseeko' that depend on a template parameter, so a declaration of `fseeko' must be available
c:/grass/grass7/dist.i686-pc-mingw32/include/grass/iostream/ami_stream.h:432: error: (if you use `-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
make: *** [OBJ.i686-pc-mingw32/ami_stream.o] Error 1
sh-2.04$

Am I right in thinking that if large file support is configured properly there should be no need to use fseeko() to seek in large files, and fseek() should be fine? I looked in r.in.xyz which only uses fseek() and ISTR discussion about it handling huge files recently, although perhaps that involved reading from stdin.

Paul

fseek is not the same as fseeko.

int fseek(FILE *stream, long offset, int whence);

int fseeko(FILE *stream, off_t offset, int whence);

If large file support is enabled, fseeko will convert off_t to a 64 bit type even on a 32-bit OS. I'm not sure if long will be 64 bit on a 64-bit OS for fseek.

r.in.xyz may work with large files using plain old fseek as long as r.in.xyz doesn't make long seeks. ami_stream.h could potentially make long seeks (e.g., skip the first 8GB of a 30GB file), though in practice, I'm not sure that code that uses ami_stream (r.terraflow) actually uses this kind of fseek. Still the interface is designed to potentially use long seeks and I would be reluctant to simply replace fseeko with fseek in ami_stream.h.

you may to add a #define to check the compiler type and use fseek only if fseeko does not exist.

-Andy

On Mon, 6 Oct 2008, Andrew Danner wrote:

fseek is not the same as fseeko.

int fseek(FILE *stream, long offset, int whence);

int fseeko(FILE *stream, off_t offset, int whence);

If large file support is enabled, fseeko will convert off_t to a 64 bit type even on a 32-bit OS. I'm not sure if long will be 64 bit on a 64-bit OS for fseek.

r.in.xyz may work with large files using plain old fseek as long as r.in.xyz doesn't make long seeks. ami_stream.h could potentially make long seeks (e.g., skip the first 8GB of a 30GB file), though in practice, I'm not sure that code that uses ami_stream (r.terraflow) actually uses this kind of fseek. Still the interface is designed to potentially use long seeks and I would be reluctant to simply replace fseeko with fseek in ami_stream.h.

you may to add a #define to check the compiler type and use fseek only if fseeko does not exist.

Right - I was vaguely aware of some configure checks for fseeko() but as ami_stream.h was the only file in GRASS using fseeko() I wasn't sure if it had been made redundant by something else. But looking closely at include/config.h, I see that we can conditionalise fseeko() on
#ifdef HAVE_LARGEFILES. After doing that, on Windows fseeko() is redefined to fseeko64() and all then works well.

A couple of further Windows compile errors in lib/iostream though:

sh-2.04$ make
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/mm_utils.o -c mm_utils.cc
mm_utils.cc:38:22: sys/mman.h: No such file or directory
make: *** [OBJ.i686-pc-mingw32/mm_utils.o] Error 1

sh-2.04$ make
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/mm_utils.o -c mm_utils.cc
c++ -I/c/grass/grass7/dist.i686-pc-mingw32/include -I/c/grass/extra/include -O2 -s -I/c/grass/extra/include -D_FILE_OFFSET_BITS=64 -DPACKAGE=\""grasslibs"\" -I/c/grass/grass7/dist.i686-pc-mingw32/include -o OBJ.i686-pc-mingw32/rtimer.o -c rtimer.cc
rtimer.cc:38:26: sys/resource.h: No such file or directory
make: *** [OBJ.i686-pc-mingw32/rtimer.o] Error 1

In the two respective cases I commented out the #includes for sys/mman.h and sys/resource.h and everything compiled fine. r.terraflow compiles fine too. So I'll simply remove those two #includes unless there's any objection?

Thanks for the feedback,

Paul

Paul Kelly wrote:

I looked in r.in.xyz which only uses fseek() and
ISTR discussion about it handling huge files recently,
although perhaps that involved reading from stdin.

fwiw the fseek() there is just to grab the filesize so G_percent() can
know where 100% is. If the fseek() returns garbage no harm is done and
G_percent() is skipped. It doesn't touch the processing aspect of the
module, just the cosmetic user message end of things.

Hamish

On Mon, 6 Oct 2008, Hamish wrote:

Paul Kelly wrote:

I looked in r.in.xyz which only uses fseek() and
ISTR discussion about it handling huge files recently,
although perhaps that involved reading from stdin.

fwiw the fseek() there is just to grab the filesize so G_percent() can
know where 100% is. If the fseek() returns garbage no harm is done and
G_percent() is skipped. It doesn't touch the processing aspect of the
module, just the cosmetic user message end of things.

OK, interesting. From what I've seen then the patch below might help things for very large files. G_percent() still takes a long argument, so the number of lines can't be greater than can be represented in a long, but with the patch below, the actual filesize can be.

Are there any really huge files anybody can test it with?

Paul

Index: main.c

--- main.c (revision 33696)
+++ main.c (working copy)
@@ -502,7 +502,11 @@
             linesize = strlen(buff) + 1;
         }
         fseek(in_fp, 0L, SEEK_END);
+#ifdef HAVE_LARGEFILES
+ filesize = ftello(in_fp);
+#else
         filesize = ftell(in_fp);
+#endif
         rewind(in_fp);
         if (linesize < 6) /* min possible: "0,0,0\n" */
             linesize = 6;

On Tue, Oct 7, 2008 at 11:53 AM, Paul Kelly
<paul-grass@stjohnspoint.co.uk> wrote:

Are there any really huge files anybody can test it with?

For easily create a huge DEM file, do this:

SRTM V4 CGIAR Europe:

-------------------- snip fetch_europe.sh -------------
#!/bin/sh

# fetch Europe block
# http://srtm.csi.cgiar.org/SELECTION/inputCoord.asp

for e in `seq 01 07 | awk '{printf "%02d\n", $1}' ` ; do
        for s in `seq 34 45` ; do
                #
http://hypersphere.telascience.org/elevation/cgiar_srtm_v4/tiff/zip/srtm_33_08.ZIP
                wget -c
http://hypersphere.telascience.org/elevation/cgiar_srtm_v4/tiff/zip/srtm_$\{s\}\_$\{e\}\.ZIP
        done
done
-------------------- snap -------------

# extract:
for i in *.ZIP ; do unzip -o $i ; NAME=`basename $i .ZIP` ; mv
$NAME.TIF $NAME.tif; done

# merge into single LatLong SRTM:
gdalwarp -s_srs epsg:4326 srtm_40_03.tif srtm_40_04.tif srtm_35_03.tif
srtm_40_05.tif srtm_35_04.tif srtm_40_06.tif srtm_35_05.tif
srtm_35_06.tif srtm_41_03.tif srtm_41_04.tif srtm_36_03.tif
srtm_41_05.tif srtm_36_04.tif srtm_41_06.tif srtm_36_05.tif
srtm_36_06.tif srtm_37_03.tif srtm_37_04.tif srtm_37_05.tif
srtm_37_06.tif srtm_38_03.tif srtm_38_04.tif srtm_38_05.tif
srtm_38_06.tif srtm_39_03.tif srtm_39_04.tif srtm_39_05.tif
srtm_39_06.tif europe_south_srtmV4_cgiar_LL.tif

# 2.7GB:
gdalinfo europe_south_srtmV4_cgiar_LL.tif

r.in.gdal ...

hope this helps,
Markus

On Tue, 7 Oct 2008, Markus Neteler wrote:

On Tue, Oct 7, 2008 at 11:53 AM, Paul Kelly
<paul-grass@stjohnspoint.co.uk> wrote:

Are there any really huge files anybody can test it with?

For easily create a huge DEM file, do this:

Actually, thinking clearly, it's a huge input text points file it needs. So it was easy to create a random one to test it with. And it turns out as well as the conditionalised ftello(), it also needs
#define USE_LARGEFILES 1
right at the start of the file to "activate" the Largefile support in config.h. But after that r.in.xyz works fine and reports the percentage progress correctly. I'm not sure if this is by design or by accident that it works like this but it's quite neat really.

If we get this hammered out properly it can go on http://grass.osgeo.org/wiki/Large_File_Support

Paul

On Tue, 7 Oct 2008, Paul Kelly wrote:

I'm not sure if this is by design or by accident that it works like this but it's quite neat really.

Sorry that was a confusing statement - I meant that the fact that the large file support in config.h can be activated by #define USE_LARGEFILES 1 is confusing, as there is also a Makefile variable called USE_LARGEFILES that is set in the configure script.

If we get this hammered out properly it can go on http://grass.osgeo.org/wiki/Large_File_Support

Paul
_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Paul Kelly wrote:

+#ifdef HAVE_LARGEFILES
+ filesize = ftello(in_fp);
+#else
         filesize = ftell(in_fp);
+#endif

Can we add library functions, G_fseek() and G_ftell(), which use an
off_t. The functions would use fseeko/ftello if available, fseek/ftell
otherwise, but would always accept and return off_t.

IMHO, this is preferable to having to add explicit LFS to individual
modules.

--
Glynn Clements <glynn@gclements.plus.com>

On Tue, 7 Oct 2008, Glynn Clements wrote:

Paul Kelly wrote:

+#ifdef HAVE_LARGEFILES
+ filesize = ftello(in_fp);
+#else
         filesize = ftell(in_fp);
+#endif

Can we add library functions, G_fseek() and G_ftell(), which use an
off_t. The functions would use fseeko/ftello if available, fseek/ftell
otherwise, but would always accept and return off_t.

But isn't off_t not always the same size - IIUC it depends on whether _FILE_OFFSET_BITS is defined and what it is set to? Should there be a typedef FILE_OFFSET which is equivalent to the "correct" off_t, or something similar?

Paul Kelly wrote:

> I'm not sure if this is by design or by accident that it
> works like this but it's quite neat really.

Sorry that was a confusing statement - I meant that the fact that the large
file support in config.h can be activated by #define USE_LARGEFILES 1 is
confusing, as there is also a Makefile variable called USE_LARGEFILES that
is set in the configure script.

The configure script doesn't define USE_LARGEFILES in config.h because
this would enable LFS globally, but not all modules are capable of
using it (i.e. some code may store offsets in an int or long, or
calculate offsets using "int"s, or use fseek/ftell.

If code cannot handle large files, it's better if it simply refuses to
open large files rather than opening them then silently failing (in
unpredictable and possibly destructive ways) when the 2GiB mark is
reached.

Code which is known to be LFS-capable normally has the following in
its Makefile:

  ifneq ($(USE_LARGEFILES),)
    EXTRA_CFLAGS = -D_FILE_OFFSET_BITS=64
  endif

This should probably be changed to:

  ifneq ($(USE_LARGEFILES),)
    EXTRA_CFLAGS = -DUSE_LARGEFILES=1
  endif

Or we could just make a concerted effort to ensure that all code is
LFS-capable, and enable LFS globally. It's basically a matter of
identifying all uses of lseek(), fseek() and ftell(), and either
confirming that the code is already safe or fixing it.

--
Glynn Clements <glynn@gclements.plus.com>

Paul Kelly wrote:

>> +#ifdef HAVE_LARGEFILES
>> + filesize = ftello(in_fp);
>> +#else
>> filesize = ftell(in_fp);
>> +#endif
>
> Can we add library functions, G_fseek() and G_ftell(), which use an
> off_t. The functions would use fseeko/ftello if available, fseek/ftell
> otherwise, but would always accept and return off_t.

But isn't off_t not always the same size - IIUC it depends on whether
_FILE_OFFSET_BITS is defined and what it is set to? Should there be a
typedef FILE_OFFSET which is equivalent to the "correct" off_t, or
something similar?

Right. In order for this to work, we would need to enable LFS
unconditionally, so that there's no chance of module code getting a
different off_t to libgis.

Actually, we would need to do it via CFLAGS. Relying upon config.h is
unsafe, as stdio.h typically gets included before config.h.

OTOH, enabling LFS unconditionally all but requires the above
functions, as plain fseek/ftell are incompatible with LFS, and
fseeko/ftello aren't guaranteed to be available.

--
Glynn Clements <glynn@gclements.plus.com>