[GRASS-user] r.in.xyz: Could not open text file ~2.5GB

I'm trying to open a text file to scan for extents:

r.in.xyz -s input=2006MB_GarryTrough_1N.txt output=2006MB_GarryTrough_1N
method=mean type=FCELL fs=space x=1 y=2 z=3 percent=100

Could not open input file
</home/epatton/Projects/Whalen/PERMANENT/WORK/2006MB_GarryTrough_1N.txt>.

I get the same error regardless of what percent parameter is set to (60, 40,
10). Is this an LFS problem with r.in.xyz?

~ Eric.

Patton, Eric wrote:

I'm trying to open a text file to scan for extents:

r.in.xyz -s input=2006MB_GarryTrough_1N.txt output=2006MB_GarryTrough_1N
method=mean type=FCELL fs=space x=1 y=2 z=3 percent=100

Could not open input file
</home/epatton/Projects/Whalen/PERMANENT/WORK/2006MB_GarryTrough_1N.txt>.

I get the same error regardless of what percent parameter is set to (60, 40,
10). Is this an LFS problem with r.in.xyz?

Yes. r.in.xyz won't accept input files >2GiB on a 32-bit system (one
where the C "long" type is 32 bits), regardless of whether or not
GRASS was built with --enable-largefile.

The attached patch might be sufficient, although the progress
percentage will be wrong (it will reach 100% at 2GiB rather than at
the end of the file).

--
Glynn Clements <glynn@gclements.plus.com>

(attachments)

r.in.xyz-lfs.diff (1015 Bytes)
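
For readers following along, below is a minimal, self-contained sketch of the approach the patch appears to take, inferred from the hunk Glynn quotes further down the thread; it is not the attached diff itself, and the function name is purely illustrative.

    #include <stdio.h>

    /* Illustrative only: if ftell() cannot represent the offset of a
     * >2GiB file it returns -1, so clamp the size instead of aborting.
     * The scan still runs, but the progress readout tops out at the
     * 2GiB mark rather than at end-of-file. */
    static long scan_filesize(FILE *in_fd)
    {
        long filesize;

        fseek(in_fd, 0L, SEEK_END);
        filesize = ftell(in_fd);
        if (filesize < 0)
            filesize = 0x7FFFFFFF;
        rewind(in_fd);

        return filesize;
    }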

Eric,

Have you tried running an identical command on a small subsample of your data?

Personally, I haven't had any LFS support problems with r.in.xyz. This included scanning a text file for extents as you're trying to do. After the initial thread was started with respect to r.in.xyz's LFS, I went back and checked some of the test data sets that I have. They're all beyond the 2GB limit (3.7GB is the largest I've tested so far) and I haven't had any problems with the program.

As far as I can see, the only thing I have noticed that is different with a file beyond the 2GB size is that the progress indicator never comes up. I think most people can live with that.

I have a 500 million point dataset that I'm trying to get out of LAS binary format right now. When that's done, I'll pass it through r.in.xyz and report how that goes. I figure it'll be around 15GB in size, so if that doesn't trip up r.in.xyz then I don't think anything will.

Cheers,

Mike

On 19-Oct-06, at 11:19 AM, Patton, Eric wrote:

I'm trying to open a text file to scan for extents:

r.in.xyz -s input=2006MB_GarryTrough_1N.txt output=2006MB_GarryTrough_1N
method=mean type=FCELL fs=space x=1 y=2 z=3 percent=100

Could not open input file
</home/epatton/Projects/Whalen/PERMANENT/WORK/2006MB_GarryTrough_1N.txt>.

I get the same error regardless of what percent parameter is set to (60, 40,
10). Is this an LFS problem with r.in.xyz?

~ Eric.


Michael Perdue wrote:

Personally, I haven't had any LFS support problems with r.in.xyz.
This included scanning a text file for extents as you're trying to do.
After the initial thread was started with respect to r.in.xyz's LFS, I
went back and checked some of the test data sets that I have. They're
all beyond the 2GB limit (3.7GB is the largest I've tested so far)
and I haven't had any problems with the program. As far as I can see,
the only thing I have noticed that is different with a file beyond
the 2GB size is that the progress indicator never comes up.

It sounds as if the OS maps fopen() to fopen64() (or uses O_LARGEFILE
in fopen()) regardless of the _FILE_OFFSET_BITS setting. In that case,
the fopen() will succeed regardless of the file's size, but the call
to fseek() will fail with EOVERFLOW.

Out of curiosity, which OS is that?

FWIW, this behaviour is a bad idea. If you can't reliably use
fseek/ftell on a file, fopen is supposed to fail when you try to open
the file, rather than succeeding then saying "ha! fooled you!" when
you try to read the current offset. If you are modifying an existing
file, failing half-way through (potentially leaving the file in an
inconsistent state) is a bad thing.

A correct call to fseek() shouldn't be *able* to fail on a normal file
opened with fopen(). You shouldn't have to worry about offset
overflows unless you use fopen64(), either literally or by using
-D_FILE_OFFSET_BITS=64 to remap the stdio functions to their 64-bit
equivalents.

--
Glynn Clements <glynn@gclements.plus.com>
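
A small standalone probe along these lines (not part of GRASS, and written here purely to illustrate the two behaviours described above) can show whether a given system refuses the fopen() outright or accepts it and only fails later at fseek()/ftell():

    #include <stdio.h>
    #include <errno.h>
    #include <string.h>

    /* Probe how a system treats a >2GiB file when the program is built
     * *without* -D_FILE_OFFSET_BITS=64: strict LFS behaviour rejects the
     * fopen(); lenient behaviour opens the file and fails later with
     * EOVERFLOW. */
    int main(int argc, char **argv)
    {
        FILE *fp;
        long pos;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }

        fp = fopen(argv[1], "r");
        if (!fp) {
            printf("fopen refused the file: %s\n", strerror(errno));
            return 0;
        }

        if (fseek(fp, 0L, SEEK_END) != 0 || (pos = ftell(fp)) < 0)
            printf("opened, but fseek/ftell failed: %s\n", strerror(errno));
        else
            printf("offset fits in a long: %ld bytes\n", pos);

        fclose(fp);
        return 0;
    }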

On 20-Oct-06, at 8:12 AM, Glynn Clements wrote:

Michael Perdue wrote:

Personally, I haven't had any LFS support problems with r.in.xyz.
This included scanning a text file for extents as you're trying to do.
After the initial thread was started with respect to r.in.xyz's LFS, I
went back and checked some of the test data sets that I have. They're
all beyond the 2GB limit (3.7GB is the largest I've tested so far)
and I haven't had any problems with the program. As far as I can see,
the only thing I have noticed that is different with a file beyond
the 2GB size is that the progress indicator never comes up.

It sounds as if the OS maps fopen() to fopen64() (or uses O_LARGEFILE
in fopen()) regardless of the _FILE_OFFSET_BITS setting. In that case,
the fopen() will succeed regardless of the file's size, but the call
to fseek() will fail with EOVERFLOW.

Out of curiosity, which OS is that?

FWIW, this behaviour is a bad idea. If you can't reliably use
fseek/ftell on a file, fopen is supposed to fail when you try to open
the file, rather than succeeding then saying "ha! fooled you!" when
you try to read the current offset. If you are modifying an existing
file, failing half-way through (potentially leaving the file in an
inconsistent state) is a bad thing.

A correct call to fseek() shouldn't be *able* to fail on a normal file
opened with fopen(). You shouldn't have to worry about offset
overflows unless you use fopen64(), either literally or by using
-D_FILE_OFFSET_BITS=64 to remap the stdio functions to their 64-bit
equivalents.

--
Glynn Clements <glynn@gclements.plus.com>

I'm running Mac OS X 10.4.8 on an Intel MacBook Pro (32-bit). I'm pretty new to the Mac OS (what can I say... couldn't resist the sleek case it came in), and what you're saying wouldn't surprise me at all. There have been a number of quirks that I've noticed about the Mac, some of which have been driving me nuts. I'll have to do a little more experimentation with my Linux workstation when I get it set up again.

Cheers,

Mike Perdue


Eric wrote:
> I'm trying to open a text file to scan for extents:
>
> r.in.xyz -s input=2006MB_GarryTrough_1N.txt
> output=2006MB_GarryTrough_1N method=mean type=FCELL fs=space x=1
> y=2 z=3 percent=100

Glynn wrote:

     fseek(in_fd, 0L, SEEK_END);
     filesize = ftell(in_fd);
+    if (filesize < 0)
+        filesize = 0x7FFFFFFF;
     rewind(in_fd);

Hi,

Sorry, I'm busy with other commitments and don't have time to get into
the discussion more...

Just a thought, though: we don't really need to store the actual
filesize; we could just as well store filesize/10 or filesize/1024 and
then adjust the other calculations for that. We just need the ratio for
G_percent(), not the exact numbers.

Then, for the 2GB < filesize < 4GB case, maybe something like:

    if (ftell(in_fd) < 0)
        filesize_div10 = -1 * (0x7FFFFFFF - ftell(in_fd)) / 10;

? (not sure which direction the negative result from ftell() goes)

or store it as a double... ?

Hamish
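
To illustrate the "we just need the ratio" point, here is a hedged sketch with purely illustrative names (it does not call the real G_percent()): once the counts are held as doubles, or pre-divided by some constant, only the quotient reaches the readout, so the exact byte total never has to fit in a long.

    #include <stdio.h>

    /* Illustrative only: a percent readout driven by a ratio of doubles.
     * Whether the totals hold true byte counts or byte counts divided by
     * 1024 makes no difference to the displayed value, as long as both
     * are scaled the same way. */
    static void report_progress(double bytes_done, double bytes_total)
    {
        int pct;

        if (bytes_total <= 0.0)
            return;                 /* size unknown: skip the readout */

        pct = (int)(100.0 * bytes_done / bytes_total);
        if (pct > 100)
            pct = 100;

        printf("\r%3d%%", pct);
        fflush(stdout);
    }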

Hamish wrote:

> > I'm trying to open a text file to scan for extents:
> >
> > r.in.xyz -s input=2006MB_GarryTrough_1N.txt
> > output=2006MB_GarryTrough_1N method=mean type=FCELL fs=space x=1
> > y=2 z=3 percent=100

Glynn wrote:
>      fseek(in_fd, 0L, SEEK_END);
>      filesize = ftell(in_fd);
> +    if (filesize < 0)
> +        filesize = 0x7FFFFFFF;
>      rewind(in_fd);

Hi,

Sorry, I'm busy with other commitments and don't have time to get into
the discussion more...

Just a thought, though: we don't really need to store the actual
filesize; we could just as well store filesize/10 or filesize/1024 and
then adjust the other calculations for that. We just need the ratio for
G_percent(), not the exact numbers.

Then, for the 2GB < filesize < 4GB case, maybe something like:

    if (ftell(in_fd) < 0)
        filesize_div10 = -1 * (0x7FFFFFFF - ftell(in_fd)) / 10;

? (not sure which direction the negative result from ftell() goes)

or store it as a double... ?

The problem is that ftell() returns the result as a (signed) long. If
the result won't fit into a long, it returns -1 (and sets errno to
EOVERFLOW).

This can only happen if you also set _FILE_OFFSET_BITS to 64 so that
fopen() is redirected to fopen64(), otherwise fopen() will simply
refuse to open files larger than 2GiB (apparently, this isn't true on
some versions of Mac OS X, which open the file anyhow then fail on
fseek/ftell once you've passed the 2GiB mark).

If you want to obtain the current offset for a file whose size exceeds
the range of a signed long, you instead have to use the (non-ANSI)
ftello() function, which returns the offset as an off_t. But before we
do that, we would need to add configure checks so that we don't try to
use ftello() on systems which don't provide it.

IOW, doing it right is non-trivial, and for a relatively minor benefit
(accurate progress indication for large files).

--
Glynn Clements <glynn@gclements.plus.com>
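
For completeness, here is a hedged sketch of what such an ftello()-based size query might look like. HAVE_FTELLO is assumed to be the macro a configure check would define (the conventional autoconf name), fseeko() is assumed to be available wherever ftello() is, and none of this is taken from the actual GRASS source.

    #include <stdio.h>
    #include <sys/types.h>

    /* Sketch: query the file size via ftello() (which returns an off_t
     * and so has no 2GiB ceiling) when the build system says it exists,
     * otherwise fall back to ftell() and accept that sizes past 2GiB
     * come back as -1.  The result is stored in a double, which is
     * plenty of precision for a progress ratio. */
    static double get_filesize(FILE *in_fd)
    {
        double filesize;

    #ifdef HAVE_FTELLO
        fseeko(in_fd, 0, SEEK_END);
        filesize = (double)ftello(in_fd);
    #else
        fseek(in_fd, 0L, SEEK_END);
        filesize = (double)ftell(in_fd);   /* -1 once past the 2GiB mark */
    #endif
        rewind(in_fd);

        return filesize < 0 ? 0.0 : filesize;
    }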