[GRASS5] rasters?

Hi developers,
we from the JGrass project have a problem with raster reading in the new GRASS
releases. Since the end of 2005, some people have reported problems with raster
reading.
I'm trying to track the problem down, but I really need some help. The bug
happens during raster row decompression.
Markus already helped me with a bugfix that Glynn made to the compression
of CELL files. That could be one part of the problem, but the bug also happens to us
in the FCELL case. Since the code is pretty complicated, I would like to
collect some ideas before I dive into it and risk getting a few more white
hairs.

Thanks to everyone who can give me some good pointers.
Best regards,
Andrea

--
____________________________________________________________________________
HydroloGIS - Environmental Safety Modelling
www.hydrologis.com

Andrea Antonello
Environmental Engineer
mobile: +393288497722

"Let it be as much a great honour to take as to give learning,
if you want to be called wise."
Skuggsja' - The King's mirror - 1240 Reykjavik
____________________________________________________________________________

Andrea Antonello wrote:

> we from the JGrass project have a problem with raster reading in the new GRASS
> releases. Since the end of 2005, some people have reported problems with raster
> reading.
> I'm trying to track the problem down, but I really need some help. The bug
> happens during raster row decompression.
> Markus already helped me with a bugfix that Glynn made to the compression
> of CELL files. That could be one part of the problem, but the bug also happens to us
> in the FCELL case. Since the code is pretty complicated, I would like to
> collect some ideas before I dive into it and risk getting a few more white
> hairs.

What are you asking, exactly?

GRASS floating-point rasters are encoded using XDR, then (optionally)
compressed with zlib.
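
For a Java reader like JGrass, the XDR half of this amounts to big-endian IEEE 754 values. A minimal sketch, assuming the row has already been decompressed and that FCELL cells take 4 bytes and DCELL cells 8 bytes; the class name XdrRowDecoder is hypothetical and not part of GRASS or JGrass:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: decode one decompressed FP row from XDR (big-endian IEEE 754). */
final class XdrRowDecoder {

    /** FCELL rows: 4 bytes per cell. */
    static float[] decodeFloatRow(byte[] xdr, int cols) {
        // ByteBuffer is big-endian by default; set it explicitly for clarity.
        ByteBuffer buf = ByteBuffer.wrap(xdr).order(ByteOrder.BIG_ENDIAN);
        float[] row = new float[cols];
        for (int c = 0; c < cols; c++) {
            row[c] = buf.getFloat(); // BufferUnderflowException if the row is short
        }
        return row;
    }

    /** DCELL rows: 8 bytes per cell. */
    static double[] decodeDoubleRow(byte[] xdr, int cols) {
        ByteBuffer buf = ByteBuffer.wrap(xdr).order(ByteOrder.BIG_ENDIAN);
        double[] row = new double[cols];
        for (int c = 0; c < cols; c++) {
            row[c] = buf.getDouble();
        }
        return row;
    }
}
```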

For data which is compressed (according to the "compressed" field in
the cellhd file), each row begins with a flag byte which indicates
whether the data for that row is actually compressed; a value of 48
(ASCII code for '0') indicates uncompressed data, 49 (ASCII code for
'1') indicates compressed data.

For uncompressed data, there is no flag byte.
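
The per-row flag byte could be handled roughly as follows. This is a sketch, not GRASS code: it assumes the caller has already cut out the bytes stored for one row and knows the decompressed length (columns times 4 or 8). FpRowReader and rowToXdr are hypothetical names, and it assumes GRASS writes a standard zlib stream (with header), which is what java.util.zip.Inflater expects by default.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Sketch: turn the on-disk bytes of one FP row into plain XDR bytes. */
final class FpRowReader {
    static final int FLAG_UNCOMPRESSED = '0'; // 48
    static final int FLAG_COMPRESSED = '1';   // 49

    /**
     * @param raw           bytes stored on disk for this row
     * @param mapCompressed whether the cellhd "compressed" field marks the map as compressed
     * @param expectedLen   cols * bytes per cell (4 for FCELL, 8 for DCELL)
     */
    static byte[] rowToXdr(byte[] raw, boolean mapCompressed, int expectedLen) {
        if (!mapCompressed) {
            // Uncompressed map: no flag byte, the row is plain XDR.
            return slice(raw, 0, expectedLen);
        }
        if (raw.length == 0) {
            throw new IllegalArgumentException("empty row");
        }
        int flag = raw[0] & 0xFF;
        if (flag == FLAG_UNCOMPRESSED) {
            // Compressed map, but this particular row was stored as-is.
            return slice(raw, 1, expectedLen);
        }
        if (flag != FLAG_COMPRESSED) {
            throw new IllegalArgumentException("unexpected row flag byte: " + flag);
        }
        Inflater inflater = new Inflater(); // default: zlib stream with header
        try {
            inflater.setInput(raw, 1, raw.length - 1);
            byte[] out = new byte[expectedLen];
            int n = 0;
            while (n < expectedLen && !inflater.finished()) {
                int got = inflater.inflate(out, n, expectedLen - n);
                if (got == 0) {
                    break; // no progress: truncated or corrupt stream
                }
                n += got;
            }
            if (n != expectedLen) {
                throw new IllegalArgumentException(
                        "row inflated to " + n + " bytes, expected " + expectedLen);
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib data in row", e);
        } finally {
            inflater.end();
        }
    }

    private static byte[] slice(byte[] raw, int from, int len) {
        if (raw.length - from < len) {
            throw new IllegalArgumentException("row too short: " + (raw.length - from) + " < " + len);
        }
        return Arrays.copyOfRange(raw, from, from + len);
    }
}
```

The result can then be fed to an XDR decoder for FCELL or DCELL cells.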

--
Glynn Clements <glynn@gclements.plus.com>

> What are you asking, exactly?

Alright, thanks Glynn.

About two years ago I wrote the GRASS raster reading code in Java for JGrass.
For the last two years everything worked flawlessly.
In the last few months something stopped working, i.e. when the Java library tries
to read a raster map, it fails at the point of decompressing the raster row.
Since it worked properly in past years, I expect that something was
changed in the GRASS libs, so I was asking if anyone knows what that
something could be, in order to be able to understand and fix it.

Regards,
Andrea


Andrea Antonello wrote:

> What are you asking, exactly?

> Alright, thanks Glynn.
>
> About two years ago I wrote the GRASS raster reading code in Java for JGrass.
> For the last two years everything worked flawlessly.
> In the last few months something stopped working, i.e. when the Java library tries
> to read a raster map, it fails at the point of decompressing the raster row.
> Since it worked properly in past years, I expect that something was
> changed in the GRASS libs, so I was asking if anyone knows what that
> something could be, in order to be able to understand and fix it.

I can only think of two changes:

1. For compressed maps (integer or FP), the size of the row offsets is
sizeof(off_t) rather than sizeof(long). The main consequence of this
is that 8-byte offsets will now be more common, so if your code only
handles 4-byte offsets, it will fail more often.

2. Integer maps can be compressed using either RLE or zlib. Maps which
are compressed using zlib have a value of 2 in the "compressed" field
of the cellhd file.
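
A reader therefore has to interpret the "compressed" field together with the map type. The mapping below is an assumption drawn from this thread (0 = none; for integer maps 1 = RLE and 2 = zlib; for FP maps any non-zero value = zlib), and the parser assumes the cellhd file stores the field as a "compressed: N" line. CellhdCompression and RowCompression are hypothetical names.

```java
/** Hypothetical codec choice for a raster's rows. */
enum RowCompression { NONE, RLE, ZLIB }

final class CellhdCompression {

    /** Assumed mapping; verify against lib/gis before relying on it. */
    static RowCompression forMap(int compressedField, boolean floatingPoint) {
        if (compressedField == 0) {
            return RowCompression.NONE;
        }
        if (floatingPoint) {
            return RowCompression.ZLIB; // FP maps: zlib only
        }
        switch (compressedField) {
            case 1: return RowCompression.RLE;  // integer maps, RLE
            case 2: return RowCompression.ZLIB; // integer maps, zlib
            default:
                throw new IllegalArgumentException("unknown compressed value: " + compressedField);
        }
    }

    /** Parse N from a "compressed: N" line; this sketch does not guess if it is missing. */
    static int parseCompressed(String cellhdText) {
        for (String line : cellhdText.split("\n")) {
            String[] kv = line.split(":", 2);
            if (kv.length == 2 && kv[0].trim().equals("compressed")) {
                return Integer.parseInt(kv[1].trim());
            }
        }
        throw new IllegalArgumentException("no compressed field in cellhd");
    }
}
```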

--
Glynn Clements <glynn@gclements.plus.com>

> I can only think of two changes:
>
> 1. For compressed maps (integer or FP), the size of the row offsets is
> sizeof(off_t) rather than sizeof(long). The main consequence of this
> is that 8-byte offsets will now be more common, so if your code only
> handles 4-byte offsets, it will fail more often.

Maybe that's why the segment lib is showing "read() past end of file" errors in
r.cost, r.walk, etc.?


Hamish

Hamish wrote:

> I can only think of two changes:
>
> 1. For compressed maps (integer or FP), the size of the row offsets is
> sizeof(off_t) rather than sizeof(long). The main consequence of this
> is that 8-byte offsets will now be more common, so if your code only
> handles 4-byte offsets, it will fail more often.

> Maybe that's why the segment lib is showing "read() past end of file" errors in
> r.cost, r.walk, etc.?

No. This only affects code which accesses the raster files directly
(i.e. lib/gis or third-party code which does its own I/O).

--
Glynn Clements <glynn@gclements.plus.com>

> I can only think of two changes:
>
> 1. For compressed maps (integer or FP), the size of the row offsets is
> sizeof(off_t) rather than sizeof(long). The main consequence of this
> is that 8-byte offsets will now be more common, so if your code only
> handles 4-byte offsets, it will fail more often.

I'm not completely sure, but it's probably that, which should again be easy to
solve (hopefully). I will check and come back with new questions if needed.

Do you know roughly when sizeof(long) was changed?

Thanks for your help,
Andrea


On Tue, Mar 14, 2006 at 08:42:40AM +0100, Andrea Antonello wrote:

> I can only think of two changes:
>
> 1. For compressed maps (integer or FP), the size of the row offsets is
> sizeof(off_t) rather than sizeof(long). The main consequence of this
> is that 8-byte offsets will now be more common, so if your code only
> handles 4-byte offsets, it will fail more often.

> I'm not completely sure, but it's probably that, which should again be easy to
> solve (hopefully). I will check and come back with new questions if needed.
>
> Do you know roughly when sizeof(long) was changed?

Hi Andrea,

if you want to find out such things by yourself:

# in GRASS 6.1-CVS
cd lib/gis/

# generate local ChangeLog file (http://www.red-bean.com/cvs2cl/):
cvs2cl.pl --follow trunk
less ChangeLog

Then watch out for relevant changes.

Best

Markus

Andrea Antonello wrote:

> I can only think of two changes:
>
> 1. For compressed maps (integer or FP), the size of the row offsets is
> sizeof(off_t) rather than sizeof(long). The main consequence of this
> is that 8-byte offsets will now be more common, so if your code only
> handles 4-byte offsets, it will fail more often.

> I'm not completely sure, but it's probably that, which should again be easy to
> solve (hopefully). I will check and come back with new questions if needed.
>
> Do you know roughly when sizeof(long) was changed?

The initial change was:

  revision 1.4
  date: 2004/08/12 13:53:51; author: glynn; state: Exp; lines: +97 -77
  Raster I/O cleanup
  Use off_t instead of long, to allow for files >2Gb

However, this version shrank the row pointers to 4 bytes if the file
size was <4Gb (i.e. in most cases).

This didn't work, as it would shrink the pointers when the file was
empty, then enlarge them again if the file was >=4Gb when it was
closed, overwriting the beginning of the data.
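
The arithmetic behind the overwrite is easy to see. Assuming the header is one size byte followed by rows + 1 pointers, the last one marking the end of the data (this layout is an assumption, not stated in the thread), widening the pointers in place pushes the end of the table into the first rows. A small sketch with hypothetical names:

```java
/** Sketch: header size under the assumed layout (1 size byte + (rows + 1) pointers). */
final class RowPtrLayout {

    static long headerSize(long rows, int nbytes) {
        return 1 + (rows + 1) * nbytes;
    }

    /** Bytes of row data clobbered if the table is rewritten in place with wider pointers. */
    static long overwrittenBytes(long rows, int oldNbytes, int newNbytes) {
        return Math.max(0, headerSize(rows, newNbytes) - headerSize(rows, oldNbytes));
    }
}
```

For a 1000-row map, going from 4 to 8 bytes per pointer moves the end of the table from byte 4005 to byte 8009, so the first 4004 bytes of row data would be overwritten.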

That was changed in:

  revision 2.1
  date: 2005/08/23 21:23:08; author: glynn; state: Exp; lines: +0 -3
  Fix bug where row pointers were enlarged to 8 bytes, overwriting the
  beginning of the data (discovered by Andrew Danner).

Since then, the row pointers have always been sizeof(off_t). On x86,
this will be either 4 or 8 bytes depending upon whether the
--enable-largefile option was passed to the configure script.

--
Glynn Clements <glynn@gclements.plus.com>

Hi Markus,
thank you.

Andrea

> The initial change was:
>
>   revision 1.4
>   date: 2004/08/12 13:53:51; author: glynn; state: Exp; lines: +97 -77
>   Raster I/O cleanup
>   Use off_t instead of long, to allow for files >2Gb
>
> However, this version shrank the row pointers to 4 bytes if the file
> size was <4Gb (i.e. in most cases).
>
> This didn't work, as it would shrink the pointers when the file was
> empty, then enlarge them again if the file was >=4Gb when it was
> closed, overwriting the beginning of the data.
>
> That was changed in:
>
>   revision 2.1
>   date: 2005/08/23 21:23:08; author: glynn; state: Exp; lines: +0 -3
>   Fix bug where row pointers were enlarged to 8 bytes, overwriting the
>   beginning of the data (discovered by Andrew Danner).
>
> Since then, the row pointers have always been sizeof(off_t). On x86,
> this will be either 4 or 8 bytes depending upon whether the
> --enable-largefile option was passed to the configure script.

Alright,
I'm not sure that this could be our problem, but I will check. The chronology
of these events doesn't seem to match ours, but you never know: problems
come to the surface whenever they want :-)

Anyway thank you for the help,
Cheers,
Andrea


Hi Glynn,
thanks for your hint, it helped me solve the problem.

I assume we didn't notice the problem for a long while because the
--enable-largefile option is switched off by default.
Only the Mac OS X binary version seems to have it on by default (and that is how
we noticed it).
Can someone correct me if I'm wrong?

Some points to clarify (mostly for myself):
The first byte of a raster file, read as an unsigned number (0 to 255), contains
the number of bytes used to store each row address in the header,
and therefore it also sets the limit for the raster size, right?

> However, this version shrank the row pointers to 4 bytes if the file
> size was <4Gb (i.e. in most cases).

Alright. That would look like the proper behaviour to me...

> This didn't work, as it would shrink the pointers when the file was
> empty, then enlarge them again if the file was >=4Gb when it was
> closed, overwriting the beginning of the data.

Alright, I understand what happens, but does this mean that that byte is now
always set to 8 if LFS is enabled?

Best regards,
Andrea


Andrea Antonello wrote:

> Some points to clarify (mostly for myself):
> The first byte of a raster file, read as an unsigned number (0 to 255), contains
> the number of bytes used to store each row address in the header,
> and therefore it also sets the limit for the raster size, right?

Correct.

> However, this version shrank the row pointers to 4 bytes if the file
> size was <4Gb (i.e. in most cases).

> Alright. That would look like the proper behaviour to me...

Except that we don't know how large the file will eventually be when
the offset table is first written out.

> This didn't work, as it would shrink the pointers when the file was
> empty, then enlarge them again if the file was >=4Gb when it was
> closed, overwriting the beginning of the data.

> Alright, I understand what happens, but does this mean that that byte is now
> always set to 8 if LFS is enabled?

It is always set to sizeof(off_t), whatever that happens to be.

The code which reads and writes the row pointers is entirely generic;
if you had a platform with a 7-byte off_t, it would work. Similarly,
you can read maps with 8-byte row pointers on a system where
sizeof(off_t) is 4, provided that the file doesn't actually exceed
2Gb.
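
On the Java side, the same generic reading could be sketched like this for compressed maps (uncompressed maps have no row-pointer table). Assumptions not confirmed in this thread: the table follows the size byte, holds rows + 1 pointers, and each pointer is stored big-endian (most significant byte first). RowPointers is a hypothetical name. Because Java's long is a signed 64-bit value, the check rejects only pointers that would not fit in 63 bits, the Java analogue of the 2Gb caveat above.

```java
/**
 * Sketch: decode the row-pointer table of a compressed raster.
 * Assumed layout: 1 size byte, then rows + 1 big-endian pointers of that size.
 */
final class RowPointers {

    static long[] read(byte[] header, int rows) {
        if (header.length == 0) {
            throw new IllegalArgumentException("empty header");
        }
        int nbytes = header[0] & 0xFF; // first byte: bytes per pointer (0..255)
        if (nbytes == 0) {
            throw new IllegalArgumentException("row pointer size is 0");
        }
        long needed = 1L + (rows + 1L) * nbytes;
        if (header.length < needed) {
            throw new IllegalArgumentException("header too short: need " + needed + " bytes");
        }
        long[] ptrs = new long[rows + 1];
        int pos = 1;
        for (int r = 0; r <= rows; r++) {
            long v = 0;
            for (int i = 0; i < nbytes; i++) {
                // Refuse values that would overflow a signed 64-bit long;
                // leading zero bytes (e.g. 8-byte pointers in a small file) are fine.
                if (v > (Long.MAX_VALUE >>> 8)) {
                    throw new IllegalArgumentException("row pointer " + r + " too large");
                }
                v = (v << 8) | (header[pos++] & 0xFF);
            }
            ptrs[r] = v;
        }
        return ptrs;
    }
}
```

Row r then occupies bytes ptrs[r] up to (but not including) ptrs[r + 1]. Nothing in the loop depends on nbytes being 4 or 8, which mirrors the "entirely generic" behaviour described above.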

--
Glynn Clements <glynn@gclements.plus.com>