[GRASS5] the floating raster file format

Hi folks,
after a good bottle of Montepulciano d'Abruzzo and seven hours (9pm Saturday to
4am Sunday) in the pre-Easter night, with the help of my good friend "el
Pitero", the amazing Khexedit, the GRASS sources and the r.compress command,
we finally have the binary raster format of floating point maps in GRASS.
I'd like to share it, since apart from the bottle of good wine it was a
heavy experience and no one should have to repeat it :). I hope somebody finds
what follows useful and can give me hints or point out possible errors.

Alright, let's get started:
My example is a 3x3 floating point file.
At the beginning of the file we have 4 bytes. These are the same for compressed,
uncompressed, integer or floating point maps, so I left them aside for now.
After these 4 bytes, three more groups of 4 bytes follow, which contain the
address of each row of data in the file (this also tells us that there is a
limit on the size of raster files, even if it is huge). After these bytes the
rows of data start. Each row can simply be passed through a zlib decompressor,
which gives us the row of data in uncompressed format. The only thing
left to do is to extract the right data format from that stream. That's all,
rather simple if you know how to do it :)

With the help of a small example:

The file is the following 3x3 floating point matrix:
10.000 20.000 30.000
20.000 40.000 50.000
30.000 50.000 60.000

Here is the binary and decimal version:
THE HEADER PART:
binary: 00000100 00000000 00000000 00000000
decimal: 4 0 0 0

00010001 00000000 00000000 00000000
17 0 0 0

00100100 00000000 00000000 00000000
36 0 0 0

00110111 00000000 00000000 00000000
55 0 0 0

01001010 00110001
74 49

COMMENTS:
4 could be the number of bytes used to define the address of each row, but I
would not bet on that.
17 is the address in bytes after which the first row of data starts,
36 is the same for the second row and 55 for the third. 74 is the end of file.
I noticed the last byte only now; I will try to understand it later.

After that, the rows of data come and since we know the offset between the
rows, we know exactly how many bytes to send through the zlib decompressor.

Any comment or hint is much appreciated.
I hope this was reasonably clear...
Cheers,
Andrea

--
____________________________________________________________________________

University of Trento
Department of Civil and Environmental Engineering
Via Mesiano, 77 - Trento (ITALY)

Andrea Antonello
tel: +393288497722
fax: +390461882672
____________________________________________________________________________

The above is incorrect.

The header is a single byte, equal to sizeof(long) (typically 4 on a
32-bit platform, 8 on a 64-bit platform). Then, NROWS+1 offsets are
written as longs (i.e. 4 or 8 bytes, depending upon platform) in
big-endian (Motorola) byte order.

Thus, your example is actually interpreted as:

  4 sizeof(long)
  0 0 0 17 offset of row 0
  0 0 0 36 offset of row 1
  0 0 0 55 offset of row 2
  0 0 0 74 offset of end of data

See G__write_row_ptrs() in src/libes/gis/format.c for the code which
writes this data. However, note that the row offsets are initially
zero; they get overwritten later (if you're writing compressed data,
you don't know how much space it will require until you've compressed
it).
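
The layout described above can be checked against the bytes of the 3x3
example with a short Python sketch. The function name and the choice to pass
the row count as an argument are illustrative assumptions (the data file does
not store the row count itself); this is not the GRASS code, only a reading
of the description in this message.

```python
def parse_row_offsets(data, nrows):
    """Read the offset table at the start of a raster data file.

    Layout as described in the thread: one byte holding the offset
    width, then nrows + 1 big-endian offsets of that width.
    """
    width = data[0]
    count = nrows + 1
    table = data[1:1 + width * count]
    if len(table) != width * count:
        raise ValueError("data too short for the offset table")
    return [
        int.from_bytes(table[i * width:(i + 1) * width], "big")
        for i in range(count)
    ]


# The first 17 bytes of the 3x3 example from this thread.
header = bytes([4,
                0, 0, 0, 17,
                0, 0, 0, 36,
                0, 0, 0, 55,
                0, 0, 0, 74])

offsets = parse_row_offsets(header, nrows=3)
# Stored size of each row: difference between consecutive offsets.
row_sizes = [end - start for start, end in zip(offsets, offsets[1:])]
print(offsets)    # [17, 36, 55, 74]
print(row_sizes)  # [19, 19, 19]
```

Note that the first offset (17) equals the size of the header itself:
1 byte plus 4 offsets of 4 bytes each.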

As for the format of the actual row data, see put_fp_data() in
src/libes/gis/put_row.c and RFC 1014 (the XDR specification).
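
For the XDR part, a minimal Python sketch: RFC 1014 encodes a float as a
4-byte and a double as an 8-byte IEEE 754 value in big-endian order, which
the struct module can unpack directly. The sketch assumes the row has already
been decompressed; the row bytes are built in the example itself rather than
taken from a real GRASS file, and the function name is hypothetical.

```python
import struct


def decode_xdr_row(raw, ncols, is_double):
    """Decode one decompressed row of XDR floats or doubles."""
    # ">" selects big-endian with standard sizes: "f" = 4 bytes, "d" = 8 bytes.
    fmt = ">%d%s" % (ncols, "d" if is_double else "f")
    if len(raw) != struct.calcsize(fmt):
        raise ValueError("row length does not match ncols and cell type")
    return struct.unpack(fmt, raw)


# First row of the 3x3 example, encoded here as XDR doubles for illustration.
row = struct.pack(">3d", 10.0, 20.0, 30.0)
print(len(row))                      # 24
print(decode_xdr_row(row, 3, True))  # (10.0, 20.0, 30.0)
```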

--
Glynn Clements <glynn.clements@virgin.net>

Thank you very much Glynn,
In fact, I was having trouble with my picture of things. :(
You are saving me lots of time.

Cheers
Andrea

On Wed, Apr 14, 2004 at 07:17:22AM +0100, Glynn Clements wrote:

> See G__write_row_ptrs() in src/libes/gis/format.c for the code which
> writes this data. However, note that the row offsets are initially
> zero; they get overwritten later (if you're writing compressed data,
> you don't know how much space it will require until you've compressed
> it).
>
> As for the format of the actual row data, see put_fp_data() in
> src/libes/gis/put_row.c and RFC 1014 (the XDR specification).

I have taken the liberty to write these comments into
src/libes/gis/format.c

Please update there if needed.

Note: The XDR RFC 1014 is found at:
http://www.faqs.org/rfcs/rfc1014.html

Markus

Two questions:
1) Since the offsets in the header refer to the compressed data (and the
binary file doesn't carry any information about the data type), I won't be
able to tell from the binary file alone whether I have float or double data.
Is that right? The cellhd file just says -1, which means float or double.
Is the only way to uncompress a row and divide the number of bytes by the
number of columns? Any hint?

2) Shouldn't the default be float? My GRASS_FP_DOUBLE variable is not set, but
r.in.ascii creates double type maps.

Ciao
Andrea

_______________________________________________
grass5 mailing list
grass5@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass5

Andrea Antonello wrote:

> 1) Since the offsets in the header refer to the compressed data (and the
> binary file doesn't carry any information about the data type), I won't be
> able to tell from the binary file alone whether I have float or double data.
> Is that right?

Yes.

> The cellhd file just says -1, which means float or double.
> Is the only way to uncompress a row and divide the number of bytes by the
> number of columns? Any hint?

The official mechanism is to read the "type:" entry from the
cell_misc/<map_name>/f_format file (see G__check_fp_type()).
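
As a rough illustration of that lookup, here is a small Python sketch that
pulls the "type:" entry out of the text of an f_format file. Only the "type:"
key is confirmed in this thread; the sample file content, the value names
"float" and "double", and the function name are assumptions made for the
example. The byte sizes are the XDR sizes from RFC 1014.

```python
# XDR sizes from RFC 1014; the key names are assumed "type:" values.
CELL_BYTES = {"float": 4, "double": 8}


def read_fp_type(f_format_text):
    """Return the value of the 'type:' entry from f_format file text."""
    for line in f_format_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            return value.strip()
    raise ValueError("no 'type:' entry found")


# Hypothetical file content, for illustration only.
sample = "type: double\nbyte_order: xdr\n"

fp_type = read_fp_type(sample)
print(fp_type)              # double
print(CELL_BYTES[fp_type])  # 8
```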

> 2) Shouldn't the default be float? My GRASS_FP_DOUBLE variable is not set, but
> r.in.ascii creates double type maps.

The GRASS_FP_DOUBLE variable is only used if the program calls
G_open_fp_cell_new() or G_open_fp_cell_new_uncompressed() directly,
without calling G_set_fp_type(). Most programs use
G_open_raster_new(), and thus specify the exact type of map to be
created.

--
Glynn Clements <glynn.clements@virgin.net>