[GRASSLIST:4565] different CELL integer types?

I've been importing a bunch of SRTM data recently and noticed that there seems to be (at least) 2 different integer cell types - 16 bit and 32 bit. As imported, one tile takes up a maximum of 2.7MB in the CELL file, same as the source SRTM data file, and in the GRASS header file its type is 1. After I run it thru fillnulls, I round that back to integers with r.mapcalc, and the CELL file sizes double, and the GRASS header has a type of 3.

So, is there a way to get these back to the 16 bit cell types? Since these are meter elevations on the earth, I'm not worried about them ever going beyond +/-32767. Normally I wouldn't worry about it, but I've converted many GB of these SRTM tiles now and saving a few GB would be nice.
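A rough check of those numbers. It assumes the tiles are SRTM3 tiles of 1201 x 1201 cells (the message doesn't say which SRTM product, but 2.7MB is what SRTM3 works out to) and that the header's 'format:' value is the number of bytes per cell minus one (my understanding, not stated in the thread). It computes only the uncompressed worst case and ignores any row-index overhead:

```python
# Assumption: SRTM3 tiles, 1201 x 1201 cells, 2-byte samples in the source .hgt file.
ROWS = COLS = 1201


def raw_size(bytes_per_cell: int) -> int:
    """Cell data size in bytes with no compression at all."""
    return ROWS * COLS * bytes_per_cell


for nbytes in (2, 4):
    size = raw_size(nbytes)
    # Assumption: the header's "format:" value is bytes per cell minus one.
    print(f"format: {nbytes - 1}  {size} bytes = {size / 2**20:.2f} MiB")
```

At 4 bytes per cell the same tile is exactly twice as large, which matches the doubling described above.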

-----
William Kyngesburye <kyngchaos@charter.net>
http://webpages.charter.net/kyngchaos/

Theory of the Universe

There is a theory which states that if ever anyone discovers exactly what the universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarrely inexplicable. There is another theory which states that this has already happened.

-Hitchhiker's Guide to the Galaxy 2nd season intro

A little more info:

> I've been importing a bunch of SRTM data recently and noticed that there seems to be (at least) 2 different integer cell types - 16 bit and 32 bit. As imported, one tile takes up a maximum of 2.7MB in the CELL file, same as the source SRTM data file, and in the GRASS header file its type is 1.

I mean, the 'format:' line in the header file.

> After I run it thru fillnulls, I round that back to integers with r.mapcalc, and the CELL file sizes double, and the GRASS header has a type of 3.

At least many end up like this. I looked closer at a few, and it seems that all the tiles with some negative values switch to the larger integer size. If the whole tile is positive, it stays as 16 bit. Are GRASS 16 bit ints not signed, so GRASS has to use 32 bit ints? I can mapcalc them to be all positive and they do switch back to 16 bit (or 8 bit for those low, flat tiles).

Also, since a script did the importing and fillnulls for a whole batch, I only remember how the first couple I looked at imported: as 16 bit. I guess I got lucky with those, because tiles with negative cells really do import as 32 bit.

> So, is there a way to get these back to the 16 bit cell types? Since these are meter elevations on the earth, I'm not worried about them ever going beyond +/-32767. Normally I wouldn't worry about it, but I've converted many GB of these SRTM tiles now and saving a few GB would be nice.

-----
William Kyngesburye <kyngchaos@charter.net>
http://webpages.charter.net/kyngchaos/

"I ache, therefore I am. Or in my case - I am, therefore I ache."

- Marvin

William K wrote:

> A little more info:
>
> > I've been importing a bunch of SRTM data recently and noticed that
> > there seems to be (at least) 2 different integer cell types - 16 bit
> > and 32 bit. As imported, one tile takes up a maximum of 2.7MB in the
> > CELL file, same as the source SRTM data file, and in the GRASS header
> > file its type is 1.
>
> I mean, the 'format:' line in the header file.
>
> > After I run it thru fillnulls, I round that back to integers with
> > r.mapcalc, and the CELL file sizes double, and the GRASS header has a
> > type of 3.
>
> At least many end up like this. I looked closer at a few, and it seems
> that all the tiles with some negative values switch to the larger
> integer size. If the whole tile is positive, it stays as 16 bit. Are
> GRASS 16 bit ints not signed, so GRASS has to use 32 bit ints? I can
> mapcalc them to be all positive and they do switch back to 16 bit (or 8
> bit for those low, flat tiles).

Compressed integer rasters use 1, 2, 3 or 4 bytes, depending upon how
many are needed. However, as you note, negative values always require
4 bytes.

Essentially, the stream of ints is first converted to the external
representation, which is 4 bytes, big-endian, with the topmost bit of
the first byte being a sign bit.
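A minimal Python sketch of that external representation. The explanation only says the topmost bit of the first byte is a sign bit; I've read that as sign-magnitude (magnitude in the low 31 bits, sign bit set for negatives), which is an assumption. Either way, the first byte of a negative value is never zero, and that is the point that matters here. The function names are hypothetical, not part of GRASS:

```python
def to_external(value: int) -> bytes:
    """Encode one CELL value as 4 big-endian bytes with a sign bit.

    Assumption: sign-magnitude layout (magnitude in the low 31 bits,
    sign in the topmost bit of the first byte).
    """
    magnitude = abs(value)
    if magnitude > 0x7FFFFFFF:
        raise ValueError(f"{value} does not fit in 31 bits plus a sign bit")
    raw = bytearray(magnitude.to_bytes(4, "big"))
    if value < 0:
        raw[0] |= 0x80  # the sign bit lives in the first byte
    return bytes(raw)


def from_external(raw: bytes) -> int:
    """Inverse of to_external()."""
    negative = bool(raw[0] & 0x80)
    magnitude = int.from_bytes(bytes([raw[0] & 0x7F]) + raw[1:], "big")
    return -magnitude if negative else magnitude


for v in (0, 250, 1500, -1, -28):
    print(f"{v:>5} -> {to_external(v).hex()}")
```

Here 1500 encodes as 000005dc, with two leading zero bytes, while -1 encodes as 80000001, whose first byte is non-zero.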

If the raster is uncompressed, the stream of bytes is then written out
to the file.

OTOH, if the raster is compressed, the next step is to determine the
number of bytes required. If the first 1, 2 or 3 bytes of every cell
in a given row are always zero, those bytes will be discarded. As
negative values always have the topmost bit of the first byte set,
they will always require 4 bytes. Consequently, any rows which include
at least one negative value will be written out using 4 bytes per cell
(although the actual data will still be subject to RLE compression).
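The row-by-row trimming described above can be sketched in Python as follows (hypothetical names; the RLE step is not modelled). The map-level helper assumes the header records the widest row. That assumption would explain why a single row containing a negative value is enough to turn 'format: 1' into 'format: 3':

```python
from typing import Iterable, Sequence


def to_external(value: int) -> bytes:
    """4 big-endian bytes, sign in the top bit of the first byte (assumed sign-magnitude)."""
    magnitude = abs(value)
    if magnitude > 0x7FFFFFFF:
        raise ValueError(f"{value} does not fit in 31 bits plus a sign bit")
    raw = bytearray(magnitude.to_bytes(4, "big"))
    if value < 0:
        raw[0] |= 0x80
    return bytes(raw)


def row_nbytes(row: Sequence[int]) -> int:
    """Bytes per cell kept for one compressed row: 1, 2, 3 or 4."""
    encoded = [to_external(v) for v in row]
    nbytes = 4
    # Discard a leading byte only if it is zero in *every* cell of the row.
    while nbytes > 1 and all(e[4 - nbytes] == 0 for e in encoded):
        nbytes -= 1
    return nbytes


def map_nbytes(rows: Iterable[Sequence[int]]) -> int:
    """Hypothetical helper: assume the widest row decides what the header reports."""
    return max(row_nbytes(r) for r in rows)


rows = [
    [0, 12, 250],       # fits in 1 byte
    [0, 1500, 8848],    # needs 2 bytes
    [1500, -3, 200],    # one negative cell -> the whole row needs 4 bytes
]
for r in rows:
    print(r, row_nbytes(r))
print("map:", map_nbytes(rows))
```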

Unfortunately, this can't be changed without breaking existing raster
maps.

--
Glynn Clements <glynn.clements@virgin.net>

Disappointing. I understand not being able to change the existing format, but what about adding 1 and 2 byte signed cell formats? When dealing with elevation data this can save everyone a lot of storage space (and probably speed up disk access some).

On Oct 20, 2004, at 10:28 AM, Glynn Clements wrote:

> Unfortunately, this can't be changed without breaking existing raster
> maps.

-----
William Kyngesburye <kyngchaos@charter.net>
http://webpages.charter.net/kyngchaos/

First Pogril: Why is life like sticking your head in a bucket filled with hyena offal?
Second Pogril: I don't know. Why IS life like sticking your head in a bucket filled with hyena offal?
First Pogril: I don't know either. Wretched, isn't it?

-HitchHiker's Guide to the Galaxy

William K wrote:

> Disappointing. I understand not being able to change the existing
> format, but what about adding 1 and 2 byte signed cell formats? When
> dealing with elevation data this can save everyone a lot of storage
> space (and probably speed up disk access some).

That's one of the objectives for my planned next-generation raster I/O
system. But that won't be happening soon.

For the time being, in the most recent 5.3 CVS version, you may be
able to reduce disk space further by setting the environment variable
GRASS_INT_ZLIB (to anything). That will cause newly-created integer
raster maps to use zlib (gzip) compression rather than RLE
compression. However, such maps won't be readable with older versions
of GRASS.

Alternatively, you could shift the elevations so that the stored
values are never negative, and create a reclass map to correct the
elevations.
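A sketch of the shift-and-reclass idea in plain Python. The fixed OFFSET of 500 is a hypothetical choice: roughly the depth of the lowest land surface (about -430 m at the Dead Sea) plus some margin. In GRASS the shift itself would be an r.mapcalc expression and the correction an r.reclass map. Their exact syntax isn't shown here, and reclass_table() below only illustrates the mapping such a map would need to express:

```python
from typing import Dict, List, Sequence

# Hypothetical fixed offset; any value >= -min(elevation) would do.
OFFSET = 500


def to_external(value: int) -> bytes:
    """4 big-endian bytes, sign in the top bit of the first byte (assumed sign-magnitude)."""
    raw = bytearray(abs(value).to_bytes(4, "big"))
    if value < 0:
        raw[0] |= 0x80
    return bytes(raw)


def row_nbytes(row: Sequence[int]) -> int:
    """Bytes per cell a compressed row would need (leading all-zero bytes dropped)."""
    encoded = [to_external(v) for v in row]
    nbytes = 4
    while nbytes > 1 and all(e[4 - nbytes] == 0 for e in encoded):
        nbytes -= 1
    return nbytes


def shift(row: Sequence[int], offset: int = OFFSET) -> List[int]:
    """Stored values: elevation + offset, which must never be negative."""
    shifted = [v + offset for v in row]
    if min(shifted) < 0:
        raise ValueError("offset too small: some stored values are still negative")
    return shifted


def reclass_table(stored: Sequence[int], offset: int = OFFSET) -> Dict[int, int]:
    """Map each distinct stored value back to the true elevation."""
    return {s: s - offset for s in sorted(set(stored))}


row = [-12, -3, 0, 45, 1200]
stored = shift(row)
print(row_nbytes(row), "->", row_nbytes(stored))
print(reclass_table(stored))
```

With the offset applied, any stored value up to 65535 fits in 2 bytes, so even Everest (8848 m) plus the offset stays within the 16-bit case.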

--
Glynn Clements <glynn.clements@virgin.net>