[GRASS-dev] [GRASS GIS] #2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: critical | Milestone: 7.0.0
Component: Raster | Version: svn-releasebranch70
Keywords: | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
At present, integer maps (CELL) are still compressed with RLE.
This leads to a huge waste of disk space when it comes to large
data.

Proposal: make ZLIB, level 3 the standard compression.

At present we can enable the environment variable GRASS_INT_ZLIB,
but it will use the default ZLIB level 6 compression, which
is too CPU intensive. So (user) control over this is important.
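
To make the size/CPU trade-off concrete, here is a minimal sketch that deflates one synthetic integer row at levels 1, 3 and 6 with Python's standard zlib module. The 4-byte big-endian packing and the sample row are assumptions for illustration only; this is not how lib/gis/flate.c is invoked, and real maps will give different numbers.

```python
import struct
import zlib


def compress_row(values, level):
    """Pack integers as 4-byte big-endian (an assumption) and deflate them."""
    raw = struct.pack(">%di" % len(values), *values)
    return raw, zlib.compress(raw, level)


# Synthetic row with long runs of equal values, as in a classified integer map.
row = [1] * 500 + [2] * 300 + [7] * 200

sizes = {}
for level in (1, 3, 6):
    raw, packed = compress_row(row, level)
    assert zlib.decompress(packed) == raw  # the round trip is lossless
    sizes[level] = len(packed)

raw_size = len(row) * 4
print("raw bytes:", raw_size, "compressed bytes by level:", sizes)
```

Timing the loop on real data (for example with timeit) would show the CPU cost per level, which is the part this sketch does not measure.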

BTW: Manual of r.compress updated in r60814, needs to be backported.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

The default ZLIB compression level (usually 6) is hardcoded in:

lib/gis/flate.c, line 330

However, the compression actually used is still RLE.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:1>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [ticket:2349 neteler]:
> At present, integer maps (CELL) are still compressed with RLE.
> This leads to a huge waste of disk space when it comes to large
> data.
>
> Proposal: make ZLIB, level 3 the standard compression.

Is GRASS_INT_ZLIB support now old enough that it can be taken for granted?

> At present we can enable the environment variable GRASS_INT_ZLIB,
> but it will use the default ZLIB level 6 compression, which
> is too CPU intensive. So (user) control over this is important.

The current behaviour is that setting GRASS_INT_ZLIB to anything (even an
empty string) will enable zlib compression at the hard-coded level. One
option is to parse the value as an integer and use the result as the
compression level. However, it's possible that people are currently using
e.g. GRASS_INT_ZLIB=1 to enable it with the existing default level.

Another option is to add another environment variable for the level.

Aside: if there are still systems out there using the historical limit of
4096 bytes of memory for the combination of environment variables and
arguments (argv), we might want to think about making GRASS less greedy
with respect to environment variables.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:2>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

Replying to [comment:2 glynn]:
> Replying to [ticket:2349 neteler]:
> > At present, integer maps (CELL) are still compressed with RLE.
> > This leads to a huge waste of disk space when it comes to large
> > data.
> >
> > Proposal: make ZLIB, level 3 the standard compression.
>
> Is GRASS_INT_ZLIB support now old enough that it can be taken for
> granted?

I hope yes. I am not aware of negative reports.

> > At present we can enable the environment variable GRASS_INT_ZLIB,
> > but it will use the default ZLIB level 6 compression, which
> > is too CPU intensive. So (user) control over this is important.
>
> The current behaviour is that setting GRASS_INT_ZLIB to anything (even
> an empty string) will enable zlib compression at the hard-coded level.

Exactly.

> One option is to parse the value as an integer and use the result as the
> compression level. However, it's possible that people are currently using
> e.g. GRASS_INT_ZLIB=1 to enable it with the existing default level.
>
> Another option is to add another environment variable for the level.

Yes, a new GRASS_ZLIBLEVEL may be less invasive.

> Aside: if there are still systems out there using the historical limit
> of 4096 bytes of memory for the combination of environment variables and
> arguments (argv), we might want to think about making GRASS less greedy
> with respect to environment variables.

You mean the number and/or the length?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2349#comment:3>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:3 neteler]:

> > Aside: if there are still systems out there using the historical limit
> > of 4096 bytes of memory for the combination of environment variables and
> > arguments (argv), we might want to think about making GRASS less greedy
> > with respect to environment variables.
>
> You mean the number and/or the length?

Mainly the number.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2349#comment:4>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [ticket:2349 neteler]:

> Proposal: make ZLIB, level 3 the standard compression.

r61380 implements the following behaviour:

  * zlib compression is the default. Set GRASS_INT_ZLIB=0 to use RLE
compression.

  * The compression level can be set via GRASS_ZLIB_LEVEL, whose value
should be an integer between 0 and 9. If not set (or if the value cannot
be parsed as an integer), zlib's default compression level will be used
(lib/gis/flate.c:333, if a different default is preferred).
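
The behaviour described above can be sketched as a small lookup. This is a hedged illustration in Python, not the C code from r61380: the function name compression_settings is hypothetical, treating only the literal value "0" as "use RLE" is an assumption, and so is falling back to the default for values outside 0..9 (the comment only says the value should be in that range).

```python
import os


def compression_settings(env=None):
    """Return (method, level); level None means "use zlib's default"."""
    env = os.environ if env is None else env

    # Assumption: only GRASS_INT_ZLIB=0 switches back to RLE.
    value = env.get("GRASS_INT_ZLIB")
    method = "rle" if value is not None and value.strip() == "0" else "zlib"

    level = None
    raw = env.get("GRASS_ZLIB_LEVEL")
    if raw is not None:
        try:
            parsed = int(raw)
        except ValueError:
            parsed = None  # unparsable value: keep the default level
        # Assumption: out-of-range values also fall back to the default.
        if parsed is not None and 0 <= parsed <= 9:
            level = parsed
    return method, level


print(compression_settings({"GRASS_ZLIB_LEVEL": "3"}))
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real environment.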

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:5>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

Noting this here so it is not forgotten: raster/r.compress/r.compress.html
needs to be updated.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:6>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:6 neteler]:
> raster/r.compress/r.compress.html needs to be updated

Done in r61500.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:7>
GRASS GIS <http://grass.osgeo.org>

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: critical | Milestone: 7.0.0
Component: Raster | Version: svn-releasebranch70
Keywords: compression | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by neteler):

  * keywords: => compression

Comment:

In case of a backport to relbr7, these should be the needed changes:

r61380 + r61420 + r61422 + r61500

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:8>
GRASS GIS <http://grass.osgeo.org>

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: blocker | Milestone: 7.0.0
Component: Raster | Version: svn-releasebranch70
Keywords: compression | Platform: All
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by neteler):

  * priority: critical => blocker

Comment:

Any objections to backport?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:9>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

Replying to [comment:5 glynn]:
> * The compression level can be set via GRASS_ZLIB_LEVEL, whose value
> should be an integer between 0 and 9. If not set (or if the value cannot
> be parsed as an integer), zlib's default compression level will be used
> (lib/gis/flate.c:333, if a different default is preferred).

In relbranch7 there is currently:

Z_DEFAULT_COMPRESSION = 1 - "gives the best compromise between speed and
compression" as per r61424.

Perhaps zlib compression 1 should be adopted for trunk?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:10>
GRASS GIS <http://grass.osgeo.org>

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
-------------------------------------+--------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: blocker | Milestone: 7.0.0
Component: Raster | Version: svn-releasebranch70
Keywords: compression, r.compress | Platform: All
      Cpu: Unspecified |
-------------------------------------+--------------------------------------
Changes (by neteler):

  * keywords: compression => compression, r.compress

Comment:

Replying to [comment:8 neteler]:
> In case of a backport to relbr7, this should be the needed changes:
>
> r61380 + r61420 + r61422 + r61500

Backported to relbr7 in r61797.

It remains open which default ZLIB level should be used; see comment:10.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:11>
GRASS GIS <http://grass.osgeo.org>

Comment(by neteler):

It seems that the NULL file is not compressed (file "cell_misc/null").
According to

http://lists.osgeo.org/pipermail/grass-user/2010-January/054216.html

it is one bit (null/non-null) for each cell. It looks like this:

{{{
[neteler@giscluster modis_lst_reconstructed_europe_daily]$ hexdump cell_misc/lst_2002_196_average/null | head
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000ad0 ffff ffff ffff ffe0 ffff ffff ffff ffff
0000ae0 ffff ffff ffff ffff ffff ffff ffff ffff
*
00015a0 ffff ffff ffff ffff ffff ffff e0ff ffff
00015b0 ffff ffff ffff ffff ffff ffff ffff ffff
*
0002080 ffff ffff ffe0 ffff ffff ffff ffff ffff
0002090 ffff ffff ffff ffff ffff ffff ffff ffff
...
}}}

For our 12829 daily min/average/max MODIS LST maps covering Europe, the
null files consume a lot of space:

{{{
[neteler@giscluster cell_misc]$ du -hs .
621G .
}}}

Question: Would it be possible to also compress the null files, even with
just weak compression?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:12>
GRASS GIS <http://grass.osgeo.org>

Comment(by glynn):

Replying to [comment:12 neteler]:

> It seems that the NULL file is not compressed (file "cell_misc/null").
> According to
>
> http://lists.osgeo.org/pipermail/grass-user/2010-January/054216.html
>
> it is one bit (null/non-null) for each cell.

> Question: Would it be possible to also compress the null files, even with
> just weak compression?

Being uncompressed, the null files don't contain an index. The offset to
the beginning of a given row is obtained by multiplying the row number by
the number of bytes per row (which is just the number of columns divided
by 8, rounded upwards).
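
The offset arithmetic for the uncompressed null file can be written out as a short sketch. The function names are hypothetical; the formula is the one stated above (one bit per cell, rows padded to whole bytes).

```python
def null_bytes_per_row(cols):
    """Number of columns divided by 8, rounded upwards."""
    return (cols + 7) // 8


def null_row_offset(row, cols):
    """Byte offset of the start of a row in an uncompressed null file."""
    return row * null_bytes_per_row(cols)


# A 10-column map needs 2 bytes per row, so row 3 starts at byte 6.
print(null_bytes_per_row(10), null_row_offset(3, 10))
```

Because every row has the same length, no index is needed, which is exactly what per-row compression would break.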

The main issue is likely to be the need to support both formats. We need
to:

  * Be able to read and write the uncompressed format, for compatibility
with existing versions of GRASS.
  * Be able to distinguish between compressed and uncompressed formats on
read.
  * Provide some mechanism (i.e. yet another environment variable) to
indicate which format to use on write.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:13>
GRASS GIS <http://grass.osgeo.org>

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
-------------------------------------+--------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: critical | Milestone: 8.0.0
Component: Raster | Version: svn-trunk
Keywords: compression, r.compress | Platform: All
      Cpu: Unspecified |
-------------------------------------+--------------------------------------
Changes (by neteler):

  * priority: blocker => critical
  * version: svn-releasebranch70 => svn-trunk
  * milestone: 7.0.0 => 8.0.0

Comment:

Replying to [comment:13 glynn]:
...
> The main issue is likely to be the need to support both formats. We need
> to

... at this point raster format changes may go into GRASS GIS 8.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2349#comment:14>
GRASS GIS <http://grass.osgeo.org>

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
-------------------------------------------+--------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: critical | Milestone: 7.1.0
Component: Raster | Version: svn-trunk
Keywords: compression, r.compress, null | Platform: All
      Cpu: Unspecified |
-------------------------------------------+--------------------------------
Changes (by neteler):

  * keywords: compression, r.compress => compression, r.compress, null
  * milestone: 8.0.0 => 7.1.0

Comment:

Back to this topic: on our system I found more than 1.7 TB of NULL files in
a single location, all uncompressed.

What about having a "null2" file which is compressed and has an index? If
present, fine; otherwise use the well-known uncompressed null file format.

For backward compatibility, r.null could be extended to convert from
compressed null2 to uncompressed null (similar to v.build for the new
spatial index in G7).
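
The "use null2 if present, otherwise null" rule could look roughly like the following. This is only a sketch of the proposal: pick_null_file is a hypothetical helper, and the directory layout (one directory per map holding "null" and optionally "null2") is assumed from the cell_misc paths shown earlier in this ticket.

```python
import tempfile
from pathlib import Path


def pick_null_file(map_dir):
    """Return (path, is_compressed), preferring a compressed "null2" file."""
    map_dir = Path(map_dir)
    compressed = map_dir / "null2"
    if compressed.is_file():
        return compressed, True
    return map_dir / "null", False


# Demonstration in a temporary directory standing in for cell_misc/<map>/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "null").write_bytes(b"\xff")
    before = pick_null_file(tmp)[1]   # only the old file exists
    (Path(tmp) / "null2").write_bytes(b"")
    after = pick_null_file(tmp)[1]    # the compressed file now takes priority

print(before, after)
```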

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2349#comment:15>
GRASS GIS <http://grass.osgeo.org>

#2349: CELL raster format: make ZLIB level 3 standard compression instead of RLE
--------------------------+-------------------------------------------
  Reporter: neteler | Owner: grass-dev@…
      Type: enhancement | Status: new
  Priority: critical | Milestone: 7.1.0
Component: Raster | Version: svn-trunk
Resolution: | Keywords: compression, r.compress, null
       CPU: Unspecified | Platform: All
--------------------------+-------------------------------------------

Comment (by glynn):

Replying to [comment:15 neteler]:
> back to this topic: Here on our system I found > 1.7TB of NULL files in
> a single location, all uncompressed.

How large are the null files compared to the cell/fcell files?

> What about having a "null2" file which is compressed and with index. If
> present, fine, otherwise use the uncompressed well known null file format?

That's probably not a great deal of work, but as with any such change, we
need to consider the migration strategy. If we just start creating
compressed null files, mapsets will cease to be usable with older
versions.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2349#comment:16>
GRASS GIS <http://grass.osgeo.org>

Comment (by neteler):

Replying to [comment:16 glynn]:
> How large are the null files compared to the cell/fcell files?

  * With MODIS LST data, the null files are between 1.7 and 7.6 times
larger than the cell files (we store the LST maps in deg C * 100 as
integer to save disk space).
  * With 100k random points, the null file is 7.1 times larger than the
fcell map.
  * With the EU 25m DEM, the null file is way smaller than the derived
aspect map (17% of the DEM fcell file).

> > What about having a "null2" file which is compressed and with index.
> > If present, fine, otherwise use the uncompressed well known null file
> > format?
>
> That's probably not a great deal of work, but as with any such change,
> we need to consider the migration strategy. If we just start creating
> compressed null files, mapsets will cease to be usable with older
> versions.

Right, but this could be covered with an addon/new script in G6 and earlier
(as v.build does for vector data).

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2349#comment:17>
GRASS GIS <http://grass.osgeo.org>

Changes (by glynn):

* Attachment "compressed_nulls.diff" added.

implement compressed nulls

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2349>
GRASS GIS <http://grass.osgeo.org>

Comment (by glynn):

Replying to [comment:15 neteler]:

> What about having a "null2" file which is compressed and with index. If
> present, fine, otherwise use the uncompressed well known null file format?

Please test attachment:compressed_nulls.diff

Note that r.null and r.support will also require some changes, as they
assume that the file consists of nothing but the null data (i.e. no
index).
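
To illustrate why tools that assume "nothing but the null data" break once an index is present, here is a sketch of a per-row compressed file with a row index. The layout (a 4-byte row count, then nrows + 1 eight-byte offsets, then deflated rows) is purely hypothetical and is not claimed to match the format in compressed_nulls.diff.

```python
import struct
import zlib


def write_indexed(rows, level=1):
    """Build bytes: row count, (nrows + 1) offsets, then deflated rows."""
    blobs = [zlib.compress(r, level) for r in rows]
    header_size = 4 + 8 * (len(blobs) + 1)
    offsets = [header_size]
    for blob in blobs:
        offsets.append(offsets[-1] + len(blob))
    header = struct.pack(">I", len(blobs))
    header += struct.pack(">%dQ" % len(offsets), *offsets)
    return header + b"".join(blobs)


def read_row(data, row):
    """Look up one row through the index instead of row * bytes_per_row."""
    (nrows,) = struct.unpack_from(">I", data, 0)
    if not 0 <= row < nrows:
        raise IndexError("row out of range")
    start, end = struct.unpack_from(">2Q", data, 4 + 8 * row)
    return zlib.decompress(data[start:end])


# Three 16-byte null rows, loosely modelled on the hexdump earlier in the ticket.
rows = [b"\xff" * 16, b"\xff\xe0" + b"\xff" * 14, b"\x00" * 16]
data = write_indexed(rows)
restored = [read_row(data, i) for i in range(len(rows))]
print(restored == rows)
```

A reader that treats such a file as raw bitmap bytes would interpret the header and compressed payload as null flags, which is the failure mode the note about r.null and r.support points at.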

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2349#comment:18>
GRASS GIS <http://grass.osgeo.org>