[GRASS-dev] Raster data compression confusion: identical CELL file size

Hi,

since my local drive was filled up again :) I checked how raster data are currently compressed in GRASS GIS 7.2.svn.
According to

https://grass.osgeo.org/grass72/manuals/r.compress.html#used-compression-algorithms

all maps are DEFLATE compressed by default:

"Raster maps are by default ZLIB compressed.
…

Floating point (FCELL, DCELL) raster maps never use RLE compression; they are either compressed with ZLIB, LZ4, BZIP2 or are uncompressed.
"

Ehm, now how are FCELL, DCELL compressed by default? Not quite clear to me! This document needs improvements.

Reality check with Sentinel-2 data (3 different bands, same regional extent):

GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B02_10m
min=0
max=22937
GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B04_10m
min=0
max=18849
GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B8A_20m
min=0
max=17210

GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B02_10m -p
<s2_20151225_B02_10m> is compressed (method 2: ZLIB). Data type:
CELL

GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B04_10m -p
<s2_20151225_B04_10m> is compressed (method 2: ZLIB). Data type:
CELL

GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B8A_20m -p
<s2_20151225_B8A_20m> is compressed (method 2: ZLIB). Data type:
CELL

So far so good. Now the surprising part: while the channels are not identical (obviously, since they cover different spectral ranges), the map sizes are identical!

GRASS 7.2.svn (utm37n):~ > ls -la
…
-rw-r--r-- 1 neteler neteler 2539235691 Jun 14 17:14 s2_20151225_B02_10m
-rw-r--r-- 1 neteler neteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m
-rw-r--r-- 1 neteler neteler 2539235691 Jun 14 17:25 s2_20151225_B04_10m
-rw-r--r-- 1 neteler neteler 634878630 Jun 14 20:36 s2_20151225_B05_20m
-rw-r--r-- 1 neteler neteler 634878630 Jun 14 20:37 s2_20151225_B06_20m
-rw-r--r-- 1 neteler neteler 634878630 Jun 14 20:39 s2_20151225_B07_20m
-rw-r--r-- 1 neteler neteler 634878630 Jun 14 20:40 s2_20151225_B11_20m
-rw-r--r-- 1 neteler neteler 634878630 Jun 14 20:42 s2_20151225_B12_20m
-rw-r--r-- 1 neteler neteler 634878630 Jun 14 20:43 s2_20151225_B8A_20m

I would expect different sizes; compression can hardly lead to identical file sizes for different data.

Next test: gzip the file

GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > ls -la s2_20151225_B03_10m
-rw-r--r-- 1 mneteler mneteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m

GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > gzip s2_20151225_B03_10m

GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > ls -la s2_20151225_B03_10m.gz
-rw-r--r-- 1 mneteler mneteler 1456248453 Jun 14 17:19 s2_20151225_B03_10m.gz

R

1456248453/2539235691
[1] 0.5734987

Considerably smaller! So I am not at all convinced that these CELL files are currently ZLIB compressed.

From this ticket I would expect something else:
https://trac.osgeo.org/grass/ticket/2349

Ah, and no specific environment variables are set:

GRASS 7.2.svn (utm37n):~ > echo $GRASS_<tab>
$GRASS_ADDON_BASE $GRASS_GNUPLOT $GRASS_HTML_BROWSER $GRASS_PAGER
$GRASS_PROJSHARE $GRASS_PYTHON $GRASS_VERSION

A bug?

Markus

Hi,

Low-level inconsistency is a no-go.
This should be fixed as soon as possible in all 7.x versions IMHO.

Besides, reading the mentioned ticket 2349, it is remarkable that NULL maps are never compressed by default.

yann

--

Yann Chemin
Skype/FB: yann.chemin

On Tue, Sep 6, 2016 at 1:44 PM, Markus Neteler <neteler@osgeo.org> wrote:

> Ehm, now how *are* FCELL, DCELL compressed by default? Not quite clear to
> me! This document needs improvements.

The manual says, as you cited:
"Raster maps are by default ZLIB compressed."
What exactly is unclear about this? Should it say "All raster maps ..." ?

> I would expect different sizes, compression can hardly lead to identical
> file sizes.

The default ZLIB compression level was invalid, causing ZLIB to not
compress at all. Fixed in r69387,8.

Markus M
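
To illustrate why an invalid compression level can produce byte-identical file sizes, here is a minimal Python sketch. It is not the GRASS library code: `make_band`, `write_row`, `file_size` and the one-byte flag are hypothetical, and the fallback to storing a row as-is when zlib rejects the level is an assumption made for the illustration.

```python
import zlib


def make_band(n_rows, n_cols, step):
    """Synthetic 16-bit band (hypothetical test data); `step` only changes the values."""
    rows = []
    for r in range(n_rows):
        cells = (((c // 16) + r * step) % 20000 for c in range(n_cols))
        rows.append(b"".join(v.to_bytes(2, "little") for v in cells))
    return rows


def write_row(row, level):
    """Return flag byte + payload for one row.

    Assumption: if zlib rejects the level, the row is stored as-is
    instead of aborting the write.
    """
    try:
        packed = zlib.compress(row, level)
    except zlib.error:  # raised for a level outside -1, 0..9
        return b"0" + row
    # Keep the compressed form only when it is actually smaller.
    return b"1" + packed if len(packed) < len(row) else b"0" + row


def file_size(rows, level):
    """Total bytes written for all rows at the given zlib level."""
    return sum(len(write_row(row, level)) for row in rows)


band_a = make_band(100, 400, step=1)
band_b = make_band(100, 400, step=7)

for level in (99, 6):  # 99 is outside zlib's valid range, 6 is a valid level
    print(level, file_size(band_a, level), file_size(band_b, level))
```

With the invalid level every row is stored raw, so the total depends only on the number of rows and columns and not on the cell values.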

> Quite smaller! So I am not at all convinced that these CELL files are
> currently ZLIB compressed.

Compressing a whole file instead of compressing each row separately
(GRASS reads and writes raster data row by row) can lead to higher
compression ratios.
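
A small Python sketch of the row-by-row versus whole-file effect, using synthetic data rather than the GRASS raster format. `make_rows` and `compare` are hypothetical helpers, and the rows are deliberately made very similar to each other, so the gap is exaggerated compared to what real imagery would likely show.

```python
import random
import struct
import zlib


def make_rows(n_rows=200, n_cols=500, seed=0):
    """Synthetic 16-bit raster: each row repeats the previous one with a few cells changed."""
    rng = random.Random(seed)
    row = [rng.randrange(20000) for _ in range(n_cols)]
    rows = []
    for _ in range(n_rows):
        for _ in range(5):
            row[rng.randrange(n_cols)] = rng.randrange(20000)
        rows.append(struct.pack(f"<{n_cols}H", *row))
    return rows


def compare(rows):
    """Return (raw, per_row, whole) sizes in bytes."""
    raw = sum(len(r) for r in rows)
    # One independent zlib stream per row: no references to earlier rows.
    per_row = sum(len(zlib.compress(r)) for r in rows)
    # One stream over all rows: matches can reach back into previous rows.
    whole = len(zlib.compress(b"".join(rows)))
    return raw, per_row, whole


raw, per_row, whole = compare(make_rows())
print(f"raw={raw} per_row={per_row} whole={whole}")
```

Each per-row stream also carries its own zlib header and checksum, which adds a small fixed overhead per row on top of the lost cross-row redundancy.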


On 6 September 2016 at 15:23, Markus Metz <markus.metz.giswork@gmail.com>
wrote:

> Compressing a whole file instead of compressing each row separately
> (GRASS reads and writes raster data row by row) can lead to higher
> compression ratios.

This brings us back to the long-running discussion about parallelization
speed-ups of raster functions being limited by the row-based I/O of GRASS.
Maybe we should look into this for GRASS8...


--
Yann Chemin
Skype/FB: yann.chemin

On 06/09/16 16:39, Yann Chemin wrote:

On 6 September 2016 at 15:23, Markus Metz <markus.metz.giswork@gmail.com> wrote:

    Compressing a whole file instead of compressing each row separately
    (GRASS reads and writes raster data row by row) can lead to higher
    compression ratios.

> This brings us back to the long-running discussion about parallelization
> speed-ups of raster functions being limited by the row-based I/O of
> GRASS. Maybe we should look into this for GRASS8...

https://trac.osgeo.org/grass/wiki/Grass8Planning:

Raster library: Storage in tiles instead of by row

:)

Moritz

On Tue, Sep 6, 2016 at 3:23 PM, Markus Metz <markus.metz.giswork@gmail.com> wrote:


…

> The manual says, as you cited:
> "Raster maps are by default ZLIB compressed."
> What exactly is unclear about this? Should it say "All raster maps ..." ?

Well, it is perhaps ok but the software appeared to behave differently.

…

> The default ZLIB compression level was invalid, causing ZLIB to not
> compress at all. Fixed in r69387,8.

Ah! So, that changes it dramatically :) Thanks for the quick fix:

-rw-r--r-- 1 mneteler mneteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m <<-- before
-rw-r--r-- 1 mneteler mneteler 1463080868 Sep 6 16:46 s2_20151225_B03_10m_NEWCOPY <<-- now, generated with r.mapcalc = operator

Great.
And comparing to the previously full file-based gzip test:

-rw-r--r-- 1 mneteler mneteler 1456248453 Jun 14 17:19 s2_20151225_B03_10m.gz

1463080868 / 1456248453
[1] 1.004692

… which is now almost the same compression rate.

Some more values:

  • before today’s bugfix:
    du -hs PERMANENT/
    30G PERMANENT/

  • after the bugfix (copies created with r.mapcalc, original raster maps removed):
    du -hs PERMANENT/
    25G PERMANENT/

  • using export GRASS_COMPRESS_NULLS=1 and running r.null -z on all raster maps
    which generates cell_misc/nullcmpr and removes the old uncompressed cell_misc/null:
    du -hs PERMANENT/
    21G PERMANENT/

Now a notable amount of (SSD) disk space is saved - 21GB usage instead of 30GB!
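
For reference, the ratios behind these numbers can be recomputed with a few lines of Python. The file sizes and `du` values are the ones listed above; the dictionary keys and variable names are only illustrative, and the `du -h` figures are rounded, so the saved fraction is approximate.

```python
# File sizes in bytes for s2_20151225_B03_10m, taken from the listings above.
sizes = {
    "before_fix": 2539235691,       # written with the invalid ZLIB level
    "after_fix": 1463080868,        # copy created after the bugfix
    "gzip_whole_file": 1456248453,  # old file compressed as a whole with gzip
}

gzip_vs_original = sizes["gzip_whole_file"] / sizes["before_fix"]
fixed_vs_original = sizes["after_fix"] / sizes["before_fix"]
fixed_vs_gzip = sizes["after_fix"] / sizes["gzip_whole_file"]

# Rounded `du -hs PERMANENT/` values in GB.
du_gb = {"before_fix": 30, "after_fix": 25, "after_null_compression": 21}
saved_fraction = (du_gb["before_fix"] - du_gb["after_null_compression"]) / du_gb["before_fix"]

print(f"gzip / original:  {gzip_vs_original:.7f}")
print(f"fixed / original: {fixed_vs_original:.7f}")
print(f"fixed / gzip:     {fixed_vs_gzip:.6f}")
print(f"disk space saved: {saved_fraction:.0%}")
```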

Goal of trac #2349 achieved.

Thanks again,

markusN

PS: it would be great to learn through a user message whether r.null -z is actually compressing or decompressing…

--
Markus Neteler
http://www.mundialis.de - free data with free software
http://grass.osgeo.org
http://courses.neteler.org/blog