[GRASS-dev] [GRASS GIS] #2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
---------------------------+-------------------------
Reporter: sprice | Owner: grass-dev@…
     Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Raster | Version: svn-trunk
Keywords: ZLIB LZ4 ZSTD | CPU: OSX/Intel
Platform: MacOSX |
---------------------------+-------------------------
I've added support for reading and writing raster rows in the LZ4, LZ4HC,
and ZSTD compression formats (in addition to the existing RLE and ZLIB).
The new algorithms are very fast (LZ4 compresses about an order of
magnitude faster than ZLIB) and are a better fit for modern, fast hard
drives and SSDs.

I've attached a .tgz file with the necessary added & changed files. The
new algorithms can be enabled/disabled with environment vars the same way
as ZLIB, as described in the r.compress documentation:
https://grass.osgeo.org/grass70/manuals/r.compress.html

Algorithm summary:

LZ4 produces a slightly worse compression ratio than ZLIB, but it
compresses about an order of magnitude faster than ZLIB. It decompresses
even faster.

LZ4HC is supposed to produce a compression ratio similar to ZLIB at about
the same compression speed as ZLIB, while decompressing as fast as regular
LZ4 (>2 GB/s). Unfortunately, the improved compression ratio doesn't show
in my tests, probably because we compress each row individually. This may
change with floating-point data, if someone wants to test it.
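
One way to see why row-at-a-time compression can hide a ratio advantage: a compressor that starts fresh on every row cannot reuse matches found in earlier rows. The sketch below is not GRASS code. It uses Python's standard `zlib` module as a stand-in (LZ4HC is not in the standard library) and deliberately extreme synthetic data in which every row repeats the same 256 random bytes; `ROW_BYTES` and `N_ROWS` are made-up values.

```python
import random
import zlib

ROW_BYTES = 256   # hypothetical row size, chosen for illustration
N_ROWS = 100      # hypothetical row count

# One incompressible row, repeated: all redundancy is *between* rows.
rng = random.Random(0)
row = bytes(rng.randrange(256) for _ in range(ROW_BYTES))
rows = [row] * N_ROWS

# Row-at-a-time: each call starts with an empty history window.
per_row_total = sum(len(zlib.compress(r)) for r in rows)

# Whole buffer: later rows can reference earlier ones.
whole_total = len(zlib.compress(b"".join(rows)))

print("raw bytes:         ", ROW_BYTES * N_ROWS)
print("compressed per row:", per_row_total)
print("compressed whole:  ", whole_total)
```

Real imagery is far less regular than this, so the gap will be smaller in practice; the point is only that per-row framing caps how much cross-row redundancy any codec can exploit.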

ZSTD is a new algorithm by the author of LZ4 that is intended to replace
ZLIB. It compresses and decompresses extremely quickly, while maintaining
a similar compression ratio as ZLIB. Unfortunately, it's still in beta.

They are all under the BSD license. Links below show more info &
performance numbers.

https://github.com/Cyan4973/zstd

https://github.com/Cyan4973/lz4

http://www.lz4.org

I recommend incorporating the attached changes into GRASS, and leaving it
as optional for users. At some point in the future, after testing, GRASS
should move to using ZSTD as default. Power users who want the best
performance '''now''' (and have the disk space) can use LZ4 immediately.

It would be trivial to further alter get_row.c & put_row.c to use LZ4 for
floating point compression. (And I would recommend someone do that.)

Note: I've decided to use LZ4_decompress_fast() instead of
LZ4_decompress_safe(). In my test, it was noticeably faster. According to
the documentation, it leaves LZ4 open to a malicious attack. If this is a
serious concern in the GRASS GIS internal data structures, change the
commenting in get_row.c to use the safer code.

On my computer, using LZ4 more than halves (!) the runtime of an r.mapcalc
identity operation. Below are my tests while working with a RapidEye
scene.

{{{
> time r.mapcalc expression="out_test_zlib=out_test_lz4hc" --overwrite
  100%

real 3m33.503s
user 3m25.451s
sys 0m6.750s
> time r.mapcalc expression="out_test_zlib=out_test_lz4hc" --overwrite
  100%

real 3m34.398s
user 3m26.684s
sys 0m6.138s
> export GRASS_INT_LZ4=1
> time r.mapcalc expression="out_test_lz4=out_test_lz4hc" --overwrite
  100%

real 1m31.222s
user 1m25.379s
sys 0m5.035s
> time r.mapcalc expression="out_test_lz4=out_test_lz4hc" --overwrite
  100%

real 1m29.792s
user 1m24.029s
sys 0m4.858s
> unset GRASS_INT_LZ4
> export GRASS_INT_LZ4HC=1
> time r.mapcalc expression="out_test_lz4hc2=out_test_lz4hc" --overwrite
  100%

real 3m5.332s
user 2m58.610s
sys 0m5.603s
> time r.mapcalc expression="out_test_lz4hc2=out_test_lz4hc" --overwrite
  100%

real 3m3.710s
user 2m56.606s
sys 0m5.858s
> unset GRASS_INT_LZ4HC
> export GRASS_INT_ZSTD=1
> time r.mapcalc expression="out_test_zstd=out_test_lz4hc" --overwrite
  100%

real 1m38.322s
user 1m32.654s
sys 0m4.897s
> time r.mapcalc expression="out_test_zstd=out_test_lz4hc" --overwrite
  100%

real 1m42.370s
user 1m35.487s
sys 0m5.282s
> unset GRASS_INT_ZSTD
> ls -l vrt_test/PERMANENT/cell/out_test_*
-rw-r--r-- 1 sprice staff 4080217012 Sep 26 14:01 vrt_test/PERMANENT/cell/out_test_lz4
-rw-r--r-- 1 sprice staff 4069728048 Sep 26 13:34 vrt_test/PERMANENT/cell/out_test_lz4hc
-rw-r--r-- 1 sprice staff 4069728048 Sep 26 14:08 vrt_test/PERMANENT/cell/out_test_lz4hc2
-rw-r--r-- 1 sprice staff 3737100577 Sep 26 13:57 vrt_test/PERMANENT/cell/out_test_zlib
-rw-r--r-- 1 sprice staff 3811356101 Sep 26 14:12 vrt_test/PERMANENT/cell/out_test_zstd
> time r.univar out_test_zlib
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m29.980s
user 1m27.111s
sys 0m2.589s
> time r.univar out_test_lz4
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m9.883s
user 1m7.559s
sys 0m2.210s
> time r.univar out_test_lz4hc
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m10.199s
user 1m7.902s
sys 0m2.173s
> time r.univar out_test_zstd
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m25.206s
user 1m21.351s
sys 0m2.518s
}}}
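
To put numbers on "better than halved", the wall-clock times and file sizes in the listing above can be reduced to ratios against ZLIB. The snippet below is only arithmetic on the figures already posted (the "real" times of both runs per case, and the `ls -l` sizes); it measures nothing itself, and the choice of the mean of the two runs is an assumption of this sketch.

```python
from statistics import mean

# Wall-clock ("real") seconds copied from the two r.mapcalc runs of each case.
write_s = {
    "ZLIB":  [213.503, 214.398],
    "LZ4":   [91.222, 89.792],
    "LZ4HC": [185.332, 183.710],
    "ZSTD":  [98.322, 102.370],
}
# File sizes in bytes from the `ls -l` listing (out_test_lz4hc2 for LZ4HC).
size_b = {
    "ZLIB": 3737100577,
    "LZ4": 4080217012,
    "LZ4HC": 4069728048,
    "ZSTD": 3811356101,
}

base_time = mean(write_s["ZLIB"])
base_size = size_b["ZLIB"]

speedup = {k: base_time / mean(v) for k, v in write_s.items()}
rel_size = {k: v / base_size for k, v in size_b.items()}

for name in write_s:
    print(f"{name:6s} write speedup {speedup[name]:.2f}x, "
          f"size {rel_size[name]:.3f}x of ZLIB")
```

On these figures LZ4 and ZSTD both come out above 2x while LZ4HC stays below 1.2x. Note that each r.mapcalc run also reads `out_test_lz4hc`, so the ratios describe the whole read-plus-write pipeline rather than compression alone.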

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2750>
GRASS GIS <https://grass.osgeo.org>

Changes (by sprice):

* Attachment "lz4_zstd.tgz" added.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750>
GRASS GIS <https://grass.osgeo.org>

Comment (by sprice):

I've gone ahead and added code to also compress FCELL/DCELL. Same tests as
before, run on a Late 2013 Mac Pro (OS X v10.10.5, 64 GB RAM, 3.5 GHz
6-core). The `r.mapcalc` commands are intended to test real-world write
performance, the `ls -l` command shows file sizes, and `r.univar` is
intended to show read performance.

tl;dr: Write speed with LZ4 & ZSTD continues to be more than double that of
the current method (ZLIB). Read speed is also significantly faster. LZ4
shows the best performance if you have the disk space. ZSTD is very good
all around, and doesn't fill the hard drive as fast.

{{{
> time r.mapcalc expression="out_fp_orig=float(out_test_lz4hc)" --overwrite
  100%

real 4m57.518s
user 4m48.823s
sys 0m7.118s
> time r.mapcalc expression="out_fp_orig=float(out_test_lz4hc)" --overwrite
  100%

real 4m57.555s
user 4m49.946s
sys 0m6.382s
> export GRASS_INT_LZ4=1
> time r.mapcalc expression="out_fp_lz4=float(out_test_lz4hc)" --overwrite
  100%

real 1m51.136s
user 1m44.766s
sys 0m5.879s
> time r.mapcalc expression="out_fp_lz4=float(out_test_lz4hc)" --overwrite
  100%

real 1m51.898s
user 1m44.991s
sys 0m5.928s
> unset GRASS_INT_LZ4
> export GRASS_INT_LZ4HC=1
> time r.mapcalc expression="out_fp_lz4hc=float(out_test_lz4hc)" --overwrite
  100%

real 4m22.887s
user 4m15.556s
sys 0m6.206s
> time r.mapcalc expression="out_fp_lz4hc=float(out_test_lz4hc)" --overwrite
  100%

real 4m22.143s
user 4m15.164s
sys 0m6.075s
> unset GRASS_INT_LZ4HC
> export GRASS_INT_ZSTD=1
> time r.mapcalc expression="out_fp_zstd=float(out_test_lz4hc)" --overwrite
  100%

real 1m49.065s
user 1m43.401s
sys 0m5.183s
> time r.mapcalc expression="out_fp_zstd=float(out_test_lz4hc)" --overwrite
  100%

real 1m48.771s
user 1m43.238s
sys 0m5.093s
> ls -l vrt_test/PERMANENT/fcell/
total 46206984
-rw-r--r-- 1 sprice staff 7268345798 Sep 27 19:57 out_fp_lz4
-rw-r--r-- 1 sprice staff 5979129773 Sep 27 20:06 out_fp_lz4hc
-rw-r--r-- 1 sprice staff 4969771486 Sep 27 19:53 out_fp_orig
-rw-r--r-- 1 sprice staff 5440720320 Sep 27 20:10 out_fp_zstd
> unset GRASS_INT_ZSTD
> r.compress -p out_test_zlib
<out_test_zlib> is compressed (level 2: DEFLATE). Data type: <CELL>
> r.compress -p out_test_lz4
<out_test_lz4> is compressed (level 3: LZ4). Data type: <CELL>
> r.compress -p out_test_lz4hc
<out_test_lz4hc> is compressed (level 4: LZ4HC). Data type: <CELL>
> r.compress -p out_test_zstd
<out_test_zstd> is compressed (level 5: ZSTD). Data type: <CELL>
> r.compress -p out_fp_zstd
<out_fp_zstd> is compressed (level 5: ZSTD). Data type: <FCELL>
> r.compress -p out_fp_orig
<out_fp_orig> is compressed (level 2: DEFLATE). Data type: <FCELL>
> r.compress -p out_fp_lz4
<out_fp_lz4> is compressed (level 3: LZ4). Data type: <FCELL>
> r.compress -p out_fp_lz4hc
<out_fp_lz4hc> is compressed (level 4: LZ4HC). Data type: <FCELL>
> time r.univar out_fp_orig
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m49.227s
user 1m46.152s
sys 0m2.843s
> time r.univar out_fp_lz4
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m8.749s
user 1m4.596s
sys 0m3.564s
> time r.univar out_fp_lz4hc
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m10.307s
user 1m6.546s
sys 0m3.352s
> time r.univar out_fp_zstd
  100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real 1m31.310s
user 1m27.916s
sys 0m3.044s
}}}

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:1>
GRASS GIS <https://grass.osgeo.org>

Changes (by sprice):

* Attachment "lz4_zstd2.tgz" added.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750>
GRASS GIS <https://grass.osgeo.org>

Comment (by wenzeslaus):

I have
[https://grass.osgeo.org/grass71/manuals/libpython/gunittest_running_tests.html#running-tests-report executed all tests]
with `lz4_zstd2.tgz` and, unfortunately, I got 22 (17%) more failing tests.
One example is the `r.viewshed` test, where the results are just
completely wrong:

{{{
...
mismatch values (key, reference, actual):
[('max', 43.15356, -3.44000005722046),
  ('min', -24.98421, -3.44000005722046),
  ('null_cells', 1963758, 0)]
...
}}}

I haven't investigated further. Can anybody confirm or disprove this?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:2>
GRASS GIS <https://grass.osgeo.org>

Comment (by neteler):

Replying to [comment:1 sprice]:
> I've gone ahead and added code to also compress FCELL/DCELL.

Great!

A question: is the NULL file also compressed (see the related ticket
#2349) with the new compression algorithm? This would be important - I
could not yet check your patches myself.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:3>
GRASS GIS <https://grass.osgeo.org>

Comment (by sprice):

The null file is compressed when GRASS_COMPRESS_NULLS is set.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:4>
GRASS GIS <https://grass.osgeo.org>

Changes (by sprice):

* Attachment "lz4_zstd3.tgz" added.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750>
GRASS GIS <https://grass.osgeo.org>

Comment (by sprice):

I just uploaded a new version of the code that shouldn't fail any
additional unit tests, and I've added unit tests to r.compress that test
reading/writing the different compression schemes.
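
For readers who have not opened the attachment, a round-trip check of this kind generally has the shape below. This is only a sketch and not the test shipped in the tarball: it uses Python's `zlib` as a stand-in codec, and `encode_row`, `decode_row` and `round_trips` are hypothetical helpers rather than GRASS functions.

```python
import struct
import zlib


def encode_row(cells):
    """Pack 32-bit signed cells (big-endian) and compress them."""
    raw = struct.pack(f">{len(cells)}i", *cells)
    return zlib.compress(raw)


def decode_row(blob, ncols):
    """Inverse of encode_row: decompress and unpack ncols cells."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f">{ncols}i", raw))


def round_trips(cells):
    """True if the row is unchanged after compress + decompress."""
    return decode_row(encode_row(cells), len(cells)) == list(cells)


# Sample rows: the min/max reported by r.univar earlier in this ticket,
# the 32-bit extremes, and a long constant row.
for row in ([807, 32767, 0, -1, 2**31 - 1, -2**31], [0] * 1000):
    assert round_trips(row), "row changed after compress/decompress"
print("all rows round-tripped")
```

A check like this catches silent value corruption of the sort reported for `r.viewshed`, which a test that only looks at return codes would miss.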

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2750#comment:5>
GRASS GIS <https://grass.osgeo.org>

Comment (by wenzeslaus):

Replying to [comment:5 sprice]:
> I just uploaded a new version of the code that shouldn't fail any
> additional unit tests, and I've added unit tests to r.compress that test
> reading/writing the different compression schemes.

Nice. Runs for me as well. The test is well written, but to be sure, please
improve how the environment variables are set. When you do
`os.environ[env_var] = '1'`, `env_var` stays in the environment, so it
might be picked up the next time (to be honest, right now I don't
understand why it is not picked up). You can run a module in an isolated
environment with something like this:

{{{
import os

env = os.environ.copy()  # private copy; os.environ itself stays untouched
env[env_var] = '1'       # e.g. env_var = 'GRASS_INT_LZ4'
...Module(..., env_=env)
}}}

I haven't tried that, but in theory it should work with all `*Module`
functions as well as classes (everything is based on
[https://grass.osgeo.org/grass70/manuals/libpython/pygrass.modules.interface.html#module-pygrass.modules.interface.module PyGRASS Module]).
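
The difference between mutating `os.environ` and passing a copied environment can be checked without GRASS. The sketch below uses plain `subprocess` instead of the PyGRASS `Module` class, so it does not exercise `env_` itself; `run_isolated` is a hypothetical helper, and the variable name is only an example taken from this ticket.

```python
import os
import subprocess
import sys


def run_isolated(args, overrides):
    """Run args with extra environment variables visible only to the child."""
    env = os.environ.copy()      # private copy; os.environ is not modified
    env.update(overrides)
    done = subprocess.run(args, env=env, capture_output=True,
                          text=True, check=True)
    return done.stdout


# The child process reports what it sees for the variable.
probe = [sys.executable, "-c",
         "import os; print(os.environ.get('GRASS_INT_LZ4'))"]

print("child sees: ", run_isolated(probe, {"GRASS_INT_LZ4": "1"}).strip())
print("parent sees:", os.environ.get("GRASS_INT_LZ4"))
```

Because the override lives only in the copy, a later test in the same process starts from whatever the parent environment already contained, which is the isolation asked for above.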

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:6>
GRASS GIS <https://grass.osgeo.org>

Comment (by wenzeslaus):

Replying to [comment:6 wenzeslaus]:
> Replying to [comment:5 sprice]:
> > I just uploaded a new version of the code that shouldn't fail any
> > additional unit tests, and I've added unit tests to r.compress that test
> > reading/writing the different compression schemes.
>
> Nice. Runs for me as well.

Bad news. Only the r.compress test runs well for me; I had messed up the
environment variables.

Now (with `export GRASS_INT_LZ4=1`) I'm getting fails from various tests.

Wrong values in the result (for example):

{{{
./raster/r.series.interp/testsuite/interp_test.py

FAIL: test_infile (__main__.InterpolationTest)
----------------------------------------------------------------------
AssertionError: The actual maximum (269) is greater than the reference one (200) for raster map prec_2 (with minimum 200)
}}}

Possible segmentation faults:

{{{
./raster/r.gwflow/testsuite/validation_7x7_grid.py

FAIL: test_transient (__main__.Validation7x7Grid)
---------------------------------------------------------------------
AssertionError: Running <r.gwflow> module ended with non-zero return code (-11)
}}}

{{{
./temporal/t.rast.univar/testsuite/test.t.rast.univar.sh
...
grass.exceptions.CalledModuleError: Module run ['r.univar', '-ge',
u'map=prec_1@__temporal_t_rast_univar_test.t. rast.univar'] ended with
error
Process ended with non-zero return code -11.
}}}

With the changes applied but with default settings (`unset
GRASS_INT_LZ4`), I'm getting the same results as at the
[http://fatra.cnr.ncsu.edu/grassgistests/reports_for_date-2015-10-08-07-00/report_for_nc_basic_spm_grass7_nc/testfiles.html
test server]. r.compress test runs well in both cases.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:7>
GRASS GIS <https://grass.osgeo.org>

Changes (by sprice):

* Attachment "lz4_zstd4.tgz" added.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750>
GRASS GIS <https://grass.osgeo.org>

Comment (by sprice):

I think I've fixed all issues. And a bad memory read that was in there
before that valgrind discovered. And a segfault in n_les_assemble.c (via
r.gwflow) that someone should check out. I've uploaded a new set of files.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:8>
GRASS GIS <https://grass.osgeo.org>

Comment (by wenzeslaus):

Replying to [comment:8 sprice]:
> I think I've fixed all issues.

It works for me as well. With `GRASS_INT_LZ4=1` I get

{{{
Executed 129 test files in 0:25:10.484136.
From them 120 files (93%) were successful and 9 files (7%) failed.
}}}

Without `GRASS_INT_LZ4` the result is the same, including the time (I don't
expect the improvement to be visible in the speed of the tests, at least
not much). (7% is a little bit more than
[http://fatra.cnr.ncsu.edu/grassgistests/reports_for_date-2015-10-09-07-00/report_for_nc_basic_spm_grass7_nc/testfiles.html
on the server], but the one test that differs is unrelated to this.)
I'm now running the tests with the other compression methods as well.

> And a bad memory read that was in there before that valgrind discovered.
> And a segfault in n_les_assemble.c (via r.gwflow) that someone should
> check out.

I don't understand you completely. `r.gwflow` doesn't fail with the
current trunk and it doesn't fail even with your changes now for me. Does
it fail for you?

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:9>
GRASS GIS <https://grass.osgeo.org>

Changes (by wenzeslaus):

* Attachment "raster_compress_benchmark.sh" added.

Benchmark DCELL

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750>
GRASS GIS <https://grass.osgeo.org>

Comment (by wenzeslaus):

The tests are running well for me also with `GRASS_INT_LZ4HC=1`,
`GRASS_INT_ZSTD=1` and `GRASS_INT_ZLIB=0` (RLE). I haven't tried
`GRASS_COMPRESS_NULLS=1`.

Here is a (polished) report created by the attached benchmark script, which
is based on what you posted and uses completely random data. I used
`GRASS_COMPRESS_NULLS=1` and a region with 30,000,000 cells. The disk was
an SSD and the OS was Linux.

{{{
#!rst
ZLIB compression writing

Performance counter stats for 'r.mapcalc expression=test_rast_orig=double(test_rast_z_base)' (10 runs):

::

       10415.903368 task-clock (msec) # 0.993 CPUs utilized ( +- 0.36% )
               2798 context-switches # 0.269 K/sec ( +- 2.11% )
                 23 cpu-migrations # 0.002 K/sec ( +- 4.20% )
             325702 page-faults # 0.031 M/sec ( +- 0.00% )
        30804357778 cycles # 2.957 GHz ( +- 0.42% )
         9055572140 stalled-cycles-frontend # 29.40% frontend cycles idle ( +- 1.81% )
        47982328290 instructions # 1.56 insns per cycle
                                              # 0.19 stalled cycles per insn ( +- 0.02% )
         7087070642 branches # 680.409 M/sec ( +- 0.02% )
          341325584 branch-misses # 4.82% of all branches ( +- 0.05% )

       10.489354952 seconds time elapsed ( +- 0.41% )

RLE compression writing

Performance counter stats for 'r.mapcalc expression=test_rast_rle=double(test_rast_z_base)' (10 runs):

::

       10367.674362 task-clock (msec) # 0.999 CPUs utilized ( +- 0.53% )
               1642 context-switches # 0.158 K/sec ( +- 18.72% )
                 22 cpu-migrations # 0.002 K/sec ( +- 5.32% )
             325702 page-faults # 0.031 M/sec ( +- 0.00% )
        30666690391 cycles # 2.958 GHz ( +- 0.38% )
         8921313281 stalled-cycles-frontend # 29.09% frontend cycles idle ( +- 1.80% )
        47975696799 instructions # 1.56 insns per cycle
                                              # 0.19 stalled cycles per insn ( +- 0.02% )
         7085878436 branches # 683.459 M/sec ( +- 0.02% )
          340649966 branch-misses # 4.81% of all branches ( +- 0.04% )

       10.382500561 seconds time elapsed ( +- 0.53% )

LZ4 compression writing

Performance counter stats for 'r.mapcalc expression=test_rast_lz4=double(test_rast_z_base)' (10 runs):

::

        2490.815692 task-clock (msec) # 0.999 CPUs utilized ( +- 0.23% )
                321 context-switches # 0.129 K/sec ( +- 13.63% )
                 20 cpu-migrations # 0.008 K/sec ( +- 5.02% )
                684 page-faults # 0.274 K/sec ( +- 0.12% )
         7259170408 cycles # 2.914 GHz ( +- 0.12% )
         2305705372 stalled-cycles-frontend # 31.76% frontend cycles idle ( +- 0.20% )
        13796117271 instructions # 1.90 insns per cycle
                                              # 0.17 stalled cycles per insn ( +- 0.06% )
         2790495244 branches # 1120.314 M/sec ( +- 0.05% )
           33371582 branch-misses # 1.20% of all branches ( +- 0.41% )

        2.492994675 seconds time elapsed ( +- 0.23% )

LZ4HC compression writing

Performance counter stats for 'r.mapcalc expression=test_rast_lz4hc=double(test_rast_z_base)' (10 runs):

::

        6867.635439 task-clock (msec) # 0.999 CPUs utilized ( +- 0.25% )
                648 context-switches # 0.094 K/sec ( +- 0.29% )
                 21 cpu-migrations # 0.003 K/sec ( +- 5.28% )
                745 page-faults # 0.108 K/sec ( +- 0.18% )
        20199681252 cycles # 2.941 GHz ( +- 0.28% )
         6449729534 stalled-cycles-frontend # 31.93% frontend cycles idle ( +- 0.62% )
        31860120047 instructions # 1.58 insns per cycle
                                              # 0.20 stalled cycles per insn ( +- 0.03% )
         5196919230 branches # 756.726 M/sec ( +- 0.03% )
          184132785 branch-misses # 3.54% of all branches ( +- 0.04% )

        6.873512386 seconds time elapsed ( +- 0.25% )

ZSTD compression writing

Performance counter stats for 'r.mapcalc expression=test_rast_zstd=double(test_rast_z_base)' (10 runs):

::

        3540.287381 task-clock (msec) # 0.999 CPUs utilized ( +- 0.20% )
                382 context-switches # 0.108 K/sec ( +- 3.67% )
                 24 cpu-migrations # 0.007 K/sec ( +- 5.61% )
                776 page-faults # 0.219 K/sec ( +- 0.13% )
        10367186950 cycles # 2.928 GHz ( +- 0.05% )
         3160263203 stalled-cycles-frontend # 30.48% frontend cycles idle ( +- 0.10% )
        19098247069 instructions # 1.84 insns per cycle
                                              # 0.17 stalled cycles per insn ( +- 0.04% )
         3831842251 branches # 1082.353 M/sec ( +- 0.04% )
           35124859 branch-misses # 0.92% of all branches ( +- 0.16% )

        3.543262199 seconds time elapsed ( +- 0.20% )

Original raster map test

Performance counter stats for 'r.univar test_rast_z_base' (10 runs):

::

        2024.195978 task-clock (msec) # 0.998 CPUs utilized ( +- 0.29% )
                646 context-switches # 0.319 K/sec ( +- 0.64% )
                  0 cpu-migrations # 0.000 K/sec
                457 page-faults # 0.226 K/sec ( +- 0.04% )
         5934175598 cycles # 2.932 GHz ( +- 0.05% )
         1712911175 stalled-cycles-frontend # 28.87% frontend cycles idle ( +- 0.14% )
        11404604123 instructions # 1.92 insns per cycle
                                              # 0.15 stalled cycles per insn ( +- 0.00% )
         2280049632 branches # 1126.398 M/sec ( +- 0.00% )
           32906874 branch-misses # 1.44% of all branches ( +- 0.37% )

        2.029035083 seconds time elapsed ( +- 0.28% )

ZLIB compression reading

Performance counter stats for 'r.univar test_rast_orig' (10 runs):

::

        2000.246389 task-clock (msec) # 0.998 CPUs utilized ( +- 0.42% )
                640 context-switches # 0.320 K/sec ( +- 1.09% )
                  0 cpu-migrations # 0.000 K/sec
                458 page-faults # 0.229 K/sec ( +- 0.02% )
         5930779846 cycles # 2.965 GHz ( +- 0.08% )
         1716273412 stalled-cycles-frontend # 28.94% frontend cycles idle ( +- 0.18% )
        11406021691 instructions # 1.92 insns per cycle
                                              # 0.15 stalled cycles per insn ( +- 0.01% )
         2280208665 branches # 1139.964 M/sec ( +- 0.01% )
           32553520 branch-misses # 1.43% of all branches ( +- 0.24% )

        2.005018871 seconds time elapsed ( +- 0.42% )

RLE compression reading

Performance counter stats for 'r.univar test_rast_rle' (10 runs):

::

        2016.279711 task-clock (msec) # 0.998 CPUs utilized ( +- 0.34% )
                653 context-switches # 0.324 K/sec ( +- 1.34% )
                  0 cpu-migrations # 0.000 K/sec ( +- 50.92% )
                458 page-faults # 0.227 K/sec ( +- 0.04% )
         5931202618 cycles # 2.942 GHz ( +- 0.07% )
         1711592367 stalled-cycles-frontend # 28.86% frontend cycles idle ( +- 0.13% )
        11406103365 instructions # 1.92 insns per cycle
                                              # 0.15 stalled cycles per insn ( +- 0.01% )
         2280223560 branches # 1130.906 M/sec ( +- 0.01% )
           32763877 branch-misses # 1.44% of all branches ( +- 0.41% )

        2.021075900 seconds time elapsed ( +- 0.34% )

LZ4 compression reading

Performance counter stats for 'r.univar test_rast_lz4' (10 runs):

::

         690.267191 task-clock (msec) # 0.998 CPUs utilized ( +- 0.37% )
                235 context-switches # 0.341 K/sec ( +- 1.55% )
                  0 cpu-migrations # 0.000 K/sec
                449 page-faults # 0.650 K/sec ( +- 0.04% )
         2003905382 cycles # 2.903 GHz ( +- 0.11% )
          586090598 stalled-cycles-frontend # 29.25% frontend cycles idle ( +- 0.29% )
         3982189156 instructions # 1.99 insns per cycle
                                              # 0.15 stalled cycles per insn ( +- 0.04% )
          971667430 branches # 1407.669 M/sec ( +- 0.02% )
              64052 branch-misses # 0.01% of all branches ( +- 2.47% )

        0.691904075 seconds time elapsed ( +- 0.37% )

LZ4HC compression reading

Performance counter stats for 'r.univar test_rast_lz4hc' (10 runs):

::

         692.453563 task-clock (msec) # 0.998 CPUs utilized ( +- 0.18% )
                243 context-switches # 0.351 K/sec ( +- 0.96% )
                  0 cpu-migrations # 0.000 K/sec
                449 page-faults # 0.649 K/sec ( +- 0.03% )
         1999520778 cycles # 2.888 GHz ( +- 0.09% )
          581415687 stalled-cycles-frontend # 29.08% frontend cycles idle ( +- 0.23% )
         3982233099 instructions # 1.99 insns per cycle
                                              # 0.15 stalled cycles per insn ( +- 0.04% )
          971675561 branches # 1403.236 M/sec ( +- 0.02% )
              63306 branch-misses # 0.01% of all branches ( +- 1.98% )

        0.694124867 seconds time elapsed ( +- 0.18% )

ZSTD compression reading

Performance counter stats for 'r.univar test_rast_zstd' (10 runs):

::

        1168.682507 task-clock (msec) # 0.998 CPUs utilized ( +- 0.41% )
                377 context-switches # 0.323 K/sec ( +- 1.35% )
                  0 cpu-migrations # 0.000 K/sec
                460 page-faults # 0.394 K/sec ( +- 0.06% )
         3397563517 cycles # 2.907 GHz ( +- 0.06% )
          780090112 stalled-cycles-frontend # 22.96% frontend cycles idle ( +- 0.28% )
         8084726269 instructions # 2.38 insns per cycle
                                              # 0.10 stalled cycles per insn ( +- 0.02% )
         1325103816 branches # 1133.844 M/sec ( +- 0.02% )
             426732 branch-misses # 0.03% of all branches ( +- 1.51% )

        1.171226998 seconds time elapsed ( +- 0.41% )

Check of types and compression

::

     <test_rast_z_base> is compressed (level 2: DEFLATE). Data type: <DCELL>
     <test_rast_orig> is compressed (level 2: DEFLATE). Data type: <DCELL>
     <test_rast_rle> is compressed (level 1: RLE). Data type: <DCELL>
     <test_rast_lz4> is compressed (level 3: LZ4). Data type: <DCELL>
     <test_rast_lz4hc> is compressed (level 4: LZ4HC). Data type: <DCELL>
     <test_rast_zstd> is compressed (level 5: ZSTD). Data type: <DCELL>

File sizes

::

  240045009 Oct 9 23:26 fcell/test_rast_lz4
  240045009 Oct 9 23:27 fcell/test_rast_lz4hc
  229517654 Oct 9 23:24 fcell/test_rast_orig
  229517654 Oct 9 23:26 fcell/test_rast_rle
  229517654 Oct 9 23:22 fcell/test_rast_z_base
  227636390 Oct 9 23:28 fcell/test_rast_zstd
  105009 Oct 9 23:26 cell_misc/test_rast_lz4/null2
  105009 Oct 9 23:27 cell_misc/test_rast_lz4hc/null2
  125009 Oct 9 23:24 cell_misc/test_rast_orig/null2
  125009 Oct 9 23:26 cell_misc/test_rast_rle/null2
  125009 Oct 9 23:22 cell_misc/test_rast_z_base/null2
  175009 Oct 9 23:28 cell_misc/test_rast_zstd/null2
}}}
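
To put the read timings and file sizes listed above side by side, the following sketch computes each result relative to the RLE run. The numbers are copied from the output above; the struct and function names are made up for illustration, and choosing RLE as the baseline is an arbitrary assumption.

```c
#include <assert.h>

/* One row per test raster: elapsed r.univar time and fcell file size,
 * copied from the perf and file-size listings in this comment. */
struct read_result {
    const char *name;
    double seconds;      /* "seconds time elapsed" */
    double fcell_bytes;  /* size of the fcell file */
};

const struct read_result results[] = {
    {"rle",   2.021075900, 229517654.0},
    {"lz4",   0.691904075, 240045009.0},
    {"lz4hc", 0.694124867, 240045009.0},
    {"zstd",  1.171226998, 227636390.0},
};

/* How many times faster r was read than the baseline. */
double speedup(const struct read_result *base, const struct read_result *r)
{
    assert(r->seconds > 0.0);
    return base->seconds / r->seconds;
}

/* File size of r as a fraction of the baseline file size. */
double size_ratio(const struct read_result *base, const struct read_result *r)
{
    assert(base->fcell_bytes > 0.0);
    return r->fcell_bytes / base->fcell_bytes;
}
```

With RLE as the baseline, LZ4 reads roughly 2.9 times faster while its fcell file is about 4.6% larger, and ZSTD reads roughly 1.7 times faster with a file about 0.8% smaller.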

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:10>
GRASS GIS <https://grass.osgeo.org>

#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

Comment (by sprice):

Completely random data is not a good test because (in theory) it won't
compress. However, it looks like it demonstrated the expected results.

If you do a diff with n_les_assemble.c you'll see that I added a few if
statements to ensure that the indexing stays within row/col bounds. I was
getting segfaults with corrupt data before I fixed the other bugs. I
figure it should be able to handle any data without segfaulting, even if
corrupt.
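
A guard of the kind described above might look like the following sketch. It is not the actual change to n_les_assemble.c; the helper name `cell_at_or` and the flat row-major layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the GRASS sources: return the value
 * at (row, col) of a row-major grid, or `fallback` when the index lies
 * outside the grid, instead of reading out-of-bounds memory. */
double cell_at_or(const double *grid, int rows, int cols,
                  int row, int col, double fallback)
{
    /* The grid dimensions come from the caller, so a non-positive size is
     * a programming error; the indices may come from corrupt data. */
    assert(grid != NULL && rows > 0 && cols > 0);

    if (row < 0 || row >= rows || col < 0 || col >= cols)
        return fallback;
    return grid[(size_t)row * (size_t)cols + (size_t)col];
}
```

An index derived from corrupt data then yields the fallback value rather than a segfault.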

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:11>
GRASS GIS <https://grass.osgeo.org>

#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

Comment (by wenzeslaus):

Replying to [comment:11 sprice]:
> Completely random data is not a good test because (in theory) it won't
> compress. However, it looks like it demonstrated the expected results.

Yes, G7:r.surf.fractal or G7:r.random.surface are probably better, but
G7:r.mapcalc with `rand()` is kind of a worst-case scenario and the
compression still worked well!
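
Why random data is a worst case can be illustrated with a toy experiment. The sketch below does not use any of the compressors discussed in this ticket; it swaps in a byte-wise run-length code, with function names invented for the example, and compares the encoded size of a constant buffer with that of pseudo-random bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a toy (count, value) run-length encoding of buf.
 * Runs are capped at 255 so the count fits in one byte. */
size_t rle_size(const unsigned char *buf, size_t n)
{
    size_t out = 0, i = 0;

    assert(buf != NULL || n == 0);
    while (i < n) {
        size_t run = 1;
        while (i + run < n && run < 255 && buf[i + run] == buf[i])
            run++;
        out += 2;   /* one count byte plus one value byte */
        i += run;
    }
    return out;
}

/* Fill buf with pseudo-random bytes from a fixed-seed xorshift32
 * generator, so the experiment is reproducible. */
void fill_noise(unsigned char *buf, size_t n)
{
    uint32_t x = 2463534242u;
    size_t i;

    for (i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        buf[i] = (unsigned char)(x >> 24);
    }
}
```

A constant 4096-byte buffer encodes to a few dozen bytes, whereas the noise buffer comes out larger than its input because almost every byte starts a new run. Real compressors are far more sophisticated, but they face the same lack of redundancy.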

> If you do a diff with n_les_assemble.c...

I've created a separate ticket #2754 for this. Let's discuss it there.

I've executed the tests with `GRASS_COMPRESS_NULLS=1` in combination with
default compression and with `GRASS_INT_LZ4=1`, and they seem to be OK
(there is some issue with the ''r.slope.aspect'' test, but that's
unrelated). Still, the test coverage is limited, so user testing is
needed.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:12>
GRASS GIS <https://grass.osgeo.org>

#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

Comment (by wenzeslaus):

In the diff I see the following pattern (this is from
[source:grass/trunk/lib/raster/get_row.c#L84 lib/raster/get_row.c],
`read_data_fp_compressed()` function):

{{{
#!diff
- if ((size_t) G_zlib_read(fcb->data_fd, readamount, data_buf, bufsize) != bufsize)
+ cmp = G_alloca(readamount);
+
+ if (read(fcb->data_fd, cmp, readamount) != readamount) {
+ G_freea(cmp);
         G_fatal_error(_("Error reading raster data for row %d of <%s>"),
                       row, fcb->name);
  }

+ if (cmp[0] == '0') // flate.c::G_ZLIB_COMPRESSED_NO
+ // Uncompressed
+ memcpy(data_buf, cmp+1, bufsize);
+ else
+ if (cmp[0] == 3 || cmp[0] == 4)
+// LZ4_decompress_safe((char *)(cmp+1), (char *)data_buf,
+// readamount-1, bufsize);
+ LZ4_decompress_fast((char *)(cmp+1), (char *)data_buf, bufsize);
+ else if (cmp[0] == 5)
+ ZSTD_decompress(data_buf, bufsize, cmp+1, readamount-1);
+ else
+ G_zlib_expand(cmp+1, readamount-1, data_buf, bufsize);
+
+ G_freea(cmp);
+}
}}}

In other words, the `G_zlib_read()` function is replaced by a block of
code. The same applies to the `G_zlib_write()` function and, to a certain
extent, to `G_zlib_expand()` and `zlib_compress()` as well.

My question is whether it wouldn't be more advantageous to create a
wrapper that takes all the necessary inputs, including the compression
type, and does the necessary switching and format-specific handling.

I suppose this would make it easier to add the same compression options
to the 3D raster library, where the `G_zlib_read()` and `G_zlib_write()`
functions are already used
([source:grass/trunk/lib/raster3d/fpcompress.c#L718 lib/raster3d/fpcompress.c]),
or perhaps even to other code such as
[source:grass/trunk/lib/segment lib/segment] where no compression is used.
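
As a rough illustration of such a wrapper, here is a minimal sketch. All names in it (`expand_any`, `cmp_type`, the table) are hypothetical and not part of the patch or of GRASS. The type codes 1 to 5 follow the levels printed by the compression check in comment 10, 0 for "none" is an assumption, and only two toy expanders are filled in so that the example builds without zlib, LZ4 or ZSTD.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes; see the assumptions stated above. */
enum cmp_type {
    CMP_NONE = 0, CMP_RLE, CMP_DEFLATE, CMP_LZ4, CMP_LZ4HC, CMP_ZSTD,
    CMP_COUNT
};

/* An expander returns the number of bytes written to dst, or -1 on error. */
typedef int (*expand_fn)(const unsigned char *src, int src_sz,
                         unsigned char *dst, int dst_sz);

/* Stored verbatim: copy, refusing to overrun the destination. */
static int expand_none(const unsigned char *src, int src_sz,
                       unsigned char *dst, int dst_sz)
{
    if (src_sz < 0 || src_sz > dst_sz)
        return -1;
    memcpy(dst, src, (size_t)src_sz);
    return src_sz;
}

/* Toy (count, value) run-length format, NOT the on-disk GRASS RLE format. */
static int expand_rle_toy(const unsigned char *src, int src_sz,
                          unsigned char *dst, int dst_sz)
{
    int in, out = 0;

    if (src_sz < 0 || src_sz % 2 != 0)
        return -1;
    for (in = 0; in < src_sz; in += 2) {
        int count = src[in];
        if (count > dst_sz - out)
            return -1;
        memset(dst + out, src[in + 1], (size_t)count);
        out += count;
    }
    return out;
}

/* One slot per type; NULL marks a codec this sketch does not link in. */
static const expand_fn expanders[CMP_COUNT] = {
    [CMP_NONE] = expand_none,
    [CMP_RLE] = expand_rle_toy,
};

/* Single entry point: callers pass the type instead of branching on it. */
int expand_any(int type, const unsigned char *src, int src_sz,
               unsigned char *dst, int dst_sz)
{
    assert(src != NULL && dst != NULL);
    if (type < 0 || type >= CMP_COUNT || expanders[type] == NULL)
        return -1;
    return expanders[type](src, src_sz, dst, dst_sz);
}
```

In a real build the empty slots would point at thin functions calling the respective library, and a matching table could serve the compression side, so the raster, 3D raster and segment code would share one entry point.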

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:13>
GRASS GIS <https://grass.osgeo.org>

#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

Comment (by glynn):

Replying to [comment:13 wenzeslaus]:

> My question is whether it wouldn't be more advantageous to create a
> wrapper that takes all the necessary inputs, including the compression
> type, and does the necessary switching and format-specific handling.

Agreed.

In practical terms, there are only two distinct cases: uncompressed (where
the size of the data read or written matches the size of the data stored
in the file) and compressed (where the sizes differ). Everything else is
just options.
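
The two cases can be sketched as below. This is an illustration only: `load_row`, `row_expander` and `expand_fill` are invented names, and the sketch assumes the writer stores a row verbatim whenever compression would not shrink it, so that equal sizes reliably mean "uncompressed".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback that expands stored bytes into a full row.
 * Returns 0 on success, -1 on failure. */
typedef int (*row_expander)(const unsigned char *src, size_t src_sz,
                            unsigned char *dst, size_t dst_sz);

/* Toy expander for demonstration only: a single stored byte is repeated
 * to fill the whole row. */
int expand_fill(const unsigned char *src, size_t src_sz,
                unsigned char *dst, size_t dst_sz)
{
    if (src_sz != 1)
        return -1;
    memset(dst, src[0], dst_sz);
    return 0;
}

/* Load one row of row_sz bytes from stored_sz stored bytes.
 * Returns 0 on success, -1 on failure. */
int load_row(const unsigned char *stored, size_t stored_sz,
             unsigned char *row, size_t row_sz, row_expander expand)
{
    assert(stored != NULL && row != NULL);

    /* Case 1: sizes match, so the row was stored uncompressed. */
    if (stored_sz == row_sz) {
        memcpy(row, stored, row_sz);
        return 0;
    }

    /* Case 2: sizes differ, so the row must be expanded. Which algorithm
     * does the work is just an option supplied by the caller. */
    if (expand == NULL)
        return -1;
    return expand(stored, stored_sz, row, row_sz);
}
```

The reading code then branches only on the size comparison; ZLIB, LZ4 or ZSTD would each be one more `row_expander`.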

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:14>
GRASS GIS <https://grass.osgeo.org>