[GRASS-user] Slow import of GHSL

(Sorry for the silence; I was without my personal computer for a week.)

* Markus Metz <markus.metz.giswork@gmail.com> [2017-03-22 22:11:01 +0100]:

On Wed, Mar 22, 2017 at 9:52 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Wed, Mar 22, 2017 at 9:28 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 8:12 PM, Markus Neteler <neteler@osgeo.org> wrote:

...
>> Nikos, for an even bigger map try
>>
>> Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
>> to 60° south):
>> http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
>> by USGS. 1.6GB in size.

Interesting this is. See also:
https://global-surface-water.appspot.com/, at 30m, Landsat-based as
well.

>> Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.
>>
>> After import into GRASS GIS, here are the timings:
>>
>> # final map size:
>> g.region -p
>> ...
>> rows: 493200
>> cols: 1296001
>> cells: 639187693200
>>
>> (handling only works in GRASS GIS 7.3.svn since Markus Metz's recent
>> improvements on global data import are needed).
>
> (my changes were bug fixes, not improvements)
>
>>
>> Benchmarks:
>> - Import took 2h while reading the data from a CIFS mounted storage
>> box (slow) and writing on SSD.

Markus N, I am interested: did you use the "memory" option?

>> - Displaying the entire map (639 giga-pixel) in GRASS GIS' display
>> (d.mon) took ~15 sec over an ssh tunnel from my laptop to the server,
>> since I am at a conference.
>>
>> Fair deal I would say :-)
>
> A bit more information would help to compare:
> - what is your GDAL version?

GDAL 2.1.2

> - are 504 GeoTIFF files compressed? If yes, which method?

Yes, COMPRESSION=LZW

> - what are the block dimensions of the input GeoTIFFs?

Size is 36001, 36001 - Block=36001x1

Now that's important too. What about GHSL's block size of 4K^2?
My understanding is that it would make a difference, for GRASS, if I
would redo the GHSL layers with a row-shaped "block". Makes sense?

This is row by row compression as in GRASS. That could help import with
r.in.gdal which also reads and writes row by row.
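
If re-blocking the GHSL layers turns out to be worth trying, one possible way is a `gdal_translate` pass that writes untiled, LZW-compressed strips one row high. This is only a sketch: the helper name `restrip` and the file names are made up for illustration, and whether it actually speeds up r.in.gdal would need to be measured.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a tiled GeoTIFF as LZW-compressed strips
# that are one row high, i.e. the Block=<width>x1 layout quoted above.
restrip() {
    # $1 = input GeoTIFF, $2 = output GeoTIFF (both names are placeholders)
    gdal_translate \
        -co COMPRESS=LZW \
        -co TILED=NO \
        -co BLOCKYSIZE=1 \
        "$1" "$2"
}

# Example call (file names are assumptions, not actual GHSL file names):
# restrip ghsl_tiled.tif ghsl_rows.tif
# gdalinfo ghsl_rows.tif | grep Block    # inspect the resulting block size
```

The `Block=` line printed by `gdalinfo` shows whether the output really ended up with row-shaped blocks.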

Type=Byte

> - what kind of GRASS compression did you use?

Default raster + NULL compression enabled. I.e.,

r.compress -p watermask2010
<watermask2010> is compressed (method 2: ZLIB). Data type: CELL

You might save disk space at the cost of longer reading times with BZIP2.

<watermask2010> has a compressed NULL file

Again, the fact that I had to read from an attached storage box likely
slowed down the import.
Just thought to post these numbers here.

Impressive that such a large raster can be imported at all, and relatively
fast!

Indeed, impressive.

Nikos

Reading about 1.6 GB (even from an attached storage box) should not take 2
hours, so I think the bottleneck is in software: decompressing the input
and compressing the output.
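
A rough back-of-the-envelope check using the numbers quoted in this thread, assuming "1.6GB" means 1.6e9 bytes and the import took exactly two hours (both are approximations):

```shell
#!/bin/sh
# Figures quoted in the thread.
rows=493200
cols=1296001
seconds=$((2 * 3600))          # import took about 2 h
compressed_bytes=1600000000    # "1.6GB", taken here as 1.6e9 bytes (assumption)

cells=$((rows * cols))         # Type=Byte, so 1 byte per cell uncompressed
raw_rate=$((cells / seconds / 1000000))           # MB/s of uncompressed cells
read_rate=$((compressed_bytes / seconds / 1000))  # kB/s read from storage

echo "cells:            $cells"
echo "uncompressed:     ~$raw_rate MB/s"
echo "compressed input: ~$read_rate kB/s"
```

The compressed input arrives at only a few hundred kB/s, while the uncompressed cell stream that has to be decoded and re-encoded runs at tens of MB/s.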

Markus M

On Fri, Mar 24, 2017 at 10:25 AM, Nikos Alexandris
<nik@nikosalexandris.net> wrote:

* Markus Metz <markus.metz.giswork@gmail.com> [2017-03-22 22:11:01 +0100]:

On Wed, Mar 22, 2017 at 9:52 PM, Markus Neteler <neteler@osgeo.org> wrote:

...

Markus N, I am interested: did you use the "memory" option?

I left r.in.gdal's default value.
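
For anyone who wants to experiment with a non-default value, the import steps discussed in this thread can be sketched roughly as below. The wrapper name `import_mosaic`, the file names, and the value 2000 are illustrative assumptions, not the commands that were actually run; `memory=` is given in MB (see the r.in.gdal manual for the default).

```shell
#!/bin/sh
# Hypothetical wrapper around the import steps discussed in this thread.
# Run inside a GRASS GIS session; all file names are placeholders.
import_mosaic() {
    vrt=$1; out=$2; mem_mb=$3
    shift 3
    # Mosaic the remaining arguments (the GeoTIFF tiles) into one VRT.
    gdalbuildvrt "$vrt" "$@"
    # Import the VRT; memory= sets the r.in.gdal cache size in MB.
    r.in.gdal input="$vrt" output="$out" memory="$mem_mb"
}

# Example call (paths and the memory value are assumptions):
# import_mosaic watermask.vrt watermask2010 2000 tiles/*.tif
```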

...

My understanding is that it would make a difference, for GRASS, if I
would redo the GHSL layers with a row-shaped "block". Makes sense?

Why spend time on redoing the GHSL layers? Do you have to import them
frequently?

markusN