[GRASS-user] Slow import of GHSL

Why does importing (or rather, attempting to import) a 38m pixel resolution GHSL [0] GeoTIFF
layer, i.e. GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, into GRASS'
db progress so slowly?

Similar GHSL data sets vary between 300 ~ 500 MB in size.

As well, trying to clip the GeoTIFFs (not the VRT files) with GDAL
tools to a custom extent (say Europe) appears to be a heavy process.

Thanks for hints, Nikos

[0] http://ghsl.jrc.ec.europa.eu/

On Fri, Mar 10, 2017 at 5:47 PM, Nikos Alexandris
<nik@nikosalexandris.net> wrote:

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
db progress slow?

Can you elaborate a bit more? I have downloaded and checked:

That is 9835059101 bytes in 19885 files, or I downloaded the wrong one
(please post a URL).

Similar GHSL data sets vary between 300 ~ 500 MB in size.

Yes - do you have an SSD disk? That helps quite a bit, along with a
sufficiently large GDAL cache ("memory" parameter of r.in.gdal).

As well, trying to clip the GeoTIFFs (not the VRT files) with gdal
tools to a custom extent (say Europe), appears to be a heavy process.

With GDAL, be sure to have set something like
export GDAL_CACHEMAX=2000
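and, in addition, pass r.in.gdal its own cache size, e.g. (a sketch; the output
name is only illustrative):

r.in.gdal input=GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif \
    output=ghs_built_1990_p1 memory=2000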

HTH,
Markus

Thanks for hints, Nikos

[0] http://ghsl.jrc.ec.europa.eu/

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
db progress slow?

(Apologies for cross-posting to gdal-dev)

Markus Neteler:

Can you elaborate a bit more? I have downloaded and checked:

That is 9835059101 bytes in 19885 files or I downloaded the wrong one
(please post an URL).

I have already suggested to them to provide, for each data set, a single
"pool" directory containing just the zipped data and the license.

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,

Similar GHSL data sets vary between 300 ~ 500 MB in size.

see

GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

"3857" is the EPSG code. They are split in two GeoTIFFs (p1, p2) and
there is a VRT along with overviews for it. No overviews for the TIFFs.

For example:

GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif

Even trying to clip with gdal_translate can create file(s) of
hundreds of GBs; this is probably due to missing compression in the output.
Even then, the derived files, although only a subset in terms of extent,
are enormous compared to their source, say p1 or p2.

Creating a new VRT works, of course, instantaneously. For example:

# some custom extent for Europe
ogrinfo -al europe_extent_epsg_3857/corine_2000.shp |grep Ext

Extent: (-6290123.623699, 2788074.747995) - (8115874.019718, 8170181.584331)

# extract the above subset in a new VRT
gdal_translate -projwin -6290123.623699 8170181.584331 8115874.019718 2788074.747995 GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt test.vrt -of VRT

# build some overview for it (or for the p1 or p2 GeoTIFFs) -- slow for all options
gdaladdo -ro --config COMPRESS_OVERVIEW LZW test.vrt 2 4 8 16

If the output is not a VRT file, the subset extraction is very slow.
The files are practically hard to process; one needs to wait several
hours for a clip.

The import of p1 or p2, or of the VRT file, into GRASS' database via
r.in.gdal/r.import does not progress at all.

Yes - do you have a SSD disk? This quite helps along with a
sufficiently large GDAL cache ("memory" parameter of r.in.gdal).

In my tests I had set that to 2047. No obvious improvement.

As well, trying to clip the GeoTIFFs (not the VRT files) with gdal
tools to a custom extent (say Europe), appears to be a heavy process.

With GDAL, be sure to have set something like
export GDAL_CACHEMAX=2000

(Side question: why is the max 2047? What if there is a lot more RAM?)

HTH,
Markus

Thank you Markus. I think there is more to it than the cache.

Nikos

[0] http://ghsl.jrc.ec.europa.eu/

On Sat, Mar 11, 2017 at 8:53 AM, Nikos Alexandris <nik@nikosalexandris.net> wrote:

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS’
db progress slow?

because it is a very large raster map: Size is 507904, 647168 (columns, rows; roughly 329 billion cells)

(Apologies for cross-posting to gdal-dev)

Markus Neteler:

Can you elaborate a bit more? I have downloaded and checked:

That is 9835059101 bytes in 19885 files or I downloaded the wrong one
(please post an URL).

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,

see

GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

“3857” is the EPSG code. They are split in two GeoTIFFs (p1, p2) and
there is a VRT along with overviews for it. No overviews for the TIFFs.

For example:

GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif

Even trying to clip, with gdal_translate, might create file(s) of
hundreds of GBs. This might be due to missing compression.

then use compression. The source tiffs use LZW with blocks of 4096x4096 cells.
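
For instance, a clip that keeps compression and a similar block layout might
look like this (a sketch; the -projwin values are the Europe extent quoted
earlier in the thread, the output name is illustrative):

gdal_translate -projwin -6290123.623699 8170181.584331 8115874.019718 2788074.747995 \
    -co COMPRESS=LZW -co TILED=YES -co BLOCKXSIZE=4096 -co BLOCKYSIZE=4096 -co BIGTIFF=IF_NEEDED \
    GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt europe_clip.tif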

The import of p1 or p2 or of the VRT file in GRASS’ data base, via
r.in.gdal/r.import, does not progress at all.

Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.

Recompressing with BZIP2 took 2:20 hours and the size of the cell file was reduced to a mere 143 MB.
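
The recompression step might look like this (a sketch, assuming GRASS 7.2's
GRASS_COMPRESSOR variable; the map name is illustrative):

export GRASS_COMPRESSOR=BZIP2
r.compress map=ghs_built_1990_p1
r.compress -p map=ghs_built_1990_p1   # verify: should now report BZIP2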

(
Side question: why is max 2047? What if there is a lot more of RAM?
)

To avoid integer overflow: the memory value is converted to bytes as value * 1024 * 1024.
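
Put differently:

  2047 * 1024 * 1024 = 2146435072 bytes  (still fits a signed 32-bit integer, max 2147483647)
  2048 * 1024 * 1024 = 2147483648 bytes  (one past the maximum, i.e. overflow)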

Markus M

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
db progress slow?

Markus M

because it is a very large raster map: Size is 507904, 647168

(Apologies for cross-posting to gdal-dev)

Markus Neteler:

Can you elaborate a bit more? I have downloaded and checked:

That is 9835059101 bytes in 19885 files or I downloaded the wrong one
(please post an URL).

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,

see

GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)

GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

"3857" is the EPSG code. They are split in two GeoTIFFs (p1, p2) and
there is a VRT along with overviews for it. No overviews for the TIFFs.

For example:

GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif

Even trying to clip, with gdal_translate, might create file(s) of
hundreds of GBs. This might be due to missing compression.

then use compression. The source tiffs use LZW with blocks of 4096x4096
cells.

The import of p1 or p2 or of the VRT file in GRASS' data base, via
r.in.gdal/r.import, does not progress at all.

Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.

Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
reduced to a mere 143 MB.

Some messy rough timings:

1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
for "p2.tif", each stuck at 3% for almost 14h

2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
processes with -projwin, the VRT file as an input and GeoTIFF as output,
at 40% since yesterday afternoon

3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
1), stuck at 0% progress for more than 16h.

SSD can be seen as a "necessity".

Nikos

[rest deleted]

On Tue, Mar 14, 2017 at 10:01 AM, Nikos Alexandris <nik@nikosalexandris.net> wrote:

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS’
db progress slow?

Markus M

because it is a very large raster map: Size is 507904, 647168

(Apologies for cross-posting to gdal-dev)

Markus Neteler:

Can you elaborate a bit more? I have downloaded and checked:

That is 9835059101 bytes in 19885 files or I downloaded the wrong one
(please post an URL).

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,

see

GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)

GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

“3857” is the EPSG code. They are split in two GeoTIFFs (p1, p2) and
there is a VRT along with overviews for it. No overviews for the TIFFs.

For example:

GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif

Even trying to clip, with gdal_translate, might create file(s) of
hundreds of GBs. This might be due to missing compression.

then use compression. The source tiffs use LZW with blocks of 4096x4096
cells.

The import of p1 or p2 or of the VRT file in GRASS’ data base, via
r.in.gdal/r.import, does not progress at all.

Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.

Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
reduced to a mere 143 MB.

Some messy rough timings:

  1. i7, 8 cores, 32GB RAM, Base OS: CentOS → Three r.in.gdal processes
    for “p2.tif”, each stuck at 3% for almost 14h

  2. Xeon, 24 Cores, 32GB RAM, Base OS: Windows → Three gdal_translate
    processes with -projwin, the VRT file as an input and GeoTIFF as output,
    at 40% since yesterday afternoon

  3. Xeon, 12 Cores, ? RAM, Base OS: CentOS.jpg → Same processes as in
    1), stuck at 0% of progress for more than 16h.

SSD can be seen as a “necessity”.

Hmm, not really. With the p1 tif and GRASS db on the same spinning HDD, and 6 other heavy processes constantly reading from and writing to that same HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5 GB as output is not that heavy on disk IO. Most of the time is spent decompressing input and compressing output.

Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU? Anything slowing down the HDD(s)?
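
A rough way to check this on a Linux box (illustrative commands; iostat needs the sysstat package):

top -b -n 1 | grep -E 'r\.in\.gdal|gdal_translate'   # ~100% CPU points to (de)compression as the bottleneck
iostat -x 5                                          # high %util or await values would point to slow disks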

Markus M

* Markus Metz <markus.metz.giswork@gmail.com> [2017-03-14 15:02:30 +0100]:

On Tue, Mar 14, 2017 at 10:01 AM, Nikos Alexandris <nik@nikosalexandris.net>
wrote:

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
db progress slow?

Markus M

because it is a very large raster map: Size is 507904, 647168

(Apologies for cross-posting to gdal-dev)

Markus Neteler:

Can you elaborate a bit more? I have downloaded and checked:

That is 9835059101 bytes in 19885 files or I downloaded the wrong one
(please post an URL).

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,

see

GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)

GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

"3857" is the EPSG code. They are split in two GeoTIFFs (p1, p2) and
there is a VRT along with overviews for it. No overviews for the TIFFs.

For example:

GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif

Even trying to clip, with gdal_translate, might create file(s) of
hundreds of GBs. This might be due to missing compression.

then use compression. The source tiffs use LZW with blocks of 4096x4096
cells.

The import of p1 or p2 or of the VRT file in GRASS' data base, via
r.in.gdal/r.import, does not progress at all.

Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.

Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
reduced to a mere 143 MB.

Some messy rough timings:

1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
for "p2.tif", each stuck at 3% for almost 14h

2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
processes with -projwin, the VRT file as an input and GeoTIFF as output,
at 40% since yesterday afternoon

3) Xeon, 12 Cores, ? RAM, Base OS: CentOS.jpg -> Same processes as in
1), stuck at 0% of progress for more than 16h.

SSD can be seen as a "necessity".

Hmm, not really. With the p1 tif and GRASS db on the same spinning HDD, and
6 other heavy processes constantly reading from and writing to that same
HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
GB as output is not that heavy on disk IO. Most of the time is spent
decompressing input and compressing output.

Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
Anything slowing down the HDD(s)?

Markus M

Ehm, maybe it is GDAL version 1.11.4? Just realised!
I am working in a restricted environment, so time is spent configuring things.
Will update...

Nikos

[all deleted]

Here's a non-elegant way, derived from tests. Maybe a starter for the
Wiki. An elegant version would be scripted, with no need to manually enter
any GRASS session (see the sketch after the commands below).

# Get Eurostat's NUTS_2013_01M_SH.zip vector map
unzip NUTS_2013_01M_SH.zip && cd NUTS_2013_01M_SH/data/
grass73 -c NUTS_2013_01M_SH.shp /geo/grassdb/europe/etrs89
v.import in=NUTS_RG_01M_2013.shp out=NUTS_RG_01M_2013

# draw, view, pick & set computational region of interest, create a vector map
v.in.region out=europe_less_box

# Clip the original VRT to Europe's extent, output as VRT,
# and successively add overviews (adding overviews takes time!)

# 1990
gdal_translate -projwin -3480828.507849 11465936.382472 4989400.357796 3203413.703282 GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt -of VRT
gdaladdo -ro --config COMPRESS_OVERVIEW DEFLATE GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt 2 4 8 16

# 2000
gdal_translate -projwin -3480828.507849 11465936.382472 4989400.357796 3203413.703282 GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38_v1_0.vrt GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt -of VRT
gdaladdo -ro --config COMPRESS_OVERVIEW DEFLATE GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt 2 4 8 16

# 2014
gdal_translate -projwin -3480828.507849 11465936.382472 4989400.357796 3203413.703282 GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38_v1_0.vrt GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt -of VRT
gdaladdo -ro --config COMPRESS_OVERVIEW DEFLATE GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt  2 4 8 16

# Create 'epsg:3857' location for the 1990 data, set region & resolution
# (v.proj-ing the existing "box" map), import the raster
# (work this in three different terminals)
grass72 -c "epsg:3857" /geo/grassdb/global/wgs84_3857_1990
v.proj dbase=/geo/grassdb/europe/ location=etrs89 mapset=PERMANENT in=europe_less_box out=europe_less_box
g.region -p vect=europe_less_box ewres=38.218470987084757 nsres=38.218446797782505
r.import input=GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt out=GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_Europe_Less memory=2047 extent=region

# Repeat for 2000
grass72 -c "epsg:3857" /geo/grassdb/global/wgs84_3857_2000
v.proj dbase=/geo/grassdb/europe/ location=etrs89 mapset=PERMANENT in=europe_less_box out=europe_less_box
g.region -p vect=europe_less_box ewres=38.218470987084757 nsres=38.218446797782505
r.import input=GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt out=GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38_v1_0_Europe_Less memory=2047 extent=region

# Repeat for 2014
grass72 -c "epsg:3857" /geo/grassdb/global/wgs84_3857_2014
v.proj dbase=/geo/grassdb/europe/ location=etrs89 mapset=PERMANENT in=europe_less_box out=europe_less_box
g.region -p vect=europe_less_box ewres=38.218470987084757 nsres=38.218446797782505
r.import input=GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt out=GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38_v1_0_Europe_Less memory=2047 extent=region
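
A scripted variant could look roughly like this (an untested sketch using GRASS 7.2's
--exec interface, run after the location and region have been prepared as above),
e.g. for 1990:

grass72 /geo/grassdb/global/wgs84_3857_1990/PERMANENT --exec \
    r.import input=GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_Europe_Less.vrt \
        output=GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_Europe_Less memory=2047 extent=region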

Nikos

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
db progress slow?

Markus M:

because it is a very large raster map: Size is 507904, 647168

Markus Neteler:

Can you elaborate a bit more? I have downloaded and checked:
That is 9835059101 bytes in 19885 files or I downloaded the wrong one
(please post an URL).

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
see
GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

"3857" is the EPSG code. They are split in two GeoTIFFs (p1, p2) and
there is a VRT along with overviews for it. No overviews for the TIFFs.

For example:
GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif

Even trying to clip, with gdal_translate, might create file(s) of
hundreds of GBs. This might be due to missing compression.

then use compression. The source tiffs use LZW with blocks of 4096x4096
cells.

The import of p1 or p2 or of the VRT file in GRASS' data base, via
r.in.gdal/r.import, does not progress at all.

Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.

Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
reduced to a mere 143 MB.

Nikos:

Some messy rough timings:

1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
for "p2.tif", each stuck at 3% for almost 14h

2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
processes with -projwin, the VRT file as an input and GeoTIFF as output,
at 40% since yesterday afternoon

3) Xeon, 12 Cores, ? RAM, Base OS: CentOS.jpg -> Same processes as in
1), stuck at 0% of progress for more than 16h.

SSD can be seen as a "necessity".

Markus Metz:

Hmm, not really.

On a laptop (i7-4600U CPU @ 2.10GHz, 8GB of RAM, with SSD) it was
progressing at a quite acceptable pace. I had to break the process,
unfortunately, because I don't have a lot of free space :-/

With the p1 tif and GRASS db on the same spinning HDD, and
6 other heavy processes constantly reading from and writing to that same
HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
GB as output is not that heavy on disk IO. Most of the time is spent
decompressing input and compressing output.

p2 is a harder one!

Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
Anything slowing down the HDD(s)?

Yes, all processes, in my attempts 2 or 3 in parallel, were constantly
at 100%. RAM was not an issue.

No other heavy processes in parallel. If it matters, I am working in i3wm
with Firefox for browsing (webmail, wikis, etc.).

Nikos

On Wed, Mar 15, 2017 at 6:03 PM, Nikos Alexandris <nik@nikosalexandris.net> wrote:

[…]

Nikos:

Some messy rough timings:

  1. i7, 8 cores, 32GB RAM, Base OS: CentOS → Three r.in.gdal processes
    for “p2.tif”, each stuck at 3% for almost 14h

  2. Xeon, 24 Cores, 32GB RAM, Base OS: Windows → Three gdal_translate
    processes with -projwin, the VRT file as an input and GeoTIFF as output,
    at 40% since yesterday afternoon

  3. Xeon, 12 Cores, ? RAM, Base OS: CentOS.jpg → Same processes as in
    1), stuck at 0% of progress for more than 16h.

SSD can be seen as a “necessity”.

Markus Metz:

Hmm, not really.

In a laptop (i7-4600U CPU @ 2.10GHz with 8GB of RAM with SSD) it was
progressing, in a quite acceptable manner.

Which GDAL version did you use? I use GDAL 2.1.3.

I had to break the process,
unfortunately, because I don’t have a lot of free space :-/

maybe because you forgot to enable compression ;-)

With the p1 tif and GRASS db on the same spinning HDD, and
6 other heavy processes constantly reading from and writing to that same
HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
GB as output is not that heavy on disk IO. Most of the time is spent
decompressing input and compressing output.

p2 is a harder one!

export GDAL_CACHEMAX=10000
gdal_translate -co "COMPRESS=LZW" GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif p2_test.tif

finishes in 28 minutes.

You could try GDAL 2.1.3; maybe 2.1.3 has a more efficient cache for
block-wise reading than GDAL 1.11.4.

Best,

Markus M

[..]

Nikos:

Some messy rough timings:
1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
for "p2.tif", each stuck at 3% for almost 14h
2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
processes with -projwin, the VRT file as an input and GeoTIFF as output,
at 40% since yesterday afternoon
3) Xeon, 12 Cores, ? RAM, Base OS: CentOS.jpg -> Same processes as in
1), stuck at 0% of progress for more than 16h.
SSD can be seen as a "necessity".

Markus M:

Hmm, not really.

Nikos:

In a laptop (i7-4600U CPU @ 2.10GHz with 8GB of RAM with SSD) it was
progressing, in a quite acceptable manner.

Markus M:

What is the gdal version you used? I use gdal 2.1.3.

Well, yes! 2.1.3 on the laptop, 1.11.4 on the rest.

I had to break the process,
unfortunately, because I don't have a lot of free space :-/

maybe because you forgot to enable compression ;-)

I should!

With the p1 tif and GRASS db on the same spinning HDD, and
6 other heavy processes constantly reading from and writing to that same
HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
GB as output is not that heavy on disk IO. Most of the time is spent
decompressing input and compressing output.

Is it a 10000rpm disk?

p2 is a harder one!

export GDAL_CACHEMAX=10000
gdal_translate -co "COMPRESS=LZW"
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif p2_test.tif

I did not emphasize it enough, but cache size was among my questions
initially. I wrongly assumed that it can't be more than 2047 due to the
reference in <https://grass.osgeo.org/grass72/manuals/r.in.gdal.html>:

--%<---
memory=integer
  ..
  Options: 0-2047
  ..
--->%--

I admit I did not head over to
https://trac.osgeo.org/gdal/wiki/ConfigOptions, where it is implied
that it can be much higher than 2047MB.

Can't r.in.gdal deal with memory=4096 for example (will try)? If yes,
can we update the manual(s)?

Also related? GTIFF_DIRECT_IO, GTIFF_VIRTUAL_MEM_IO

finishes in 28 minutes.

Impressive!

you could try gdal 2.1.3, maybe 2.1.3 has a more efficient cache regarding
block-wise reading than gdal 1.11.4

Yes, I have to.

Kudos, Nikos

On Thu, Mar 16, 2017 at 11:26 AM, Nikos Alexandris <nik@nikosalexandris.net> wrote:

[…]

With the p1 tif and GRASS db on the same spinning HDD, and
6 other heavy processes constantly reading from and writing to that same
HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
GB as output is not that heavy on disk IO. Most of the time is spent
decompressing input and compressing output.

Is it an 10000rpm disk?

I think you are on the wrong track; disk IO does not matter here. It was a 7200rpm disk, and the output of r.in.gdal was about 1.5 GB. It takes only seconds, not hours, to write 1.5 GB to a HDD.

p2 is a harder one!

export GDAL_CACHEMAX=10000
gdal_translate -co "COMPRESS=LZW"
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif p2_test.tif

Also related? GTIFF_DIRECT_IO, GTIFF_VIRTUAL_MEM_IO

Again, I think you are on the wrong track; disk IO does not matter here. And according to the GDAL documentation, GTIFF_DIRECT_IO and GTIFF_VIRTUAL_MEM_IO apply only to reading uncompressed TIFF files.

finishes in 28 minutes.

Impressive!

Hardware does not really matter here. To be precise, the difference between GDAL 1.11.4 and 2.1.3 is impressive, thanks to the efforts of the GDAL development team.

Regarding GDAL 2.1.3, profiling might tell why gdal_translate is so much faster than GRASS r.in.gdal.

Markus M

On Mar 16, 2017 11:26 AM, “Nikos Alexandris” <nik@nikosalexandris.net> wrote:

unfortunately, because I don’t have a lot of free space :-/

maybe because you forgot to enable compression ;-)

I should!

Remember that you have to explicitly switch on the NULL compression:

https://grass.osgeo.org/grass72/manuals/rasterintro.html#raster-compression
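
For example (a sketch, assuming the GRASS_COMPRESS_NULLS variable described on
that manual page; the map name below is only illustrative):

export GRASS_COMPRESS_NULLS=1
# ...then run r.in.gdal / r.import as usual; newly written rasters get a compressed NULL file
r.compress -p map=ghs_built_1990_p1   # afterwards, check whether the NULL file is compressed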

Best
markusN

[gdal-dev removed from Cc]

Nikos A:

unfortunately, because I don't have a lot of free space :-/

Markus M:

maybe because you forgot to enable compression ;-)

I should!

Markus N:

Remember that you have to explicitly switch on the NULL compression:
https://grass.osgeo.org/grass72/manuals/rasterintro.html#raster-compression

Thanks for this one too. I guess we can't opt for a "sane" default NULL
compression. Can we? For future versions?

There are "out-of-date" versions of GRASS, in software repositories. For example,
https://dl.fedoraproject.org/pub/epel/7/x86_64/g/, see for gdal and
grass. So, this complicates thigns (compatibility).

Nikos

* Markus Metz <markus.metz.giswork@gmail.com> [2017-03-16 22:06:12 +0100]:

On Thu, Mar 16, 2017 at 11:26 AM, Nikos Alexandris <nik@nikosalexandris.net>
wrote:

[...]

With the p1 tif and GRASS db on the same spinning HDD, and
6 other heavy processes constantly reading from and writing to that same
HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
GB as output is not that heavy on disk IO. Most of the time is spent
decompressing input and compressing output.

Is it an 10000rpm disk?

I think you are on the wrong track, disk IO does not matter here. It was a
7200rpm disk, and the output of r.in.gdal was about 1.5 GB. It takes only
seconds, not hours to write 1.5 GB to a HDD.

p2 is a harder one!

export GDAL_CACHEMAX=10000
gdal_translate -co "COMPRESS=LZW"
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif p2_test.tif

Also related? GTIFF_DIRECT_IO, GTIFF_VIRTUAL_MEM_IO

Again, I think you are on the wrong track, disk IO does not matter here.
And according to the GDAL documentation, GTIFF_DIRECT_IO,
GTIFF_VIRTUAL_MEM_IO apply only to reading un-compressed TIFF files.

finishes in 28 minutes.

Impressive!

Hardware does not really matter here. To be precise, the difference between
GDAL 1.11.4 and 2.1.3 is impressive, thanks to the efforts of the GDAL
development team.

Regarding GDAL 2.1.3, profiling might tell why gdal_translate is so much
faster than GRASS r.in.gdal.

Thanks Markus. Yes, on the wrong track. Useful lessons learned.

Nikos

ps - Working in a restricted environment (as in: I cannot install
whatever I need) is not easy. Sure, I could possibly use a VM or
similar...

On Fri, Mar 17, 2017 at 11:02 AM, Nikos Alexandris
<nik@nikosalexandris.net> wrote:

Markus N wrote:

Remember that you have to explicitly switch on the NULL compression:

https://grass.osgeo.org/grass72/manuals/rasterintro.html#raster-compression

Thanks for this one too. I guess we can't opt for a "sane" default NULL
compression. Can we? For future versions?

That I proposed here:

https://trac.osgeo.org/grass/ticket/2750#comment:61

and MarkusM suggested some tests to be implemented for that (any
volunteers? might be easy with some Python knowledge).

...

There are "out-of-date" versions of GRASS, in software repositories. For
example,
https://dl.fedoraproject.org/pub/epel/7/x86_64/g/, see for gdal and
grass. So, this complicates things (compatibility).

Try here:
https://copr.fedorainfracloud.org/coprs/neteler/GDAL/
and
https://copr.fedorainfracloud.org/coprs/neteler/grass72/

The latter is still pending a fix to get g.extension working, but due
to travelling I cannot work on this at the moment.

Markus

--
Markus Neteler
http://www.mundialis.de - free data with free software
http://grass.osgeo.org
http://courses.neteler.org/blog

On Sat, Mar 11, 2017 at 7:01 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

On Sat, Mar 11, 2017 at 8:53 AM, Nikos Alexandris <nik@nikosalexandris.net>
wrote:

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
db progress slow?

because it is a very large raster map: Size is 507904, 647168

Nikos, for an even bigger map try

Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
to 60° south): http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
by USGS. 1.6GB in size.

Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.
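
Presumably along these lines (a sketch; the tile path pattern is illustrative):

gdalbuildvrt watermask2010.vrt /path/to/watertiles/*.tif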

After import into GRASS GIS, here are the timings:

# final map size:
g.region -p
...
rows: 493200
cols: 1296001
cells: 639187693200

(handling only works in GRASS GIS 7.3.svn since Markus Metz's recent
improvements on global data import are needed).

Benchmarks:
- Import took 2h while reading the data from a CIFS mounted storage
box (slow) and writing on SSD.
- Displaying the entire map (639 giga-pixel) in GRASS GIS' display
(d.mon) took ~15 sec over a ssh tunnel from my laptop to the server,
since I am at a conference.

Fair deal I would say :-)

cheers,
Markus

--
https://www.mundialis.de/

On Wed, Mar 22, 2017 at 8:12 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Sat, Mar 11, 2017 at 7:01 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

On Sat, Mar 11, 2017 at 8:53 AM, Nikos Alexandris <nik@nikosalexandris.net>
wrote:

Nikos Alexandris

Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS’
db progress slow?

because it is a very large raster map: Size is 507904, 647168

Nikos, for an even bigger map try

Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
to 60° south): http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
by USGS. 1.6GB in size.

Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.

After import into GRASS GIS, here the timings:

final map size:

g.region -p

rows: 493200
cols: 1296001
cells: 639187693200

(handling only works in GRASS GIS 7.3.svn since Markus Metz’s recent
improvements on global data import are needed).

(my changes were bug fixes, not improvements)

Benchmarks:

  • Import took 2h while reading the data from a CIFS mounted storage
    box (slow) and writing on SSD.
  • Displaying the entire map (639 giga-pixel) in GRASS GIS’ display
    (d.mon) took ~15 sec over a ssh tunnel from my laptop to the server,
    since I am at a conference.

Fair deal I would say :-)

A bit more information would help to compare:

  • what is your GDAL version?

  • are 504 GeoTIFF files compressed? If yes, which method?

  • what are the block dimensions of the input GeoTIFFs?

  • what kind of GRASS compression did you use?

Markus M

On Wed, Mar 22, 2017 at 9:28 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

On Wed, Mar 22, 2017 at 8:12 PM, Markus Neteler <neteler@osgeo.org> wrote:

...

Nikos, for an even bigger map try

Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
to 60° south):
http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
by USGS. 1.6GB in size.

Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.

After import into GRASS GIS, here the timings:

# final map size:
g.region -p
...
rows: 493200
cols: 1296001
cells: 639187693200

(handling only works in GRASS GIS 7.3.svn since Markus Metz's recent
improvements on global data import are needed).

(my changes were bug fixes, not improvements)

Benchmarks:
- Import took 2h while reading the data from a CIFS mounted storage
box (slow) and writing on SSD.
- Displaying the entire map (639 giga-pixel) in GRASS GIS' display
(d.mon) took ~15 sec over a ssh tunnel from my laptop to the server,
since I am at a conference.

Fair deal I would say :-)

A bit more information would help to compare:
- what is your GDAL version?

GDAL 2.1.2

- are 504 GeoTIFF files compressed? If yes, which method?

Yes, COMPRESSION=LZW

- what are the block dimensions of the input GeoTIFFs?

Size is 36001, 36001 - Block=36001x1
Type=Byte
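
For reference, a quick way to see this (the tile name below is illustrative):

gdalinfo one_of_the_504_tiles.tif | grep -E 'Block=|COMPRESSION='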

- what kind of GRASS compression did you use?

Default raster + NULL compression enabled. I.e.,

r.compress -p watermask2010
<watermask2010> is compressed (method 2: ZLIB). Data type: CELL
<watermask2010> has a compressed NULL file

Again, the fact that I had to read from an attached storage box likely
slowed down the import.
Just thought to post these numbers here.

markusN

On Wed, Mar 22, 2017 at 9:52 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Wed, Mar 22, 2017 at 9:28 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

On Wed, Mar 22, 2017 at 8:12 PM, Markus Neteler <neteler@osgeo.org> wrote:

Nikos, for an even bigger map try

Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
to 60° south):
http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
by USGS. 1.6GB in size.

Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.

After import into GRASS GIS, here the timings:

final map size:

g.region -p

rows: 493200
cols: 1296001
cells: 639187693200

(handling only works in GRASS GIS 7.3.svn since Markus Metz’s recent
improvements on global data import are needed).

(my changes were bug fixes, not improvements)

Benchmarks:

  • Import took 2h while reading the data from a CIFS mounted storage
    box (slow) and writing on SSD.
  • Displaying the entire map (639 giga-pixel) in GRASS GIS’ display
    (d.mon) took ~15 sec over a ssh tunnel from my laptop to the server,
    since I am at a conference.

Fair deal I would say :-)

A bit more information would help to compare:

  • what is your GDAL version?

GDAL 2.1.2

  • are 504 GeoTIFF files compressed? If yes, which method?

Yes, COMPRESSION=LZW

  • what are the block dimensions of the input GeoTIFFs?

Size is 36001, 36001 - Block=36001x1

This is row-by-row compression, as in GRASS. That could help import with
r.in.gdal, which also reads and writes row by row.

Type=Byte

  • what kind of GRASS compression did you use?

Default raster + NULL compression enabled. I.e.,

r.compress -p watermask2010
<watermask2010> is compressed (method 2: ZLIB). Data type: CELL

You might save disk space at the cost of longer reading times with BZIP2.

<watermask2010> has a compressed NULL file

Again, the fact that I had to read from an attached storage box likely
slowed down the import.
Just thought to post these numbers here.

Impressive that such a large raster can be imported at all, and relatively fast!

Reading about 1.6 GB (also from an attached storage box) should not take 2 hours; therefore I think the bottleneck is software-side input decompression and output compression.

Markus M