[GRASS-dev] r.out.gdal, again

Hi,

I have a wishlist for r.out.gdal together with a new version (link
below), and would like to put it up for discussion:
Changes in my version are:

Colortable export

A new flag -c is provided to export a colortable, default to no.
Colortables in GeoTIFF files are sometimes not properly rendered by
other GIS applications and the file might then be displayed all black,
even when the raster values are correct. The same GeoTIFF file without
colortable is displayed ok. Requested colortable export only works when
there is an entry in directories colr or colr2.

GRASS has no control over how a colortable is written to the exported
file, this is handled by GDAL and dependent on the file format. GRASS
uses intelligent color rules, whereas e.g. GeoTIFF needs a color table,
i.e. one entry for each potential, not actual data value.

Nodata value handling

If a nodata value was supplied, this value is tested whether it is
within the range of the selected data type and adjusted if necessary.
Specifying e.g. a nodata value of -9999 and using Byte as data type
(range 0 - 255) is no longer accepted, it will be adjusted to 0.
Similarly, a nodata value of 9999 would be adjusted to 255 for Byte
type. More generally, if a supplied nodata value is smaller than the
supported minimum, it is adjusted to the supported minimum, and if a
supplied nodata value is larger than the supported maximum, it is
adjusted to the supported maximum.

Current region extends aligned to raster resolution

The old r.out.gdal (GRASS-6.2.x) worked with the extends and resolution
of the raster to be exported. The new r.out.gdal C-module works with the
extends and resolution of the current region. My version works with the
extends of the current region and aligns them to the input raster map
resolution to avoid undesired implicit resampling if the resolution of
the current region does not match the input raster resolution. If a
raster map layer should be exported in a different resolution the raster
must now be resampled prior to exporting.

Raster band statistics

Raster band statistics (min, max, mean, stddev) are written to the
exported file if supported, GDAL decides. Since r.out.gdal uses current
region extends, these statistics are calculated for the current region
extends, replacing NULLs with the nodata value. I'm not sure if it would
be better to exclude NULLs for raster band statistics.

These statistics can be used by other GIS applications to get the actual
data range (an argument for inclusion of nodata values in raster stats),
but not many GIS applications actually make use of this information.
ESRI products e.g. initially report the potential data range, not the
real data range. This is usually not an issue with Byte type because of
its small range, but with e.g. Int32 or Float32, ESRI products often
initially display an all black image (not only GeoTIFF, also e.g. ESRI
.hdr labelled files). Adjusting the display mode (Symbology) helps.

Data type and range test

If the range of the raster to be exported is not covered by the selected
data type, r.out.gdal is aborted with an error message giving
information on the range of the raster to be exported as well as the
range of GDAL data types. The user can then easily compare the raster
data range with the ranges of GDAL data types and select an appropriate
data type.

Updated documentation

Documentation is updated, amongst other minor changes with ranges of
GDAL data types and recommended settings for best allround compatibility
of GeoTIFF files. The recommended settings are based on discussions in
GRASS mailing lists, GRASS ticket 73, GDAL documentation and own
testing. Apart from colortable export defaulting to no, these
recommended settings are not enforced or set as default.

Maciek looked into GeoTIFF optimisation, e.g. adding overviews, tiles
vs. strips, strip dimensions, internal compression methods. Since
GeoTIFF is so widely used, this information could also be included in
the r.out.gdal documentation.

This version of r.out.gdal was compiled with
grass-6.4.svn_src_snapshot_2008_08_16 and gdal-1.5.2.
Source code has been formatted using tools/grass-indent.sh

I tested with QGIS-0.9.1 on Linux 64bit, QGIS -0.11.0-1 on XP,
SAGA-2.0.3 on XP, gvSIG-1.1.1 (32bit binary) on Linux 64bit, gvSIG-1.1.1
on XP, PCI Geomatica FreeView 9.1 on XP, ER Viewer 7.2 (ER Mapper) on
XP, and ArcMap 9.2 (ESRI) on XP. It is not so easy to get hold of GIS
applications, also commercial, that do not use gdal...
Tests were performed with the elevation raster map layer in the North
Carolina sample dataset, creating a MASK for elevation values <= 70m,
needed for testing of nodata handling. The elevation raster was exported
as Byte, UInt16, Int32, and Float32, once with colortable (Byte and
UInt16 only, colortable export for other data types not supported by
gdal), once without colortable, always as GeoTIFF.

Test results:

QGIS-0.9.1 on Linux 64bit
display: all ok
colortable: all ok
cell values: all ok
nodata: all ok

QGIS-0.11.1 on XP
display: all ok
colortable: all ok
cell values: all ok
nodata: all ok

gvSIG-1.1.1 (Linux 32bit binary) on Linux 64bit, probably Java problems
display: all fail
colortable: all fail
cell values: all fail
nodata: all fail

gvSIG-1.1.1 on XP
display: all ok
colortable: fail for Uint16, ok for Byte
cell values: all ok
nodata: nodata not supported

PCI Geomatica FreeView 9.1 on XP
display: all ok
colortable: all ok
cell values: all ok
nodata: nodata not supported

ER Viewer (ER Mapper) on XP
display: fail for UInt16 with colortable, all other ok
colortable: fail for UInt16, ok for Byte
cell values: all ok
nodata: cell value query not supported

SAGA-2.0.3 on XP
display: all ok (colortables ignored)
colortable: all fail
cell values: all ok
nodata: all ok

ESRI ArcMap 9.2 on XP
display: all ok, but all without colortable and other than Byte need
manual adjustment which is painfully slow, even for these small raster
layers (Properties -> Symbology -> Stretch -> Minimum-Maximum)
colortable: all ok
cell values: all ok
nodata: all ok

Source code with README file is available at
http://markus.metz.giswork.googlepages.com/r.out.gdal.conservative.tar.gz

The diff is larger than the source code, therefore I did not attach a diff.

Please excuse this long mail, but I wanted to explain my wishlist and
had to provide test results so I could show what output is readable where.

Regards,

Markus

On Tuesday 19 August 2008, Markus Metz wrote:

Hi,

I have a wishlist for r.out.gdal together with a new version (link
below), and would like to put it up for discussion:
Changes in my version are:

Colortable export

A new flag -c is provided to export a colortable, default to no.
Colortables in GeoTIFF files are sometimes not properly rendered by
other GIS applications and the file might then be displayed all black,
even when the raster values are correct. The same GeoTIFF file without
colortable is displayed ok. Requested colortable export only works when
there is an entry in directories colr or colr2.

GRASS has no control over how a colortable is written to the exported
file, this is handled by GDAL and dependent on the file format. GRASS
uses intelligent color rules, whereas e.g. GeoTIFF needs a color table,
i.e. one entry for each potential, not actual data value.

Nodata value handling

If a nodata value was supplied, this value is tested whether it is
within the range of the selected data type and adjusted if necessary.
Specifying e.g. a nodata value of -9999 and using Byte as data type
(range 0 - 255) is no longer accepted, it will be adjusted to 0.
Similarly, a nodata value of 9999 would be adjusted to 255 for Byte
type. More generally, if a supplied nodata value is smaller than the
supported minimum, it is adjusted to the supported minimum, and if a
supplied nodata value is larger than the supported maximum, it is
adjusted to the supported maximum.

Current region extends aligned to raster resolution

The old r.out.gdal (GRASS-6.2.x) worked with the extends and resolution
of the raster to be exported. The new r.out.gdal C-module works with the
extends and resolution of the current region. My version works with the
extends of the current region and aligns them to the input raster map
resolution to avoid undesired implicit resampling if the resolution of
the current region does not match the input raster resolution. If a
raster map layer should be exported in a different resolution the raster
must now be resampled prior to exporting.

Raster band statistics

Raster band statistics (min, max, mean, stddev) are written to the
exported file if supported, GDAL decides. Since r.out.gdal uses current
region extends, these statistics are calculated for the current region
extends, replacing NULLs with the nodata value. I'm not sure if it would
be better to exclude NULLs for raster band statistics.

These statistics can be used by other GIS applications to get the actual
data range (an argument for inclusion of nodata values in raster stats),
but not many GIS applications actually make use of this information.
ESRI products e.g. initially report the potential data range, not the
real data range. This is usually not an issue with Byte type because of
its small range, but with e.g. Int32 or Float32, ESRI products often
initially display an all black image (not only GeoTIFF, also e.g. ESRI
.hdr labelled files). Adjusting the display mode (Symbology) helps.

Data type and range test

If the range of the raster to be exported is not covered by the selected
data type, r.out.gdal is aborted with an error message giving
information on the range of the raster to be exported as well as the
range of GDAL data types. The user can then easily compare the raster
data range with the ranges of GDAL data types and select an appropriate
data type.

Updated documentation

Documentation is updated, amongst other minor changes with ranges of
GDAL data types and recommended settings for best allround compatibility
of GeoTIFF files. The recommended settings are based on discussions in
GRASS mailing lists, GRASS ticket 73, GDAL documentation and own
testing. Apart from colortable export defaulting to no, these
recommended settings are not enforced or set as default.

Maciek looked into GeoTIFF optimisation, e.g. adding overviews, tiles
vs. strips, strip dimensions, internal compression methods. Since
GeoTIFF is so widely used, this information could also be included in
the r.out.gdal documentation.

This version of r.out.gdal was compiled with
grass-6.4.svn_src_snapshot_2008_08_16 and gdal-1.5.2.
Source code has been formatted using tools/grass-indent.sh

I tested with QGIS-0.9.1 on Linux 64bit, QGIS -0.11.0-1 on XP,
SAGA-2.0.3 on XP, gvSIG-1.1.1 (32bit binary) on Linux 64bit, gvSIG-1.1.1
on XP, PCI Geomatica FreeView 9.1 on XP, ER Viewer 7.2 (ER Mapper) on
XP, and ArcMap 9.2 (ESRI) on XP. It is not so easy to get hold of GIS
applications, also commercial, that do not use gdal...
Tests were performed with the elevation raster map layer in the North
Carolina sample dataset, creating a MASK for elevation values <= 70m,
needed for testing of nodata handling. The elevation raster was exported
as Byte, UInt16, Int32, and Float32, once with colortable (Byte and
UInt16 only, colortable export for other data types not supported by
gdal), once without colortable, always as GeoTIFF.

Test results:

QGIS-0.9.1 on Linux 64bit
display: all ok
colortable: all ok
cell values: all ok
nodata: all ok

QGIS-0.11.1 on XP
display: all ok
colortable: all ok
cell values: all ok
nodata: all ok

gvSIG-1.1.1 (Linux 32bit binary) on Linux 64bit, probably Java problems
display: all fail
colortable: all fail
cell values: all fail
nodata: all fail

gvSIG-1.1.1 on XP
display: all ok
colortable: fail for Uint16, ok for Byte
cell values: all ok
nodata: nodata not supported

PCI Geomatica FreeView 9.1 on XP
display: all ok
colortable: all ok
cell values: all ok
nodata: nodata not supported

ER Viewer (ER Mapper) on XP
display: fail for UInt16 with colortable, all other ok
colortable: fail for UInt16, ok for Byte
cell values: all ok
nodata: cell value query not supported

SAGA-2.0.3 on XP
display: all ok (colortables ignored)
colortable: all fail
cell values: all ok
nodata: all ok

ESRI ArcMap 9.2 on XP
display: all ok, but all without colortable and other than Byte need
manual adjustment which is painfully slow, even for these small raster
layers (Properties -> Symbology -> Stretch -> Minimum-Maximum)
colortable: all ok
cell values: all ok
nodata: all ok

Source code with README file is available at
http://markus.metz.giswork.googlepages.com/r.out.gdal.conservative.tar.gz

The diff is larger than the source code, therefore I did not attach a diff.

Please excuse this long mail, but I wanted to explain my wishlist and
had to provide test results so I could show what output is readable where.

Regards,

Markus

Nice work. I briefly tested this with some of the data in the spearfish
location, and it seemed to work. Have you appended this information to the
original ticket related to r.out.gdal not functioning correctly?

Great job, looking forward to a functioning r.out.gdal (finally!).

Cheers,

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

Markus Metz wrote:

I have a wishlist for r.out.gdal together with a new version (link
below), and would like to put it up for discussion:
Changes in my version are:

Colortable export

A new flag -c is provided to export a colortable, default to no.
Colortables in GeoTIFF files are sometimes not properly rendered by
other GIS applications and the file might then be displayed all black,
even when the raster values are correct. The same GeoTIFF file without
colortable is displayed ok. Requested colortable export only works when
there is an entry in directories colr or colr2.

I would suggest inverting the sense of the flag, so that colour tables
are created by default, but can be suppressed by the flag.

Most of the time, colour tables work. It seems to be an issue
primarily with 16-bit TIFFs.

Current region extends aligned to raster resolution

The old r.out.gdal (GRASS-6.2.x) worked with the extends and resolution
of the raster to be exported. The new r.out.gdal C-module works with the
extends and resolution of the current region. My version works with the
extends of the current region and aligns them to the input raster map
resolution to avoid undesired implicit resampling if the resolution of
the current region does not match the input raster resolution. If a
raster map layer should be exported in a different resolution the raster
must now be resampled prior to exporting.

Make this a flag; the default should be to honour the current region.

--
Glynn Clements <glynn@gclements.plus.com>

Markus Metz pisze:

Nodata value handling

If a nodata value was supplied, this value is tested whether it is
within the range of the selected data type and adjusted if necessary.
Specifying e.g. a nodata value of -9999 and using Byte as data type
(range 0 - 255) is no longer accepted, it will be adjusted to 0.
Similarly, a nodata value of 9999 would be adjusted to 255 for Byte
type.

IMHO r.out.gdal should abort with an error rather than silently adjust
user-specified values.

Maciek

--
Maciej Sieczka
www.sieczka.org

Maciej Sieczka wrote:

Markus Metz pisze:

Nodata value handling

If a nodata value was supplied, this value is tested whether it is
within the range of the selected data type and adjusted if necessary.
Specifying e.g. a nodata value of -9999 and using Byte as data type
(range 0 - 255) is no longer accepted, it will be adjusted to 0.
Similarly, a nodata value of 9999 would be adjusted to 255 for Byte
type.

IMHO r.out.gdal should abort with an error rather than silently adjust
user-specified values.

Aborting with an error is definitively an option, but then the original
behaviour of r.out.gdal should be changed too. The original r.out.gdal
chooses a nodata value according to the selected output data type if
none was specified, without message or warning. Later on (original
nodata value handling), a warning message is issued if there are NULLs
in the raster to be exported, irrespective of whether a nodata value was
specified by the user or not.
With my version, the nodata value is not silently adjusted, a warning
message is issued. I wanted to keep the behaviour similar to the
original version, and if everything is all right, there is no difference
apart from the additional -c flag and a warning if colortable export
takes place.
r.out.gdal nodata handling could be changed so that the user has to
provide a nodata value if there are NULLs in the raster, otherwise
r.out.gdal aborts with an error. Further on, r.out.gdal can also abort
with an error if the specified nodata value is not covered by the range
of the selected data type, instead of adjusting the nodata value and
issuing a warning message.
IMHO, either a warning or an error message should be issued and the type
of message - warning or error - should be consistent for nodata handling
in the whole module. If the general policy is not to adjust
user-specified values, even with a warning message, but to abort with an
error and provide information on what's wrong and how to fix it, then it
should be changed to aborting with an error. In any case, this is easy
to change.

Markus

Markus Metz wrote:

r.out.gdal nodata handling could be changed so that the user has to
provide a nodata value if there are NULLs in the raster, otherwise
r.out.gdal aborts with an error.

That's reasonable for integer maps, but for FP maps it makes more
sense to simply default to NaN as the no-data value (the two are
essentially the same concept, so there is no information loss in using
NaN).

--
Glynn Clements <glynn@gclements.plus.com>

On Wednesday 20 August 2008, Glynn Clements wrote:

Markus Metz wrote:
> r.out.gdal nodata handling could be changed so that the user has to
> provide a nodata value if there are NULLs in the raster, otherwise
> r.out.gdal aborts with an error.

That's reasonable for integer maps, but for FP maps it makes more
sense to simply default to NaN as the no-data value (the two are
essentially the same concept, so there is no information loss in using
NaN).

Do other software packages understand 'Nan' ?

Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

Dylan Beaudette wrote:

On Wednesday 20 August 2008, Glynn Clements wrote:
  

Markus Metz wrote:
    

r.out.gdal nodata handling could be changed so that the user has to
provide a nodata value if there are NULLs in the raster, otherwise
r.out.gdal aborts with an error.
      

That's reasonable for integer maps, but for FP maps it makes more
sense to simply default to NaN as the no-data value (the two are
essentially the same concept, so there is no information loss in using
NaN).
    
Do other software packages understand 'Nan' ?
  

Do other file formats understand Nan? Reading a bit in GDAL API
documentation lets me suspect that NaN is not allowed, because for
GeoTIFF, you can specify a NoData value as in the current implementation
or you can add an internal mask, no mentioning of writing NaN for
Float32 or Float64 to a GDAL raster band. Did I overlook something?

Markus

Markus Metz wrote:

>>> r.out.gdal nodata handling could be changed so that the user has to
>>> provide a nodata value if there are NULLs in the raster, otherwise
>>> r.out.gdal aborts with an error.
>>>
>> That's reasonable for integer maps, but for FP maps it makes more
>> sense to simply default to NaN as the no-data value (the two are
>> essentially the same concept, so there is no information loss in using
>> NaN).
>>
>
> Do other software packages understand 'Nan' ?

Do other file formats understand Nan? Reading a bit in GDAL API
documentation lets me suspect that NaN is not allowed, because for
GeoTIFF, you can specify a NoData value as in the current implementation
or you can add an internal mask, no mentioning of writing NaN for
Float32 or Float64 to a GDAL raster band. Did I overlook something?

Any code which deals with floating-point values would have to go out
of its way to avoid NaNs, i.e. actively check for them and flag an
error. Likewise for infinities.

If code just passes floating-point values around without explicitly
classifying them, most of the time it will get the correct behaviour
without even trying.

I suspect that the most likely reason for code to handle NaN
incorrectly would be equality checks, i.e. "if (x == null_val)" won't
work if null_val is NaN (it will always be false, even if x is NaN).
OTOH, floating-point equality checks have plenty of other problems
(e.g. x86 using 64 bits for a double stored in memory but 80 bits for
a register, meaning that "x = y; if (x == y) ..." won't necessarily
work for non-NaN values either).

--
Glynn Clements <glynn@gclements.plus.com>