[GRASS-dev] dealing with nan

Hello,

I have some data that reads as "nan". While rare, this may happen when
compressing/uncompressing GRASS Locations to ship them across the
Internet.

I would like to know how to deal with them in GRASS raster
programming, so that they are:
1 - detected
2 - set to null

thank you,
Yann

--
Yann Chemin
International Rice Research Institute
Office: http://www.irri.org/gis
Perso: http://www.freewebs.com/ychemin

Yann Chemin wrote:

> I have some data that reads as "nan", while rare, it may happen when
> compressing/uncompressing GRASS Locations to ship them across
> Internet.
>
> I would like to know how to deal with them in GRASS raster
> programming, so that they are:
> 1 - detected
> 2 - set to null

  if (x != x)
    G_set_d_null_value(&x, 1);

There is also isnan(), which is in C99, and also specified by POSIX:

  http://www.opengroup.org/onlinepubs/009695399/functions/isnan.html

However, I don't know if it exists on all platforms which we care
about. MSDN says that MSVCRT has _isnan() (with a leading underscore),
but it's defined in <float.h> rather than <math.h>.

The (x != x) test should be portable; OTOH, it's the kind of thing
that compilers often get wrong, particularly when optimising (if you
ignore NaN, x!=x is always false).
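
As an illustration, the options above could be combined into one helper. This is only a minimal sketch, not GRASS code: is_nan_value(), set_null_standin() and nan_to_null() are hypothetical names, and the stand-in writes the all-ones bit pattern itself instead of calling G_set_d_null_value(), so that the example builds without the GRASS headers.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GRASS): non-zero if x is NaN. */
static int is_nan_value(double x)
{
#if defined(_MSC_VER)
    return _isnan(x);        /* MSVCRT spelling, declared in <float.h> */
#elif defined(isnan)
    return isnan(x);         /* C99 / POSIX macro from <math.h> */
#else
    volatile double v = x;   /* volatile discourages folding v != v to 0 */
    return v != v;
#endif
}

/* Stand-in for G_set_d_null_value(): assumes the GRASS FP null is the
 * all-ones bit pattern, as described elsewhere in this thread. */
static void set_null_standin(double *cells, size_t n)
{
    memset(cells, 0xFF, n * sizeof(double));
}

/* Replaces every NaN in cells[0..n-1] with the null stand-in and returns
 * how many cells were NaN. Cells that already hold the all-ones pattern
 * are NaN too, so they are counted as well. */
static size_t nan_to_null(double *cells, size_t n)
{
    size_t i, count = 0;

    assert(cells != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (is_nan_value(cells[i])) {
            set_null_standin(&cells[i], 1);
            count++;
        }
    }
    return count;
}
```

In a real module the call to set_null_standin() would presumably be replaced by G_set_d_null_value(&cells[i], 1).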

For 7.0, I intend to change G_is_[fd]_null_value() to treat all NaN
values as null, not just the specific bit patterns which it currently
uses.

--
Glynn Clements <glynn@gclements.plus.com>

2008/5/19 Glynn Clements <glynn@gclements.plus.com>:

> Yann Chemin wrote:
>
>> I have some data that reads as "nan", while rare, it may happen when
>> compressing/uncompressing GRASS Locations to ship them across
>> Internet.
>>
>> I would like to know how to deal with them in GRASS raster
>> programming, so that they are:
>> 1 - detected
>> 2 - set to null
>
>   if (x != x)
>     G_set_d_null_value(&x, 1);

Sounds simple enough.

> There is also isnan(), which is in C99, and also specified by POSIX:
>
>   http://www.opengroup.org/onlinepubs/009695399/functions/isnan.html
>
> However, I don't know if it exists on all platforms which we care
> about. MSDN says that MSVCRT has _isnan() (with a leading underscore),
> but it's defined in <float.h> rather than <math.h>.

hmm… ok

> The (x != x) test should be portable; OTOH, it's the kind of thing
> that compilers often get wrong, particularly when optimising (if you
> ignore NaN, x!=x is always false).

OK, this is a pickle… I’ll give it a try and see what it does here.

> For 7.0, I intend to change G_is_[fd]_null_value() to treat all NaN
> values as null, not just the specific bit patterns which it currently
> uses.

This would be good indeed.



Yann Chemin

Yann Chemin wrote:
> I have some data that reads as "nan", while
> rare, it may happen when compressing/uncompressing
> GRASS Locations to ship them across Internet.

what method are you using to compress/decompress?
.zip? .tar.gz? r.pack/r.unpack?

if r.pack, I think it is safer to use the script in the trac wish patches (which converts the GRASS <=6 raster dir layout to the proposed GRASS 7 $MAPSET/raster/$NAME/element), then tarball up the map's dir, and vice versa. I should probably update r.pack not to use r.{in|out}.mat, as that will lose some metadata for sure.

places in the code to look for accidental creation of nan:
- (x/y) where both the x and y variables could sometimes be 0.
- tan(pi/2)
- r.in.mat/r.out.mat use Matlab's NaN to store grass NULLs
- GRASS nulls + non-core libgis functions meet mixed endian (?)
- ?
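
For the first item, a guard can be placed in front of the division. A minimal sketch, assuming IEEE 754 doubles; ratio_or_flag() is a hypothetical helper, not a GRASS function, and the caller is expected to write a GRASS null wherever it returns 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores x / y in *out and returns 1, or returns 0
 * and leaves *out untouched when both operands are zero, because
 * 0.0 / 0.0 evaluates to NaN under IEEE 754. */
static int ratio_or_flag(double x, double y, double *out)
{
    assert(out != NULL);
    if (x == 0.0 && y == 0.0)
        return 0;
    *out = x / y;   /* still +/-Inf when only y is zero; NaN inputs propagate */
    return 1;
}
```

The zero test is done before dividing, so no NaN is ever produced by this path and no later NaN check is needed for it.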

> I would like to know how to deal with them in GRASS
> raster programming, so that they are:
> 1 - detected
> 2 - set to null

Glynn:

>   if (x != x)
>     G_set_d_null_value(&x, 1);
>
> There is also isnan(), which is in C99, and also specified
> by POSIX:
>
>   http://www.opengroup.org/onlinepubs/009695399/functions/isnan.html
>
> However, I don't know if it exists on all platforms
> which we care about. MSDN says that MSVCRT has _isnan() (with a
> leading underscore), but it's defined in <float.h> rather than
> <math.h>.
>
> The (x != x) test should be portable; OTOH, it's the
> kind of thing that compilers often get wrong, particularly when
> optimising (if you ignore NaN, x!=x is always false).

there is a longstanding wish that 'r.null setnull=' could understand nan, so that you could get rid of them. e.g. very rarely r.in.xyz will create them, I am not sure why/how.

> For 7.0, I intend to change G_is_[fd]_null_value() to treat
> all NaN values as null, not just the specific bit patterns which
> it currently uses.

the only reason I could see to keep them as nan not grass-NULL would be for debugging (there is obviously a bug if you get them..).

Hamish

2008/5/19 Hamish <hamish_b@yahoo.com>:

> Yann Chemin wrote:
>
>> I have some data that reads as "nan", while
>> rare, it may happen when compressing/uncompressing
>> GRASS Locations to ship them across Internet.
>
> what method are you using to compress/decompress?
> .zip? .tar.gz? r.pack/r.unpack?

Windows zip, ftp download across the World, then Linux unzip.

> if r.pack I think it is safer to use the script in the trac wish patches which converts GRASS <=6 raster dir layout to proposed GRASS 7 $MAPSET/raster/$NAME/element) then tarball up the map's dir, and vice versa. I should probably update r.pack not to use r.{in|out}.mat as that will lose some metadata for sure.

We will use r.pack/r.unpack from now on.

> places in the code to look for accidental creation of nan:
> - (x/y) where both the x and y variables could sometimes be 0.
> - tan(pi/2)
> - r.in.mat/r.out.mat use Matlab's NaN to store grass NULLs
> - GRASS nulls + non-core libgis functions meet mixed endian (?)
> - ?

Hmm, tan() is a suspect here. x==y==0 may also be one; I shall add tests for it, but that is more complex.

>> I would like to know how to deal with them in GRASS
>> raster programming, so that they are:
>> 1 - detected
>> 2 - set to null
>
> Glynn:
>
>>   if (x != x)
>>     G_set_d_null_value(&x, 1);
>>
>> There is also isnan(), which is in C99, and also specified
>> by POSIX:
>>
>>   http://www.opengroup.org/onlinepubs/009695399/functions/isnan.html
>>
>> However, I don't know if it exists on all platforms
>> which we care about. MSDN says that MSVCRT has _isnan() (with a
>> leading underscore), but it's defined in <float.h> rather than
>> <math.h>.
>>
>> The (x != x) test should be portable; OTOH, it's the
>> kind of thing that compilers often get wrong, particularly when
>> optimising (if you ignore NaN, x!=x is always false).
>
> there is a longstanding wish that 'r.null setnull=' could understand nan, so that you could get rid of them. e.g. very rarely r.in.xyz will create them, I am not sure why/how.

That would be useful to make the coloring of output maps appear… right now nan interferes with the standard color setup, it seems.

>> For 7.0, I intend to change G_is_[fd]_null_value() to treat
>> all NaN values as null, not just the specific bit patterns which
>> it currently uses.
>
> the only reason I could see to keep them as nan not grass-NULL would be for debugging (there is obviously a bug if you get them..).
>
> Hamish


Yann Chemin

Hamish:

>> what method are you using to compress/decompress?
>> .zip? .tar.gz? r.pack/r.unpack?

Yann Chemin wrote:

> Windows zip, ftp download across the World, then Linux
> unzip.

...

>> if r.pack I think it is safer to use the script in the
>> trac wish patches which converts GRASS <=6 raster dir layout to
>> proposed GRASS 7 $MAPSET/raster/$NAME/element) then tarball up
>> the map's dir, and vice versa.
>> I should probably update r.pack not to use r.{in|out}.mat as that
>> will lose some metadata for sure.

> We will use r.[,un]pack that from now on.

I meant that r.pack as it is now may be lossy, and "r.convert" + "tar czf" may be a safer way:
  http://trac.osgeo.org/grass/ticket/84

(feel free to try, but be warned)

Hamish

2008/5/19 Hamish <hamish_b@yahoo.com>:

> Hamish:
>
>>> what method are you using to compress/decompress?
>>> .zip? .tar.gz? r.pack/r.unpack?
>
> Yann Chemin wrote:
>
>> Windows zip, ftp download across the World, then Linux
>> unzip.
>
>>> if r.pack I think it is safer to use the script in the
>>> trac wish patches which converts GRASS <=6 raster dir layout to
>>> proposed GRASS 7 $MAPSET/raster/$NAME/element) then tarball up
>>> the map's dir, and vice versa.
>>> I should probably update r.pack not to use r.{in|out}.mat as that
>>> will lose some metadata for sure.
>
>> We will use r.[,un]pack that from now on.
>
> I meant that r.pack as it is now may be lossy, and "r.convert" + "tar czf" may be a safer way:
>   http://trac.osgeo.org/grass/ticket/84
>
> (feel free to try, but be warned)

Oh! Excuse my Frenglish, ok got the point.

> Hamish


Yann Chemin

Hamish wrote:

> places in the code to look for accidental creation of nan:
>
> - tan(pi/2)

That should be +/- Inf, not NaN. atan2(0,0) will return NaN, though.

> - GRASS nulls + non-core libgis functions meet mixed endian (?)

GRASS' FP nulls are the all-ones bit patterns, which are unaffected by
endianness issues.

>> The (x != x) test should be portable; OTOH, it's the
>> kind of thing that compilers often get wrong, particularly when
>> optimising (if you ignore NaN, x!=x is always false).
>
> there is a longstanding wish that 'r.null setnull=' could understand
> nan, so that you could get rid of them.

scanf("%f") etc understand "nan" (case not significant).

However, although "r.null ... setnull=nan" will result in a rule with
low==high==NaN, as NaN is neither less than, equal to, nor greater than
itself, testing actual NaN values against that rule will always fail.
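
To make the failure concrete, here is a minimal sketch of such a range rule. The struct and function names are hypothetical and are not taken from the r.null source; matches() shows one possible way a rule could special-case NaN, not how r.null behaves.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical range rule, loosely modelled on the low/high pair above. */
struct null_rule {
    double low, high;
};

/* Parses one rule bound; C99 strtod() accepts "nan" in any letter case. */
static double parse_bound(const char *text)
{
    assert(text != NULL);
    return strtod(text, NULL);
}

/* Plain range test: yields 0 whenever x, low or high is NaN, because
 * every ordered comparison involving NaN is false. */
static int in_range(double x, const struct null_rule *r)
{
    return x >= r->low && x <= r->high;
}

/* Variant that reads a NaN..NaN rule as "match any NaN value". */
static int matches(double x, const struct null_rule *r)
{
    if (isnan(r->low) && isnan(r->high))
        return isnan(x);
    return in_range(x, r);
}
```

With both bounds parsed from "nan", in_range() rejects a NaN cell while matches() accepts it.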

> e.g. very rarely r.in.xyz will create them, I am not sure why/how.

Maybe you actually have "nan"s in the file?

>> For 7.0, I intend to change G_is_[fd]_null_value() to treat
>> all NaN values as null, not just the specific bit patterns which
>> it currently uses.
>
> the only reason I could see to keep them as nan not grass-NULL would
> be for debugging (there is obviously a bug if you get them..).

I'm not proposing changing G_set_[df]_null_value(), only the test. If
the test is changed, there's no reason to explicitly convert "other"
NaN values to the GRASS value (which is just one possible NaN value;
any value with an all-ones exponent and a non-zero mantissa is NaN).
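
The difference between "any NaN" and "the all-ones NaN" can be sketched at the bit level. This assumes 64-bit IEEE 754 doubles; all function names below are hypothetical and nothing here is taken from libgis.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the raw bits of a double into a 64-bit integer. */
static uint64_t double_bits(double x)
{
    uint64_t bits;

    assert(sizeof bits == sizeof x);   /* assumes a 64-bit double */
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Any NaN: exponent field all ones (0x7FF) and a non-zero mantissa. */
static int is_nan_bits(double x)
{
    uint64_t bits = double_bits(x);
    uint64_t exponent = (bits >> 52) & 0x7FFu;
    uint64_t mantissa = bits & UINT64_C(0x000FFFFFFFFFFFFF);

    return exponent == 0x7FFu && mantissa != 0;
}

/* The single all-ones pattern that the thread describes as the GRASS null. */
static int is_all_ones(double x)
{
    return double_bits(x) == UINT64_MAX;
}

/* Builds the all-ones pattern, as a stand-in for G_set_d_null_value(). */
static double all_ones_double(void)
{
    double x;

    memset(&x, 0xFF, sizeof x);
    return x;
}
```

A NaN produced by arithmetic, such as 0.0/0.0, passes is_nan_bits() but typically fails is_all_ones(), which is the gap the proposed change to the null test would close.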

--
Glynn Clements <glynn@gclements.plus.com>