[GRASS5] Re: [GRASS_DE] r.mapcalc round()

Hi Dieter,

(cc to grass5)

On Tue, May 15, 2001 at 03:01:36PM +0200, Dieter Lehmann wrote:

> Hi,
>
> r.mapcalc test='float(x)/10'
> results in something like 0.0200000002.

What are the contents of "x"?

> I need to round. The function round() in mapcalc only gives the nearest integer.

> My problem might be solved in mapcalc or in r.stats (I need to export the
> values through r.stats -1 test).

However, something is odd with r.mapcalc in my opinion:

r.mapcalc x1=1.4
EXECUTING x1 = ... 100%
CREATING SUPPORT FILES FOR x1
range: 1.3999999762 1.3999999762

r.stats x1
r.stats: 100%
1.4-1.4

but:
r.stats -1 x1
1.3999999762
1.3999999762
1.3999999762
1.3999999762
1.3999999762
[...]

-> not what I expect.

r.univar x1
[...]
Minimum: 1.3999999762
Maximum: 1.3999999762
Range: 0
Arithmetic Mean: 1.4
Variance: -2.49502e-12
awk: cmd. line:17: (FILENAME=- FNR=18894) warning: sqrt called with negative argument -2.49502e-12
Standarddeviation: nan

-> huh!

r.mapcalc x2=1.6
EXECUTING x2 = ... 100%
CREATING SUPPORT FILES FOR x2
range: 1.6000000238 1.6000000238

-> does not look very impressive.

r.mapcalc x0=1
EXECUTING x0 = ... 100%
CREATING SUPPORT FILES FOR x0
range: 1 1

-> o.k.

Starting to round:
r.mapcalc x1round="round(x1)"
EXECUTING x1round = ... 100%
CREATING SUPPORT FILES FOR x1round
range: 1 1
-> rounding is o.k.

r.mapcalc x2round="round(x2)"
EXECUTING x2round = ... 100%
CREATING SUPPORT FILES FOR x2round
range: 2 2
-> rounding is o.k.

r.mapcalc x3="1.6 + 1."
EXECUTING x3 = ... 100%
CREATING SUPPORT FILES FOR x3
range: 2.5999999046 2.5999999046
-> huh?

Perhaps someone could explain this behaviour. Or is it a precision bug?

Markus

----------------------------------------

Markus Neteler wrote:

> However, something is odd with r.mapcalc in my opinion:
>
> r.mapcalc x1=1.4
> EXECUTING x1 = ... 100%
> CREATING SUPPORT FILES FOR x1
> range: 1.3999999762 1.3999999762
>
> r.stats x1
> r.stats: 100%
> 1.4-1.4
>
> but:
> r.stats -1 x1
> 1.3999999762
> 1.3999999762
> 1.3999999762
> 1.3999999762
> 1.3999999762
> [...]
>
> -> not what I expect.

Note that 1/10 isn't exactly representable in binary, in the same way
that 1/3 isn't exactly representable in decimal. The above seems to
match the accuracy of the "float" data type.
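
For illustration, a minimal standalone C check (hypothetical test code, not
GRASS code) reproduces the number above:

  #include <stdio.h>

  int main(void)
  {
      float x = 1.4f;          /* nearest single-precision value to 1.4 */

      printf("%.10f\n", x);    /* prints 1.3999999762 */
      return 0;
  }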

> r.univar x1
> [...]
> Minimum: 1.3999999762
> Maximum: 1.3999999762
> Range: 0
> Arithmetic Mean: 1.4
> Variance: -2.49502e-12
> awk: cmd. line:17: (FILENAME=- FNR=18894) warning: sqrt called with negative argument -2.49502e-12
> Standarddeviation: nan
>
> -> huh!

This looks like one of those cases where one operand of a subtraction
has IEEE float (32-bit) or double (64-bit) precision while the other
has the FPU's internal precision (80-bit for x86).

Does this go away if you compile r.univar with "-ffloat-store"?

> Perhaps someone could explain this behaviour. Or is it a precision bug?

It looks like it.

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

On Tue, May 15, 2001 at 03:22:11PM +0100, Glynn Clements wrote:

[snip]

> Note that 1/10 isn't exactly representable in binary, in the same way
> that 1/3 isn't exactly representable in decimal. The above seems to
> match the accuracy of the "float" data type.

[snip]

> This looks like one of those cases where one operand of a subtraction
> has IEEE float (32-bit) or double (64-bit) precision while the other
> has the FPU's internal precision (80-bit for x86).

Aha, I see.

> Does this go away if you compile r.univar with "-ffloat-store"?

Well, r.univar is only a script using r.stats and awk. However, after
recompiling r.stats with the above flag (adding
EXTRA_CFLAGS=-ffloat-store
to r.stats' Gmakefile) I can't see any difference.

> > Perhaps someone could explain this behaviour. Or is it a precision bug?

> It looks like it.

Let me know if I should try anything else...

Markus

----------------------------------------

Markus Neteler wrote:

> > Does this go away if you compile r.univar with "-ffloat-store"?

> Well, r.univar is only a script using r.stats and awk. However, after
> recompiling r.stats with the above flag (adding
> EXTRA_CFLAGS=-ffloat-store
> to r.stats' Gmakefile) I can't see any difference.

That wouldn't help; it's the awk script which is doing the relevant
calculations. Looking at the awk script, I don't think that compiling
awk with -ffloat-store would help.

The problem is basically that none of the axioms of real arithmetic
quite hold true for floating-point arithmetic.

In this particular case, the variance is:

  (SUM[x(i)^2] - SUM[x(i)]^2/N)/N

If all of the x(i)'s are identical, then the two sides of the
subtraction should be equal. However, given the error introduced by
each floating point operation, they will differ slightly; if the RHS
is slightly larger, then result will be negative, causing the sqrt()
to fail.

Whilst there probably is an algorithmic solution (a decent textbook on
numerical methods should list many approaches for working around the
problems inherent in floating-point subtraction), I'm not sure whether
that would be justified, or if it would be better to just include an
explicit check for a negative variance (indicating a variance so low
that it's been swamped by rounding error).
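
A sketch of such a check in C (illustrative names; running sums as in the
awk script):

  /* "textbook" variance from running sums; rounding error can push it
     slightly below zero when all values are (nearly) identical */
  double variance(double sum, double sum_sq, long n)
  {
      double var = (sum_sq - sum * sum / n) / n;

      return (var < 0.0) ? 0.0 : var;   /* clamp rounding error to zero */
  }

With this, sqrt() can no longer be called with a negative argument.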

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

On Tue, May 15, 2001 at 07:51:51PM +0100, Glynn Clements wrote:

[snip]

> Whilst there probably is an algorithmic solution (a decent textbook on
> numerical methods should list many approaches for working around the
> problems inherent in floating-point subtraction), I'm not sure whether
> that would be justified, or if it would be better to just include an
> explicit check for a negative variance (indicating a variance so low
> that it's been swamped by rounding error).

Perhaps I am wrong, but don't the problems in r.univar result from
the input data? If r.mapcalc produces wrong numbers, r.univar can't be
expected to do better. Here, r.univar replicates the values generated
by r.mapcalc. So the problem may be in r.mapcalc.

The range results of r.mapcalc are quite strange with FP numbers.
Even if I compile r.mapcalc with -ffloat-store the result is odd:

r.mapcalc test=1.1
CREATING SUPPORT FILES FOR test
range: 1.1000000238 1.1000000238

Regards

Markus

----------------------------------------

Markus,

as far as I can tell there is nothing wrong with r.mapcalc, and Glynn is right
that this is a floating point precision issue. Everybody who does
numerical computations has to deal with it - Lubos keeps a small constant
in his programs, and when subtracting two FP numbers which could be
similar he compares the difference against that constant.
We had to deal with that in our GRASS programs too.
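
In outline (a generic sketch; the right tolerance depends on the data):

  #include <math.h>

  #define EPS 1.0e-6   /* small constant compared against differences */

  /* treat a and b as equal if they differ by less than the tolerance */
  int nearly_equal(double a, double b)
  {
      return fabs(a - b) < EPS;
  }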

You could let r.mapcalc use only six digits when reporting the range (as
r.describe does), so people won't get confused by a slightly different
number. On the other hand, it is good to know what the computer is really
working with, so I would keep it as it is.

I think that you need to take care of the issue in r.univar (as suggested
by Glynn). If you want to know more, Lubos can explain it better than me;
I can ask him to write to you.
We should put something about this into the GRASS handbook too.

BTW, the ArcView 3 Spatial Analyst map calculator quietly cuts off some
floating point numbers - see
http://www2.gis.uiuc.edu:2280/modviz/erosion/usped.html
- with some serious consequences. We don't want that in GRASS. (They have
fixed it for ArcGIS 8.)

Helena


----------------------------------------

Markus Neteler wrote:

[snip]

> Perhaps I am wrong, but don't the problems in r.univar result from
> the input data?

No. Or at least this problem doesn't relate to wrong input data:

> Variance: -2.49502e-12
> awk: cmd. line:17: (FILENAME=- FNR=18894) warning: sqrt called with negative argument -2.49502e-12
> Standarddeviation: nan

If all of the input values are identical (even if they're wrong), the
variance and SD should be zero. Also, whatever the inputs, the
variance should never be negative.

> If r.mapcalc produces wrong numbers, r.univar can't be expected to do
> better. Here, r.univar replicates the values generated by r.mapcalc.
> So the problem may be in r.mapcalc.

The variance/SD problem is in one of the following, depending upon
your point of view:

a) the awk script "r.univar",
b) awk generally, or
c) floating-point arithmetic generally.

> The range results of r.mapcalc are quite strange with FP numbers.

Not that strange; with the "-1" flag, r.stats uses "%.10f" to print
the values; without it, it uses "%10f" then trims the result (this is
equivalent to "%.6f", at least for GNU libc).

> Even if I compile r.mapcalc with -ffloat-store the result is odd:

"-ffloat-store" was basically a hunch regarding the variance/SD issue,
which turned out to be incorrect. It will fix some FP problems
(specifically, when subtracting identical values doesn't give zero);
it won't fix all of them (in this case, subtracting values which
should be identical but aren't quite).

> r.mapcalc test=1.1
> CREATING SUPPORT FILES FOR test
> range: 1.1000000238 1.1000000238

This is to be expected; r.mapcalc also uses "%.10f", which is more
precision than a float has (float will be promoted to double when
passed as an unprototyped argument, as is the case with printf()).
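
Both effects are easy to reproduce outside GRASS (hypothetical test code):

  #include <stdio.h>

  int main(void)
  {
      float x = 1.1f;

      printf("%.10f\n", x);   /* 1.1000000238 - more digits than a float holds */
      printf("%.6f\n", x);    /* 1.100000     - within float's ~7 digits */
      return 0;
  }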

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

On Wed, May 16, 2001 at 04:36:16AM +0100, Glynn Clements wrote:

[snip]

> > Perhaps I am wrong, but don't the problems in r.univar result from
> > the input data?

> No. Or at least this problem doesn't relate to wrong input data:

> > Variance: -2.49502e-12
> > awk: cmd. line:17: (FILENAME=- FNR=18894) warning: sqrt called with negative argument -2.49502e-12
> > Standarddeviation: nan

> If all of the input values are identical (even if they're wrong), the
> variance and SD should be zero. Also, whatever the inputs, the
> variance should never be negative.

Glynn, I agree. I was a little tired yesterday.

> > If r.mapcalc produces wrong numbers, r.univar can't be expected to do
> > better. Here, r.univar replicates the values generated by r.mapcalc.
> > So the problem may be in r.mapcalc.

> The variance/SD problem is in one of the following, depending upon
> your point of view:

> a) the awk script "r.univar",

BTW: A C implementation would be better (maybe using s.univar). Would
be a nice exercise, but I don't have the time.

> b) awk generally, or
> c) floating-point arithmetic generally.

> > The range results of r.mapcalc are quite strange with FP numbers.

> Not that strange; with the "-1" flag, r.stats uses "%.10f" to print
> the values; without it, it uses "%10f" then trims the result (this is
> equivalent to "%.6f", at least for GNU libc).

> > Even if I compile r.mapcalc with -ffloat-store the result is odd:

> "-ffloat-store" was basically a hunch regarding the variance/SD issue,
> which turned out to be incorrect. It will fix some FP problems
> (specifically, when subtracting identical values doesn't give zero);
> it won't fix all of them (in this case, subtracting values which
> should be identical but aren't quite).

> > r.mapcalc test=1.1
> > CREATING SUPPORT FILES FOR test
> > range: 1.1000000238 1.1000000238

> This is to be expected; r.mapcalc also uses "%.10f", which is more
> precision than a float has (float will be promoted to double when
> passed as an unprototyped argument, as is the case with printf()).

O.k. I think I understand the problem now. But how to deal with it?
r.mapcalc seems to generate FCELL only. Shouldn't the range be printed
with float precision then? Sorry for insisting here, but the above range
will be considered wrong by the average user not familiar with precision
issues.

And r.stats: maybe the printed precision should depend on the input map
type, meaning more decimals for DCELL maps than FCELL maps.

I am no expert here, this is just my impression,

regards

Markus

----------------------------------------

> O.k. I think I understand the problem now. But how to deal with it?
> r.mapcalc seems to generate FCELL only. Shouldn't the range be printed
> with float precision then? Sorry for insisting here, but the above range
> will be considered wrong by the average user not familiar with precision
> issues.

Yes, as I wrote in the previous message, you can print it with fewer
digits - r.describe does that (see the example below). I think that the
printing with a larger number of digits was put there because user-defined
precision was considered (in some modules you can choose between FCELL and
DCELL). There are other modules which print an unreasonably large number of
digits (e.g. r.colors with the rules option). It may also be a left-over
from the debugging when the FP support was developed and the need for
double versus single precision was discussed.

Helena

GRASS:~/lresults/lhood/lhouse > r.mapcalc

mapcalc> test=1.1
test - already exists. ok to overwrite? (y/n) y

EXECUTING test = ... 100%
CREATING SUPPORT FILES FOR test
range: 1.1000000238 1.1000000238

mapcalc> exit
GRASS:~/lresults/lhood/lhouse > r.describe test
READING [test in helahood] ... 100%
1.100000-1.100000
GRASS:~/lresults/lhood/lhouse >

----------------------------------------

Markus Neteler wrote:

> > a) the awk script "r.univar",
>
> BTW: A C implementation would be better (maybe using s.univar). Would
> be a nice exercise, but I don't have the time.

Simply re-implementing it in C should be trivial; however, that won't
solve the problems when the variance is swamped by the rounding error.

Unless someone is planning on implementing a more robust variance
algorithm, we should at least add a check for negative variance in
r.univar. BTW, does s.univar suffer from the same problem?

> > r.mapcalc test=1.1
> > CREATING SUPPORT FILES FOR test
> > range: 1.1000000238 1.1000000238
>
> This is to be expected; r.mapcalc also uses "%.10f", which is more
> precision than a float has (float will be promoted to double when
> passed as an unprototyped argument, as is the case with printf()).

> O.k. I think I understand the problem now. But how to deal with it?

Just use "%f" or "%.6f".

> r.mapcalc seems to generate FCELL only. Shouldn't the range be printed
> with float precision then?

Yes. float has just under 7 digits of precision (FLT_EPSILON is around
1.19e-7 for IEEE single precision), so "%.6f" should do it. I suspect
that "%.10f" was "pulled out of a hat".

> Sorry for insisting here, but the above range will be considered wrong
> by the average user not familiar with precision issues.
>
> And r.stats: maybe the printed precision should depend on the input map
> type, meaning more decimals for DCELL maps than FCELL maps.

Maybe; although this won't help if a DCELL map is generated using
float data somewhere in the process.

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

On Wed, May 16, 2001 at 03:18:40PM +0100, Glynn Clements wrote:

[snip]

> Unless someone is planning on implementing a more robust variance
> algorithm, we should at least add a check for negative variance in
> r.univar. BTW, does s.univar suffer from the same problem?

I have tested this:
r.mapcalc plane=1.1
range: 1.1000000238 1.1000000238 # yes, we could use %f
r.to.sites -a in=plane out=plane # without "-a" you get #1.1 in the sites
                                 # list, which is forbidden by definition

head $LOCATION/site_lists/plane
name|plane
desc|r.to.sites -a input=plane output=plane label=No Label
form|||%
labels|Easting|Northing|%No Label
3570506.25|5764093.75|%1.10000002
3570518.75|5764093.75|%1.10000002
3570531.25|5764093.75|%1.10000002
3570543.75|5764093.75|%1.10000002

-> precision problem as well

s.univar plane
        number of points 53760
                    mean 1.1
      standard deviation 1.45885e-13
coefficient of variation 1.32622e-11
                skewness -1
                kurtosis -2
         mean of squares 1.21
 mean of absolute values 1.1
                 minimum 1.1
          first quartile 1.1
                  median 1.1
          third quartile 1.1
                 maximum 1.1

-> far from being perfect

Unfortunately, the precision problems seem to be spread throughout GRASS.

> > > r.mapcalc test=1.1
> > > CREATING SUPPORT FILES FOR test
> > > range: 1.1000000238 1.1000000238
> >
> > This is to be expected; r.mapcalc also uses "%.10f", which is more
> > precision than a float has (float will be promoted to double when
> > passed as an unprototyped argument, as is the case with printf()).
>
> O.k. I think I understand the problem now. But how to deal with it?

> Just use "%f" or "%.6f".

> > r.mapcalc seems to generate FCELL only. Shouldn't the range be printed
> > with float precision then?
>
> Yes. float has just under 7 digits of precision (FLT_EPSILON is around
> 1.19e-7 for IEEE single precision), so "%.6f" should do it. I suspect
> that "%.10f" was "pulled out of a hat".

O.k. I have changed that to "%.6f" in CVS.
Now I get:

r.mapcalc plane=1.1
range: 1.1 1.1

which looks better.

and:
r.mapcalc plane=1.123456789
range: 1.123457 1.123457

> > Sorry for insisting here, but the above range will be considered wrong
> > by the average user not familiar with precision issues.
> >
> > And r.stats: maybe the printed precision should depend on the input
> > map type, meaning more decimals for DCELL maps than FCELL maps.
>
> Maybe; although this won't help if a DCELL map is generated using
> float data somewhere in the process.

O.k., but we should address the problem when displaying an FCELL map.
In the above example I get:

r.stats -1 plane
1.1234568357
1.1234568357
1.1234568357

instead of
1.1234568
1.1234568
1.1234568

Could the RASTER_MAP_TYPE be used somehow for the "fprintf()"s?
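
Something along these lines is what I mean (just a sketch, untested;
assuming the libgis G_raster_map_type() call and the FCELL_TYPE/DCELL_TYPE
constants; name, mapset and value come from the surrounding code):

  RASTER_MAP_TYPE type = G_raster_map_type(name, mapset);
  const char *fmt = (type == DCELL_TYPE) ? "%.15g\n"   /* double data */
                                         : "%.7g\n";   /* float data  */

  fprintf(stdout, fmt, value);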

Regards

Markus

----------------------------------------

Markus Neteler wrote:

> > Unless someone is planning on implementing a more robust variance
> > algorithm, we should at least add a check for negative variance in
> > r.univar. BTW, does s.univar suffer from the same problem?

> I have tested this:

[snip]

>       standard deviation 1.45885e-13
> coefficient of variation 1.32622e-11

[snip]

> -> far from being perfect

At least it didn't fail with a domain error, although that may well
just be luck. Not having looked at the code, I can imagine that using
a different input value might result in a variance slightly less than
zero instead of one slightly greater.

> > r.mapcalc seems to generate FCELL only. Shouldn't the range be printed
> > with float precision then?
>
> Yes. float has just under 7 digits of precision (FLT_EPSILON is around
> 1.19e-7 for IEEE single precision), so "%.6f" should do it. I suspect
> that "%.10f" was "pulled out of a hat".

> O.k. I have changed that to "%.6f" in CVS.
> Now I get:
>
> r.mapcalc plane=1.1
> range: 1.1 1.1
>
> which looks better.
>
> and:
> r.mapcalc plane=1.123456789
> range: 1.123457 1.123457

Note, however, that a similar approach probably should NOT be adopted
for writing data to files (e.g. sites files). Whilst using less
precision than a float has may improve the appearance (and conserve
space), it is likely to introduce further error.

Furthermore, "%f" (with or without width and/or precision modifiers)
should never be used for data which is intended to be read by a
program; use "%e" instead.
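
For example (a standalone illustration):

  #include <stdio.h>

  int main(void)
  {
      double tiny = 1.123456789e-8;

      printf("%f\n", tiny);   /* 0.000000      - the value is destroyed */
      printf("%e\n", tiny);   /* 1.123457e-08  - the value survives */
      return 0;
  }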

> > Sorry for insisting here, but the above range will be considered wrong
> > by the average user not familiar with precision issues.
> >
> > And r.stats: maybe the printed precision should depend on the input
> > map type, meaning more decimals for DCELL maps than FCELL maps.
>
> Maybe; although this won't help if a DCELL map is generated using
> float data somewhere in the process.

> O.k., but we should address the problem when displaying an FCELL map.
> In the above example I get:
>
> r.stats -1 plane
> 1.1234568357
> 1.1234568357
> 1.1234568357
>
> instead of
> 1.1234568
> 1.1234568
> 1.1234568
>
> Could the RASTER_MAP_TYPE be used somehow for the "fprintf()"s?

It could, but I suggest that care is taken not to put neatness before
accuracy.

Having said that, GRASS seems to use "%f" liberally, which tends to
suck for very large or very small values; you get roughly constant
absolute error rather than roughly constant relative error.

For an illustration, redo your tests with numbers a million times
smaller; you'll end up with only two significant digits. OTOH, if you
redo them with numbers a thousand times larger, you'll get the error
digits back.

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

Hi Markus

I know I'm jumping in late but I have a couple of side
questions/comments.

Markus Neteler wrote:

> r.mapcalc x1=1.4

<snip>

> Starting to round:
> r.mapcalc x1round="round(x1)"
> EXECUTING x1round = ... 100%
> CREATING SUPPORT FILES FOR x1round
> range: 1 1
> -> rounding is o.k.

Huh? Why is rounding 1.4 = 1 and not 0? This is a ceiling function, not
rounding. I find this more confusing than the slightly off floating
point numbers. I think this needs to be renamed or redefined.

Helena wrote:

> also a left-over from the debugging when the FP support was developed
> and the need for double precision versus single prec. was discussed

Why do we have both FCELL and DCELL? Why can't we just use DCELL? Just
wondering.

--
Sincerely,

Jazzman (a.k.a. Justin Hickey) e-mail: jhickey@hpcc.nectec.or.th
High Performance Computing Center
National Electronics and Computer Technology Center (NECTEC)
Bangkok, Thailand

People who think they know everything are very irritating to those
of us who do. ---Anonymous

Jazz and Trek Rule!!!

----------------------------------------

Justin Hickey wrote:

> > r.mapcalc x1=1.4
> >
> > <snip>
> >
> > Starting to round:
> > r.mapcalc x1round="round(x1)"
> > EXECUTING x1round = ... 100%
> > CREATING SUPPORT FILES FOR x1round
> > range: 1 1
> > -> rounding is o.k.

> Huh? Why is rounding 1.4 = 1 and not 0?

Huh? Did you misread "1.4" as "0.4"? Both floor(1.4) and round(1.4)
are 1, whilst ceil(1.4) is 2.

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

On Wed, May 16, 2001 at 10:47:17PM +0100, Glynn Clements wrote:

[snip]

> It could, but I suggest that care is taken not to put neatness before
> accuracy.
>
> Having said that, GRASS seems to use "%f" liberally, which tends to
> suck for very large or very small values; you get roughly constant
> absolute error rather than roughly constant relative error.
>
> For an illustration, redo your tests with numbers a million times
> smaller; you'll end up with only two significant digits. OTOH, if you
> redo them with numbers a thousand times larger, you'll get the error
> digits back.

Glynn,

I agree with you that accuracy is to be considered, not neatness.
Therefore I am careful about changing anything and will continue to
discuss/learn here :-)

How shall we deal with the example:
r.mapcalc plane=1.123456789
r.stats -1 plane
1.1234568357
1.1234568357
1.1234568357
          ^^^
          These numbers are fantasy.
As a lot of users (like me) like to redirect the output of r.stats into
awk and other tools, they run into trouble with precision.

I have just implemented a "-s" flag in r.stats (not uploaded yet) which
offers optional scientific notation (%e). In our example we would get:
1.123457e+00

Does such a flag make sense?
But: What about r.to.sites which is using %f and sending the wrong numbers
into the sites list? Here we need a change, too.

Regards

Markus

----------------------------------------

Markus Neteler wrote:

> > Could the RASTER_MAP_TYPE be used somehow for the "fprintf()"s?
>
> It could, but I suggest that care is taken not to put neatness before
> accuracy.
>
> Having said that, GRASS seems to use "%f" liberally, which tends to
> suck for very large or very small values; you get roughly constant
> absolute error rather than roughly constant relative error.
>
> For an illustration, redo your tests with numbers a million times
> smaller; you'll end up with only two significant digits. OTOH, if you
> redo them with numbers a thousand times larger, you'll get the error
> digits back.

> I agree with you that accuracy is to be considered, not neatness.
> Therefore I am careful about changing anything and will continue to
> discuss/learn here :-)

> How shall we deal with the example:
> r.mapcalc plane=1.123456789
> r.stats -1 plane
> 1.1234568357
> 1.1234568357
> 1.1234568357
>           ^^^
>           These numbers are fantasy.

Use "%.7e" or "%.8g" (or "%#.8g"). These (along with "%.38f", which
you probably won't want) are the smallest precision values which won't
introduce further error into a float. For double, the equivalents are
"%.16e", "%.17g" and "%.308f" (which you definitely won't want).

> As a lot of users (like me) like to redirect the output of r.stats into
> awk and other tools, they run into trouble with precision.

> I have just implemented a "-s" flag in r.stats (not uploaded yet) which
> offers optional scientific notation (%e). In our example we would get:
> 1.123457e+00
>
> Does such a flag make sense?

The inverse of this flag might make sense; "%e" should be the default
(if you really don't like that, at least use "%g").

> But: What about r.to.sites which is using %f and sending the wrong numbers
> into the sites list? Here we need a change, too.

If you use floating-point, you can't help but output "wrong" numbers.

In a few specific cases (including those from your original email),
the error introduced by prematurely truncating the decimal
representation just happens to exactly cancel the error introduced by
converting to floating point. However, truncation will increase error
far more often than it reduces it.

How about this example:

r.mapcalc plane=0.00000001123456789
r.stats -1 plane
0.0000000112
0.0000000112
0.0000000112
0.0000000112
0.0000000112
0.0000000112
0.0000000112
0.0000000112
0.0000000112
0.0000000112
            ^^^
            These numbers weren't fantasy but, due to "%f", they are now.

(NB: r.mapcalc doesn't appear to support "E" notation for constants;
this could be considered a bug).

The above is with "%.10f"; with "%f", the output would be zero, which
is a rather substantial error.

Shouldn't we be more concerned about how GRASS handles real-world data
than contrived test cases?

If GRASS only needs to support "toy" numbers, there doesn't seem to be
much point in having floating-point support at all.

The bottom line is that "%f" isn't much use unless you know the range
of the data, as the precision specifies an absolute error. An error of
1.2e-7 doesn't seem like much if you're assuming that the values will
be within a few orders of magnitude of 1.0; but is this assumption
valid?

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

[CC'd to grass5, as I really think that this merits some discussion.]

Markus Neteler wrote:

> as you really know much more than me about the precision issue and the
> related %f (and whatever), may I leave eventual changes to you? I am
> afraid to introduce more problems than are currently there. My intention
> is to have reliable numbers, not optically polished numbers, no question!
>
> The candidates seem to be
> r.stats
> r.to.sites
> r.describe
> r.report
>
> which print numbers to stdout.

I started looking into r.stats, and noticed a couple of issues:

1. r.stats uses DCELL for all floating-point data. Theoretically this
needs 17 significant digits in order to preserve accuracy, but would
the data ever be this accurate?

For geographical data, I suspect not; but how accurate might it be? If
we pick some "reasonable" value, that then becomes the limit of its
accuracy. If we don't, 17 digits is likely to be too many in 99% of
cases (9 digits too many for all FCELL data).

I noticed some code which appeared to be intended to support
specifying the precision ("dp" variable in main.c, "fmt" argument to
cell_stats, print_cell_stats). However, this isn't actually used (in
that "dp" can't actually be changed).

2. Shouldn't the format of "r.stats -1" be consistent for all data?

This initially started as a question of whether to strip trailing
zeroes, but then there's the issue that, with "%g", some values could
be printed in exponential form and others not.

One solution would be to select either "%e" or "%f" (and an
appropriate precision) once, based upon the overall range of the data,
rather than letting "%g" choose for each individual value.
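
A sketch of that selection (illustrative only; the thresholds are
arbitrary):

  #include <math.h>

  /* choose one format for the whole output column, based on the
     overall range of the data */
  static const char *pick_format(double min, double max)
  {
      double m = (fabs(min) > fabs(max)) ? fabs(min) : fabs(max);

      if (m == 0.0 || (m >= 1e-4 && m < 1e7))
          return "%.6f";   /* fixed form is readable in this range */
      return "%.6e";       /* otherwise use exponential form */
  }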

3. Might the output from "r.stats -1" be fed to programs which don't
recognise exponential form? The ANSI functions (atof, strtod, *scanf)
all support it, but not everything uses those.

I haven't looked at the other programs; I suspect that similar issues
may apply to those.

If the output from the above programs (apart from r.to.sites) is
intended for a user (instead of or as well as for programs), then
appearance is a valid consideration. Further, programs may impose
limitations on their input (e.g. a program which reads floating-point
values may first store the string in a buffer which isn't wide enough
for the full precision of a double).

It might be worth the effort of designing and implementing (the latter
is the easy part) a system-wide function for converting floating-point
numbers to decimal.

At its simplest, this could be little more than:

  sprintf(buf, getenv("GRASS_FLOAT_FMT"), val);

(don't take this too literally).

A better approach would allow individual programs to specify
particular requirements, with the default format filling in the
blanks.
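
One possible shape for such a helper (purely a sketch; neither this
function nor GRASS_FLOAT_FMT exists yet):

  #include <stdlib.h>

  /* a program-specified format wins, then the user's environment,
     then a safe default */
  const char *G_float_format(const char *program_fmt)
  {
      const char *env = getenv("GRASS_FLOAT_FMT");

      if (program_fmt != NULL)
          return program_fmt;
      return (env != NULL) ? env : "%.7e";
  }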

An additional possibility is a new option type (in the sense of
G_define_option) to allow consistent input of format specifications as
command-line parameters.

--
Glynn Clements <glynn.clements@virgin.net>

----------------------------------------

Hi Glynn

Glynn Clements wrote:

[snip]

> > Huh? Why is rounding 1.4 = 1 and not 0?
>
> Huh? Did you misread "1.4" as "0.4"? Both floor(1.4) and round(1.4)
> are 1, whilst ceil(1.4) is 2.

Duhh! I must have replied to that just before I dozed off! :-) Stupidity
reigns supreme for me sometimes.

--
Sincerely,

Jazzman (a.k.a. Justin Hickey) e-mail: jhickey@hpcc.nectec.or.th
High Performance Computing Center
National Electronics and Computer Technology Center (NECTEC)
Bangkok, Thailand

People who think they know everything are very irritating to those
of us who do. ---Anonymous

Jazz and Trek Rule!!!

[NB: this thread was last active in mid May 2001]

Glynn Clements wrote:

[snip]

> Whilst there probably is an algorithmic solution (a decent textbook on
> numerical methods should list many approaches for working around the
> problems inherent in floating-point subtraction), I'm not sure whether
> that would be justified, or if it would be better to just include an
> explicit check for a negative variance (indicating a variance so low
> that it's been swamped by rounding error).

FWIW, this subject has just come up on the GMT list, where someone
posted the following URL:

http://web.mit.edu/10.001/Web/Course_Notes/Statistics_Notes/Visualization/node4.html

It describes an algorithm to compute the variance/std.dev. in a single
pass, without the risk of a negative variance due to rounding error.
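
In outline, the update looks like this (my reading of the page, written as
a standalone C sketch, not code taken from it):

  #include <stdio.h>

  int main(void)
  {
      double mean = 0.0, m2 = 0.0, x;
      long n = 0;

      while (scanf("%lf", &x) == 1) {
          double delta = x - mean;

          n++;
          mean += delta / n;
          m2 += delta * (x - mean);   /* uses old and new mean together */
      }
      if (n > 0)
          printf("mean %g  variance %g\n", mean, m2 / n);
      return 0;
  }

Each increment of m2 is non-negative, so the variance cannot come out
negative and the sqrt() is always safe.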

--
Glynn Clements <glynn.clements@virgin.net>