[GRASS-dev] [bug #4998] (grass) r.mapcalc rand (a, b): b-1 is used instead of b

this bug's URL: http://intevation.de/rt/webrt?serial_num=4998
-------------------------------------------------------------------------

Subject: r.mapcalc rand (a,b): b-1 is used instead of b

Platform: GNU/Linux/x86
grass obtained from: CVS
grass binary for platform: Compiled from Sources
GRASS Version: 2006-08-12

The rand (a,b) function in r.mapcalc uses the (b-1) instead of b:

$ r.mapcalc 'map=rand(1,100)'
$ r.info -r map
min=1
max=99
    ^
    |
  should be 100!

And if the min and max are equal, r.mapcals goes into some infinite loop:

$ r.mapcalc 'map=rand(1,1)'
   0%
   ^
   |
  no progress whatever, 100% CPU used

Maciek

-------------------------------------------- Managed by Request Tracker

Request Tracker wrote:

this bug's URL: http://intevation.de/rt/webrt?serial_num=4998
-------------------------------------------------------------------------

Subject: r.mapcalc rand (a,b): b-1 is used instead of b

Platform: GNU/Linux/x86
grass obtained from: CVS
grass binary for platform: Compiled from Sources
GRASS Version: 2006-08-12

The rand (a,b) function in r.mapcalc uses the (b-1) instead of b:

$ r.mapcalc 'map=rand(1,100)'
$ r.info -r map
min=1
max=99
    ^
    |
  should be 100!

The above is correct, although the documentation should be more
explicit. The result of rand() is a pseudo-random value x such that:

  a <= x < b

This definition results in consistent behaviour for both the integer
and floating-point cases, as well as being consistent with most
standard PRNG functions (which usually treat the upper bound as
exclusive).

And if the min and max are equal, r.mapcals goes into some infinite loop:

$ r.mapcalc 'map=rand(1,1)'
   0%
   ^
   |
  no progress whatever, 100% CPU used

This might a compiler error, although it should handle this
explicitly.

The above case will result in division by zero, typically causing
SIGFPE to be thrown. r.mapcalc installs a signal handler for SIGFPE
which sets a flag then returns. It appears that SIGFPE gets thrown
continuously in this case.

I've fixed this error, and updated the documentation to clarify the
behaviour.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements napisa?(a):

Request Tracker wrote:

$ r.mapcalc 'map=rand(1,100)'
$ r.info -r map
min=1
max=99
    ^
    |
  should be 100!

The above is correct, although the documentation should be more
explicit. The result of rand() is a pseudo-random value x such that:

  a <= x < b

This definition results in consistent behaviour for both the integer
and floating-point cases, as well as being consistent with most
standard PRNG functions (which usually treat the upper bound as
exclusive).

Is this really necessary? Isn't it counter intuitive?

$ r.mapcalc 'map=rand(1,1)'
   0%
   ^
   |
  no progress whatever, 100% CPU used

This might a compiler error, although it should handle this
explicitly.

The above case will result in division by zero, typically causing
SIGFPE to be thrown. r.mapcalc installs a signal handler for SIGFPE
which sets a flag then returns. It appears that SIGFPE gets thrown
continuously in this case.

I've fixed this error, and updated the documentation to clarify the
behaviour.

Thank you!

Maciek

Maciej Sieczka wrote:

>> $ r.mapcalc 'map=rand(1,100)'
>> $ r.info -r map
>> min=1
>> max=99
>> ^
>> |
>> should be 100!

> The above is correct, although the documentation should be more
> explicit. The result of rand() is a pseudo-random value x such that:

> a <= x < b

> This definition results in consistent behaviour for both the integer
> and floating-point cases, as well as being consistent with most
> standard PRNG functions (which usually treat the upper bound as
> exclusive).

Is this really necessary?

Well, "necessary" is debatable, but it is The Right Thing, for most
sane values of "right".

Isn't it counter intuitive?

That depends; whose intuition?

To anyone with a remotely mathematical background, anything other than
the above definition would be completely abnormal (at least for the
float case, and see below about treating integers differently).

It's a throughly established convention that partitions are defined as
start-closed, end-open (i.e. lo <= x < hi). E.g. if you partition the
reals into unit subsets, you would normally do it as:

{..., {x : -2 <= x < -1}, {x : -1 <= x < 0}, {x : 0 <= x < 1}, {x : 1 <= x < 2}, ...}

Using partitions which are closed at both ends results in overlapping
partitions (i.e. the boundary points belong to both), while partitions
which are open at both ends result in gaps (i.e. the boundary points
belong to neither).

With integers, you can always replace "... < n" with "... <= n+1";
that won't work with reals, as there isn't a "next" real.

Beyond that, as a general rule, the integers should, wherever possible
be treated as a subset of the reals, rather than as a disjoint set.
For anything which is defined for both integers and reals, the integer
version should be defined such that it produces the same (or
equivalent) result as applying the real definition to any real which
happens to be an integer.

Also, note that, in the floating-point case, the behaviour of
r.mapcalc's rand() is essentially dictated by that of drand48(). An
end-closed range would require writing our own FP PRNG.

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements napisa?(a):

Using partitions which are closed at both ends results in overlapping
partitions (i.e. the boundary points belong to both), while partitions
which are open at both ends result in gaps (i.e. the boundary points
belong to neither).

Glynn,

Right. I haven't thought of it.

On the other hand, if one needs an FP random number from a closed range
eg. <1,100> (I mean - including the max value) given how rand is
currently implemented in r.mapcalc this cannot be achieved. He will get
all values <1,100), ie. all but 100 itself.

Thanks to your explanations I understand now this problem should be
rather considered as a wish. Would it be hard to implement both closed
and open ranges, as well as their combinations, in r.mapcalc's rand?

Maciek

Maciej Sieczka wrote:

> Using partitions which are closed at both ends results in overlapping
> partitions (i.e. the boundary points belong to both), while partitions
> which are open at both ends result in gaps (i.e. the boundary points
> belong to neither).

Right. I haven't thought of it.

On the other hand, if one needs an FP random number from a closed range
eg. <1,100> (I mean - including the max value) given how rand is
currently implemented in r.mapcalc this cannot be achieved. He will get
all values <1,100), ie. all but 100 itself.

Thanks to your explanations I understand now this problem should be
rather considered as a wish. Would it be hard to implement both closed
and open ranges, as well as their combinations, in r.mapcalc's rand?

For integers, getting a closed range is trivial; just add one to the
upper bound, i.e. rand(1,101) will give {x : 1 <= x <= 100}.

For floating point values, a closed range would require a FP PRNG
which provides a closed range, which we would have to write ourselves;
drand48() and erand48() both return values in the range [0,1), and
there aren't any other standard functions we could use.

You can fudge it with e.g. "rand(10000,1000001) / 1e4", although that
will produce artifacts which might be signficant for some
applications.

[We already do essentially this if drand48() isn't available, although
it currently results in an end-closed range (dividing by RAND_MAX
rather than RAND_MAX+1).]

Ultimately, for FP, the difference is largely theoretical. With a
53-bit mantissa, the chances of actually getting the end value would
be 1 in 2^53, which is negligible.

One of the things which makes writing a real FP PRNG tricky is that
every FP value within the target range should be possible, but the
probability of any given value occurring needs to be weighted
according to the exponent (i.e. if you want values between 1 and 4,
the number of values between 2 and 4 should be double the number
between 1 and 2).

FWIW, the current FP rand() implementation doesn't actually meet the
first criterion unless the bounds are 0 and 2^N, for some N. Again,
this can't be fixed without writing our own FP PRNG. It should be
close enough for most case though (and there will always be cases
where a given PRNG falls down due to the deterministic nature of
pseudo-random numbers; the only way to avoid that is to use truly
random numbers derived from e.g. quantum noise).

--
Glynn Clements <glynn@gclements.plus.com>