[GRASS-dev] using rand(x,y) in r.mapcalc (grass7)

When I run, e.g., r.mapcalc "a = rand(0,100)" several times, I always get exactly the same layer. In the help file it reads:

“The environment variable GRASS_RND_SEED is read to initialize the random number generator”

But what does that mean? Shouldn't the seed be generated from, e.g., the OS time, which would ensure that each run gives a different result?

On a related note, it would be nice to be able to set the seed (I think there has been such a request before, but not sure about the answer at that time).

Paulo van Breugel wrote:

> When I run several times e.g., r.mapcalc "a = rand(0,100)"
>
> I am always getting exactly the same layer. In the help file it reads:
>
> "The environment variable GRASS_RND_SEED is read to initialize the random
> number generator"
>
> But what does it mean.

The value of that environment variable is parsed using atol() and the
result used to seed the PRNG (via srand() or srand48()) (setup_rand()
in r.mapcalc/evaluate.c).

If the variable isn't set, the PRNG isn't explicitly seeded. For
rand(), the result should be equivalent to GRASS_RND_SEED=1.
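
The logic described above can be illustrated with a small sketch. This is not the r.mapcalc code: Python's random.Random stands in for the C PRNG, and the helper names make_generator and sample are made up for the example. It only mirrors the described behaviour: use GRASS_RND_SEED if it is set, otherwise behave as if the seed were 1.

```python
import random

def make_generator(environ):
    """Return a PRNG seeded as described: from GRASS_RND_SEED, or 1 if unset."""
    raw = environ.get("GRASS_RND_SEED")
    # int() is stricter than C's atol(), which returns 0 when no digits parse.
    seed = int(raw) if raw is not None else 1
    return random.Random(seed)

def sample(environ, n=5):
    """Draw n integers in [0, 100] from a freshly seeded generator."""
    gen = make_generator(environ)
    return [gen.randint(0, 100) for _ in range(n)]

# Two "runs" without the variable give the same values ...
print(sample({}) == sample({}))
# ... and match a run with the seed explicitly set to 1.
print(sample({}) == sample({"GRASS_RND_SEED": "1"}))
```

The environment is passed in as a plain dict so that the effect of the variable can be tried out without touching the real process environment.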

> Shouldn't the seed not be generated on e.g, OS time,
> which would ensure that each run would give a different result?

No. The reason is to provide reproducibility. Anyone running the same
command with the same data should obtain the same result.

If you want a different result each time, set GRASS_RND_SEED to a
different value each time, e.g.

  GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

[%N is the nanoseconds portion of the current time; this is a GNU
extension.]

> On a related note, it would be nice to be able to set the seed (I think
> there has been such a request before, but not sure about the answer at that
> time).

GRASS_RND_SEED was the answer.

--
Glynn Clements <glynn@gclements.plus.com>

On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements <glynn@gclements.plus.com>
wrote:

> Shouldn't the seed not be generated on e.g, OS time,
> which would ensure that each run would give a different result?

No. The reason is to provide reproducibility. Anyone running the same
command with the same data should obtain the same result.

Does the reproducibility go beyond one operating system, compiler or
library? I don't think that the first random number is specified by the C
language standard. If the results were really reproducible it would be
good for a testing framework, but I'm afraid that they are not (with my
limited knowledge about the topic).

If you want a different result each time, set GRASS_RND_SEED to a
different value each time, e.g.

        GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

[%N is the nanoseconds portion of the current time; this is a GNU
extension.]

I've heard that this is not enough on powerful computers/clusters, that you
also have to use the PID because the nanoseconds might be the same (I think I
remember that it was nanoseconds, not seconds).

> On a related note, it would be nice to be able to set the seed (I think
> there has been such a request before, but not sure about the answer at
> that time).

GRASS_RND_SEED was the answer.

I think there should be some possibility of randomization (auto-setting of
the seed) built into the modules providing random(ized) results. Perhaps a flag
which would turn it on. It could also be an option which would behave like
GRASS_RND_SEED but would have one special value for auto-generating the
seed. (GRASS_RND_SEED, if present, would override this option.) For the
default value of the option we should ask what the expected behavior of a
module giving random results actually is.

This would provide a nicer interface in Python, a standard interface on the
command line, and the possibility to set it in the GUI (which means the possibility
to set it for users who don't use the command line). Moreover, it would
provide all users with a way of setting the random seed in the manner
which we consider the best according to our knowledge.

Vaclav

It is certainly good to be able to reproduce commands. However, I think in most (statistical) software the default / expected behaviour is to have a new automatically generated seed at each run. In R, for example, you have to explicitly specify the seed using the function set.seed(). I would therefore think that most users will expect a similar behaviour in GRASS. It would certainly be my personal preference to have the option to set the seed explicitly if you want reproducibility, but have it generated automatically otherwise. But that is just a personal preference.

Perhaps this can be explained like this in the manual page? A far better option would be to provide this as a normal parameter so it can be set from the GUI or the command line like any other variable.

Yes, that would be great. As for the default value, see my earlier argument.

Agree. The way to set the seed now may not be understood by everybody, and with all the work going into streamlining the GUI, this kind of fairly important option should also be available through the GUI.


Just a quick additional question: how do I set GRASS_RND_SEED from within a Python script? (I want to add the option to set the seed with a seed parameter in my script, as suggested in the previous email.)


On Thu, Jul 3, 2014 at 9:39 AM, Paulo van Breugel <p.vanbreugel@gmail.com>
wrote:

Just a quick additional question, how to set this GRASS_RND_SEED from
within a python script (I want to add the option to set the seed with a
seed parameter in my script, as suggested in the previous email).

Concerning the question above, I found out how to do so. I used it in my
r.random.weight script (in the GRASS 7 addons svn). This script uses the rand()
function in r.mapcalc. But rather than always using the same seed (1), there is
an option to set the seed, while by default a time-dependent seed is
set. I am sure there are better ways to do this, but it works.

On Thu, Jul 3, 2014 at 8:55 AM, Paulo van Breugel <p.vanbreugel@gmail.com>
wrote:

On 03-07-14 03:43, Vaclav Petras wrote:

On Wed, Jul 2, 2014 at 8:15 PM, Glynn Clements <glynn@gclements.plus.com>
wrote:

> Shouldn't the seed not be generated on e.g, OS time,
> which would ensure that each run would give a different result?

No. The reason is to provide reproducibility. Anyone running the same
command with the same data should obtain the same result.

  It is certainly good to be able to reproduce commands. However, I
think in most (statistical) software the default / expected behaviour is to
have a new automatically generated seed at each run. In R, for example,
you have to explicitly specify the seed using the function set.seed(). I
would therefore think that most users will expect a similar behaviour in
GRASS. It would certainly be my personal preference to have the option to
set the seed explicitly if you want reproducibility, but have it generated
automatically otherwise. But that is just a personal preference.

Does the reproducibility go beyond one operating system, compiler or
library? I don't think that the first random number is specified by the C
language standard. If the results were really reproducible it would be
good for a testing framework, but I'm afraid that they are not (with my
limited knowledge about the topic).

If you want a different result each time, set GRASS_RND_SEED to a
different value each time, e.g.

        GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"

[%N is the nanoseconds portion of the current time; this is a GNU
extension.]

  Perhaps this can be explained like this in the manual page? A far
better option would be to provide this as a normal parameter so it can be
set from the GUI or the command line like any other variable.

I've heard that this is not enough on powerful computers/clusters, that
you also have to use the PID because the nanoseconds might be the same (I think I
remember that it was nanoseconds, not seconds).

> On a related note, it would be nice to be able to set the seed (I think
> there has been such a request before, but not sure about the answer at
> that time).

GRASS_RND_SEED was the answer.

I think there should be some possibility of randomization (auto-setting
of the seed) built into the modules providing random(ized) results. Perhaps a
flag which would turn it on. It could also be an option which would behave
like GRASS_RND_SEED but would have one special value for auto-generating
the seed. (GRASS_RND_SEED, if present, would override this option.) For the
default value of the option we should ask what the expected behavior of a
module giving random results actually is.

Yes, that would be great. As for the default value, see my earlier
argument.

This would provide a nicer interface in Python, a standard interface on the
command line, and the possibility to set it in the GUI (which means the possibility
to set it for users who don't use the command line). Moreover, it would
provide all users with a way of setting the random seed in the manner
which we consider the best according to our knowledge.

Agree. The way to set the seed now may not be understood by everybody, and
with all the work going into streamlining the GUI, this kind of fairly
important option should also be available through the GUI.

Vaclav

Vaclav Petras wrote:

> > Shouldn't the seed not be generated on e.g, OS time,
> > which would ensure that each run would give a different result?
>
> No. The reason is to provide reproducibility. Anyone running the same
> command with the same data should obtain the same result.

Does the reproducibility go beyond one operating system, compiler or
library?

If drand48() is used, yes. If rand() is used, no.

I don't think that the first random number is specified by the C
language standard.

The C standard doesn't specify any particular implementation for
rand() (it does give an example implementation, but it only produces
15-bit values). It does specify that if the PRNG isn't explicitly
seeded, the behaviour is as if srand(1) was called beforehand.
[§7.20.2.2p2]

IOW, the sequence of results is implementation-dependent, but it may
not change from one run to the next unless the program explicitly
seeds the PRNG with a non-deterministic value such as the current
time.

If the results would be really reproducible it would be
good for testing framework but I'm afraid that they are not (with my
limited knowledge about the topic).

In ticket #2272, I attached a portable implementation of lrand48(). If
desired, we could add this to libgis and use that in preference to any
implementation-specific PRNG.

> If you want a different result each time, set GRASS_RND_SEED to a
> different value each time, e.g.
>
> GRASS_RND_SEED=`date +%N` r.mapcalc "a = rand(0,100)"
>
> [%N is the nanoseconds portion of the current time; this is a GNU
> extension.]

I've heard that this is not enough on powerful computers/clusters, that you
also have to use the PID because the nanoseconds might be the same (I think I
remember that it was nanoseconds, not seconds).

The main issue is on systems where the reported time only changes in
increments of a scheduler "tick" (e.g. 10ms on old versions of Linux).

> > On a related note, it would be nice to be able to set the seed (I think
> > there has been such a request before, but not sure about the answer at
> > that time).
>
> GRASS_RND_SEED was the answer.

I think there should be some possibility of randomization (auto-setting of
the seed) built into the modules providing random(ized) results. Perhaps a flag
which would turn it on. It could also be an option which would behave like
GRASS_RND_SEED but would have one special value for auto-generating the
seed. (GRASS_RND_SEED, if present, would override this option.) For the
default value of the option we should ask what the expected behavior of a
module giving random results actually is.

That's certainly reasonable. The main thing is that I believe that
reproducibility should be the default. If people have to take explicit
action to introduce randomness, they're more likely to consider the
issues involved. If randomised seeds are the default, the lack of
reproducibility may not be considered until it is too late.

--
Glynn Clements <glynn@gclements.plus.com>

Paulo van Breugel wrote:

Just a quick additional question, how to set this GRASS_RND_SEED from
within a python script (I want to add the option to set the seed with a
seed parameter in my script, as suggested in the previous email).

You can modify os.environ prior to calling it, e.g.

  import os
  import time
  import grass.script as grass
  ...
  # nanosecond timestamp folded into 31 bits so it fits a C long
  t = int(time.time() * 1e9) % (2**31)
  os.environ['GRASS_RND_SEED'] = '%d' % t
  grass.mapcalc(...)
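
A variant of the same idea, as a hedged sketch: instead of changing os.environ for the whole script, build a modified copy and hand it to the child process. The helper name seeded_env is hypothetical. The r.mapcalc call is shown only as a comment because it needs a running GRASS session, and it only has an effect in versions that read GRASS_RND_SEED.

```python
import os
import time

def seeded_env(seed=None, base=None):
    """Return a copy of the environment with GRASS_RND_SEED set.

    With seed=None a time-based value is derived, as in the snippet above.
    """
    env = dict(os.environ if base is None else base)
    if seed is None:
        seed = int(time.time() * 1e9) % (2**31)
    env["GRASS_RND_SEED"] = "%d" % seed
    return env

env = seeded_env(42)
print(env["GRASS_RND_SEED"])
# Inside a GRASS session the copy could then be passed to the module, e.g.:
#   subprocess.call(["r.mapcalc", "a = rand(0,100)"], env=env)
```

Working on a copy keeps the seed from leaking into later, unrelated module calls made by the same script.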

--
Glynn Clements <glynn@gclements.plus.com>

On Sun, Jul 6, 2014 at 12:34 AM, Glynn Clements <glynn@gclements.plus.com>
wrote:

Paulo van Breugel wrote:

> Just a quick additional question, how to set this GRASS_RND_SEED from
> within a python script (I want to add the option to set the seed with a
> seed parameter in my script, as suggested in the previous email).

You can modify os.environ prior to calling it, e.g.

        import os
        import time
        import grass.script as grass
        ...
        t = int(time.time() * 1e9) % (2**31)
        os.environ['GRASS_RND_SEED'] = '%d' % t
        grass.mapcalc(...)

Hi, thanks. I found the solution after a bit of digging in the
documentation. By the way, I still think the default should be to have a random
seed, as I think that is what most people would expect (I did, but after
running a function for a day and night, I found out I was wrong). But
anyway, it ultimately comes down to preference, so the most important thing, I think,
is that the user has a clear choice available in both the GUI and on the
command line. If that could be implemented, either way, that would be great.

--
Glynn Clements <glynn@gclements.plus.com>

On Sun, Jul 6, 2014 at 12:25 AM, Glynn Clements
<glynn@gclements.plus.com> wrote:

Glynn Clements <glynn@gclements.plus.com> wrote:

...

In ticket #2272, I attached a portable implementation of lrand48(). If
desired, we could add this to libgis and use that in preference to any
implementation-specific PRNG.

This would be excellent.

If you want a different result each time, set GRASS_RND_SEED to a
different value each time, e.g.

IMHO this is not intuitive at all. I would suggest to invert the
behaviour for GRASS 7:
- per default generate random numbers which differ,
- if the user needs reproducibility, then have an env var to enable that.

The main thing is that I believe that
reproducibility should be the default.

I humbly disagree. This is not what the user expects. It is also the
opposite of how for example R behaves:

  R
  > runif(1)
  [1] 0.5624295
  > runif(1)
  [1] 0.1683853

http://en.wikibooks.org/wiki/R_Programming/Random_Number_Generation#Seed
"If you want to perform an exact replication of your program, you
have to specify the seed using the function set.seed()."

If people have to take explicit
action to introduce randomness,

The problem is that most will not even realize the current behaviour of rand().

they're more likely to consider the
issues involved. If randomised seeds are the default, the lack of
reproducibility may not be considered until it is too late.

The R community (and some users here) think the opposite... when you
ask for rand() then you expect a random number. Just to avoid this:
https://xkcd.com/221/

Markus

And not only the R community, I am sure. In all statistical packages I have ever worked with one can see the same behaviour: a random number is random (i.e., a different seed each time), unless the seed is explicitly defined by the user. And it seems to be the default behaviour of Python/numpy:

>>> import numpy as np
>>> np.random.random()
0.8351426142559701
>>> np.random.random()
0.4813823441998394
>>> np.random.random()
0.7279314267025369

On 21-07-14 19:01, Markus Neteler wrote:

...

Paulo van Breugel wrote:

And it seems to be the default behaviour by python/numpy:

It is, but ...

>>> import numpy as np
>>> np.random.random()
0.8351426142559701
>>> np.random.random()
0.4813823441998394
>>> np.random.random()
0.7279314267025369

... this example doesn't demonstrate that. Any PRNG returns different
values for successive calls.

The question is whether the PRNG's initial value should automatically
be seeded from some external source of entropy (e.g. the system
clock), so that the sequence of values differs on different runs.

In turn, that brings up questions about the quality of the entropy
source. The ANSI C time() function typically only has one second
granularity (indeed, POSIX requires this, as time_t is defined as
seconds since the epoch), which is sufficiently coarse that successive
runs may get the same seed. Other functions aren't portable, and even
where available, the granularity isn't guaranteed.
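
The granularity problem can be made concrete with a small sketch. The function names and the two timestamps are invented for the illustration; nothing here is taken from GRASS.

```python
def seed_from_seconds(timestamp):
    """Seed with whole-second resolution, like a value obtained from time()."""
    return int(timestamp)

def seed_from_nanoseconds(timestamp_ns):
    """Seed from a nanosecond counter, folded into 31 bits."""
    return timestamp_ns % (2**31)

# Two runs started 0.4 s apart within the same second share a seed.
first_run, second_run = 1404345600.2, 1404345600.6
print(seed_from_seconds(first_run) == seed_from_seconds(second_run))

# The nanosecond-based seeds differ, provided the clock really ticks that finely.
print(seed_from_nanoseconds(int(first_run * 1e9)) !=
      seed_from_nanoseconds(int(second_run * 1e9)))
```

The second comparison only helps on systems whose clock actually advances at that resolution, which is exactly the guarantee that is missing.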

My main objection to automatic seeding is that people will inevitably
produce non-repeatable results without even realising it.

One possible solution would be to automatically add the seed to the
history of any map generated by r.mapcalc (or possibly only those
which use the rand() function). But that would still only help if the
creator either provides access to the generated maps, or the output
from r.info. Simply providing the commands used and the end result
wouldn't help.

--
Glynn Clements <glynn@gclements.plus.com>

Markus Neteler wrote:

- if the user needs reproducibility, then have an env var to enable that.

And when the issue of usability doesn't even get considered until a few
years later when the user (or a colleague) gets an email suggesting
the results can't be reproduced ...?

I'm inclined to add both an option (to specify a seed, replacing the
environment variable) and a flag (to seed from the system clock or
whatever), and having the PRNG generate a fatal error if neither of
those are used.

That way, neither of the likely problems can arise by oversight.
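
A minimal sketch of that policy, with hypothetical names (resolve_seed, SeedError) and written in Python rather than as module code: the caller must either pass a seed or ask for clock-based seeding, and anything else is an error. Treating "both given" as an error is an assumption of this sketch, not something stated above.

```python
import time

class SeedError(Exception):
    """Raised when the caller has not decided how to seed the PRNG."""

def resolve_seed(seed=None, auto=False):
    """Return the seed to use, or fail if no explicit choice was made."""
    if seed is not None and auto:
        # Assumption for this sketch: the two choices are mutually exclusive.
        raise SeedError("give either a seed or the auto flag, not both")
    if seed is not None:
        return int(seed)
    if auto:
        # Seed from the system clock, folded into 31 bits.
        return time.time_ns() % (2**31)
    raise SeedError("PRNG not seeded: pass a seed or request automatic seeding")

chosen = resolve_seed(auto=True)
print("seed used:", chosen)  # worth recording so the run can be repeated
```

Returning the chosen value even in the automatic case means it can be logged, so a clock-seeded run can still be repeated later.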

--
Glynn Clements <glynn@gclements.plus.com>

On Tue, Jul 22, 2014 at 4:39 PM, Glynn Clements <glynn@gclements.plus.com>
wrote:

Paulo van Breugel wrote:

> And it seems to be the default behaviour by python/numpy:

It is, but ...

> >>> import numpy as np
> >>> np.random.random()
> 0.8351426142559701
> >>> np.random.random()
> 0.4813823441998394
> >>> np.random.random()
> 0.7279314267025369

... this example doesn't demonstrate that.

Good point, on my computer I get:

>>> import numpy as np
>>> np.random.random()
0.49727844715398417

And in a different (also freshly started) Python:

>>> import numpy as np
>>> np.random.random()
0.2457281014919791

Any PRNG returns different values for successive calls.

The problem is that the user may not see the difference between two
module calls in the GRASS command line and two calls of the random() function in
Python. When calling a GRASS module from Python the difference is even less
visible.

Anyway, reproducibility would be really nice considering GRASS's
scientific audience; however, are you sure that different systems will give
the same random number for the same seed? Or do you mean reproducible as in
"as reproducible as possible, e.g. using the same system if necessary"?

The question is whether the PRNG's initial value should automatically
be seeded from some external source of entropy (e.g. the system
clock), so that the sequence of values differs on different runs.

In turn, that brings up questions about the quality of the entropy
source. The ANSI C time() function typically only has one second
granularity (indeed, POSIX requires this, as time_t is defined as
seconds since the epoch), which is sufficiently coarse that successive
runs may get the same seed. Other functions aren't portable, and even
where available, the granularity isn't guaranteed.

What about time + process id?

My main objection to automatic seeding is that people will inevitably
produce non-repeatable results without even realising it.

One possible solution would be to automatically add the seed to the
history of any map generated by r.mapcalc (or possibly only those
which use the rand() function). But that would still only help if the
creator either provides access to the generated maps, or the output
from r.info. Simply providing the commands used and the end result
wouldn't help.

--
Glynn Clements <glynn@gclements.plus.com>

On Tue, Jul 22, 2014 at 4:58 PM, Glynn Clements <glynn@gclements.plus.com>
wrote:

Markus Neteler wrote:

> - if the user needs reproducibility, then have an env var to enable that.

And when the issue of usability doesn't even get considered until a few
years later when the user (or a colleague) gets an email suggesting
the results can't be reproduced ...?

I'm inclined to add both an option (to specify a seed, replacing the
environment variable) and a flag (to seed from the system clock or
whatever), and having the PRNG generate a fatal error if neither of
those are used.

That way, neither of the likely problems can arise by oversight.

This looks very good at first glance.

I guess there is a lot to say for both approaches, which is why I think the suggestion of Markus is a very good one! +1 from me


On Tue, Jul 22, 2014 at 11:19 PM, Paulo van Breugel wrote:

On 22-07-14 22:58, Glynn Clements wrote:

And when the issue of usability doesn't even get considered until a few
years later when the user (or a colleague) gets an email suggesting
the results can't be reproduced ...?

I'm inclined to add both an option (to specify a seed, replacing the
environment variable) and a flag (to seed from the system clock or
whatever), and having the PRNG generate a fatal error if neither of
those are used.

That way, neither of the likely problems can arise by oversight.

I guess there is a lot to say for both approaches, which is why I think the
suggestion of Markus is a very good one! +1 from me

It is indeed Glynn's suggestion (which I like, too).

Markus

Sorry, I never seem to get used to how my email program displays the threads… a good suggestion by Glynn, I mean… Glynn, it would be really great if you could implement it that way.

On 22-07-14 23:31, Markus Neteler wrote:

...

Vaclav Petras wrote:

Anyway, the reproducibility would be really nice considering GRASS
scientific audience, however are you sure that different systems will give
same random number for the same seed?

They will from now on, because I've replaced the use of the system's
PRNG (either rand or mrand48/drand48) with a portable implementation
of the latter.

> What about time + process id?

That's what's done now (if -s is used). Although we could probably do
with a better hash (currently, it's just addition) and/or more entropy
sources.
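
To illustrate why plain addition is a weak way to combine the two values, here is a sketch with made-up function names. The hashed variant uses SHA-256 purely as an example of a stronger mix; it is not what the GRASS code does.

```python
import hashlib
import os
import time

def seed_by_addition(now_ns, pid):
    """Combine the entropy sources by plain addition, as described above."""
    return (now_ns + pid) % (2**31)

def seed_by_hash(now_ns, pid):
    """Alternative mix: hash both values so every input bit affects the seed."""
    digest = hashlib.sha256(("%d:%d" % (now_ns, pid)).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % (2**31)

# Addition lets different (time, pid) pairs cancel out to the same seed ...
print(seed_by_addition(1000, 7) == seed_by_addition(1001, 6))
# ... which a hash makes very unlikely.
print(seed_by_hash(1000, 7) == seed_by_hash(1001, 6))

print(seed_by_hash(time.time_ns(), os.getpid()))
```

The cancellation case matters on clusters, where jobs started at nearly the same time tend to get neighbouring process ids.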

--
Glynn Clements <glynn@gclements.plus.com>

Glynn Clements wrote:

I'm inclined to add both an option (to specify a seed, replacing the
environment variable) and a flag (to seed from the system clock or
whatever), and having the PRNG generate a fatal error if neither of
those are used.

This is now done.

r61350 adds the lrand48/mrand48/drand48 equivalents to lib/gis. Brief
testing suggests that the results are identical to those generated by
GNU libc (which should be identical to any other POSIX implementation).
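
For readers unfamiliar with this generator family, the following is a rough Python sketch of the 48-bit linear congruential generator that POSIX specifies for srand48/lrand48/mrand48/drand48. It is not the code from r61350; the class name Rand48 is made up, and the constants are the ones given in the POSIX description.

```python
class Rand48:
    """Minimal sketch of the POSIX rand48 generator family (48-bit LCG)."""

    A = 0x5DEECE66D       # multiplier
    C = 0xB               # increment
    MASK = (1 << 48) - 1  # arithmetic is modulo 2**48

    def __init__(self, seed):
        # srand48(): the seed fills the high 32 bits, the low 16 bits are 0x330E.
        self.state = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def _next(self):
        self.state = (self.A * self.state + self.C) & self.MASK
        return self.state

    def lrand48(self):
        """Non-negative integer in [0, 2**31)."""
        return self._next() >> 17

    def mrand48(self):
        """Signed integer in [-2**31, 2**31)."""
        value = self._next() >> 16
        return value - (1 << 32) if value >= (1 << 31) else value

    def drand48(self):
        """Float in [0.0, 1.0)."""
        return self._next() / float(1 << 48)

a, b = Rand48(1), Rand48(1)
print([a.lrand48() for _ in range(3)] == [b.lrand48() for _ in range(3)])
```

Because the whole algorithm is integer arithmetic on a fixed-width state, the same seed gives the same sequence on any platform, which is the property the system rand() cannot promise.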

r61352 changes it to generate a fatal error if used prior to seeding.

r61353 changes r.mapcalc so that seeding is performed via seed= or -s.
The seed (whether specified by seed= or generated for -s) is added to
the history (for r.mapcalc; r3.mapcalc's create_history() function is
a stub; do 3D rasters have history?)

Note that GRASS_RND_SEED is no longer supported. That was a hack from
the time before r.mapcalc used G_parser().

As I write this, it has occurred to me that the behaviour of rand()
may be non-deterministic in the presence of certain forms of
parallelism, e.g. multiple occurrences of rand() in the expression(s)
in conjunction with pthreads. Ultimately we may need to expand the
PRNG to support explicit state (as per erand48, nrand48 and jrand48).
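
The idea of explicit state can be sketched as follows. This is only an analogy in Python: random.Random objects play the role of the per-caller state that erand48-style functions take as an argument, and the way the per-tile seed is derived from the base seed is an arbitrary choice made for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def worker(args):
    """Fill one tile using a generator owned by this task alone."""
    base_seed, tile_index, count = args
    # Explicit state: each tile gets its own generator, derived from the
    # base seed and the tile index, instead of sharing one global PRNG.
    gen = random.Random(base_seed * 1000003 + tile_index)
    return [gen.randint(0, 100) for _ in range(count)]

def run(base_seed, tiles=4, count=5):
    jobs = [(base_seed, i, count) for i in range(tiles)]
    with ThreadPoolExecutor(max_workers=tiles) as pool:
        # map() returns results in job order, whatever the thread timing was.
        return list(pool.map(worker, jobs))

print(run(42) == run(42))
```

With a single shared generator, the values each tile receives would depend on the order in which the threads happen to draw from it; giving each task its own state removes that dependency.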

--
Glynn Clements <glynn@gclements.plus.com>