[GRASS-dev] [GRASS GIS] #2272: Improve consistency in random generator usage

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
The usage of drand48()/srand48() should be consolitated,
perhaps using G_math_rand() which is provided in
lib/gmath/rand1.c:

{{{
# incomplete search
find . -name '*.c' | xargs grep -l rand48
./raster/r.mapcalc/xrand.c
./raster/r.mapcalc/evaluate.c
./raster/r.random/creat_rand.c
./lib/gmath/rand1.c
./lib/vector/rtree/rect.c
./vector/v.qcount/findquads.c
./vector/v.random/main.c
./vector/v.kcv/main.c

grep rand48 */*/*.c | grep -v flag
lib/gmath/rand1.c: srand48(-seed);
lib/gmath/rand1.c: return (float)drand48();
raster/r.mapcalc/evaluate.c: srand48(seed_value);
raster/r.mapcalc/xrand.c:#define drand48()
((double)rand()/((double)RAND_MAX + 1))
raster/r.mapcalc/xrand.c:#define mrand48() ((long)rand())
raster/r.mapcalc/xrand.c: unsigned long x = (unsigned
long)mrand48();
raster/r.mapcalc/xrand.c: double x = drand48();
raster/r.mapcalc/xrand.c: double x = drand48();
raster/r.random/creat_rand.c:#define lrand48() (((long) rand() ^ ((long)
rand() << 16)) & 0x7FFFFFFF)
raster/r.random/creat_rand.c:#define srand48(sv) (srand((unsigned)(sv)))
raster/r.random/creat_rand.c: return lrand48();
raster/r.random/creat_rand.c: srand48((long)time((time_t *) 0));
vector/v.kcv/main.c: rng = drand48;
vector/v.kcv/main.c: srand48((long)getpid());
vector/v.qcount/findquads.c: #define R_INIT srand48
vector/v.qcount/findquads.c: #define RANDOM(lo,hi) (drand48()/R_MAX*(hi-
lo)+lo)
vector/v.random/main.c:double drand48()
vector/v.random/main.c:#define srand48(sv) (srand((unsigned)(sv)))
vector/v.random/main.c: struct Flag *rand, *drand48, *z, *notopo;
vector/v.random/main.c: rng = drand48;
vector/v.random/main.c: srand48((long)seed);
vector/v.random/main.c: srand48((long)getpid());
}}}

See also r53234 etc.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: normal | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by glynn):

Replying to [ticket:2272 neteler]:
> The usage of drand48()/srand48() should be consolitated,
> perhaps using G_math_rand() which is provided in
> lib/gmath/rand1.c:

First, G_math_rand() should be fixed. Currently, it returns a value
strictly less than 1.0 if drand48 is used, or a value less than or equal
to 1.0 if rand() is used.

Also, seeding should be a separate function.

And we should probably have a portable equivalent to lrand48(), i.e.
something which returns a non-negative 31-bit integer. The version in
r.random is broken, as bit 15 will always be zero if RAND_MAX is 32767
(the minimum value allowed by the ISO C standards).

My suggestion would be to re-implement the LCG used by lrand48() etc.
[attachment:lrand48.c Sample implementation]

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:1&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: blocker | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by neteler):

  * priority: normal => blocker

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:2&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: blocker | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by neteler):

For the record, from grass-dev:

On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements wrote:
> Most uses of rand, random, lrand48, etc have been replaced in r61415
> and r61416.

> Many of these modules seeded the RNG from either the clock or the PID,
> with no way to provide an explicit seed. In such cases, the use of
> G_srand48_auto() has been marked with a "FIXME" comment.

> It would be possible to modify G_srand48_auto() to allow the use of an
> environment variable, but this has problems of its own (e.g. setting
> the variable manually then forgetting about it).

> r.li.daemon/daemon.c uses a hard-coded seed of zero.

> r61416 changes readcell.c in r.proj, i.rectify, and i.ortho.rectify.
> These all used rand() to select a random cache block for ejection.

> While this wouldn't have affected the result, only the first RAND_MAX
> cache blocks would have been used. RAND_MAX is only guaranteed to be
> at least 32767, which would limit the effective cache size to 1 GiB
> (each block is 64 * 64 = 4096 "double"s = 32kiB).

> Cases which haven't been changed are:

> lib/raster3d/test/test_put_get_value_large_file.c. This appears to be
> a test case; does it matter?

> include/iostream/quicksort.h uses random() or rand() to select a pivot
> for the quicksort algorithm. That file has no dependency on lib/gis
> (or anything else, except for <stdlib.h> for random/rand), and I
> didn't want to add one unnecessarily.

> Again, this shouldn't affect the result, but there may be performance
> issues if the size of the array being sorted is significantly larger
> than RAND_MAX (in this situation, the algorithm will be O(n^2) even in
> the best case).

> Unless there's a specific reason not to, it may be better to simply
> replace all uses of that file with std::sort() from <algorithm>.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:3&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: critical | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by neteler):

  * priority: blocker => critical

Comment:

Bundled backport to relbranch70 in r61471. Downgrading priority.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:4&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: major | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------
Changes (by neteler):

  * priority: critical => major

Comment:

Downgrading priority (or to be closed?).

Replying to [comment:3 neteler]:
> For the record, from grass-dev:
>
> On Sun, Jul 27, 2014 at 1:58 AM, Glynn Clements wrote:
> > Cases which haven't been changed are:
>
> > lib/raster3d/test/test_put_get_value_large_file.c. This appears to be
> > a test case; does it matter?
>
> > include/iostream/quicksort.h uses random() or rand() to select a pivot
> > for the quicksort algorithm. That file has no dependency on lib/gis
> > (or anything else, except for <stdlib.h> for random/rand), and I
> > didn't want to add one unnecessarily.
>
> > Again, this shouldn't affect the result, but there may be performance
> > issues if the size of the array being sorted is significantly larger
> > than RAND_MAX (in this situation, the algorithm will be O(n^2) even in
> > the best case).
>
> > Unless there's a specific reason not to, it may be better to simply
> > replace all uses of that file with std::sort() from <algorithm>.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:5&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: major | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by neteler):

By chance I found

raster/simwe/simlib/random.c

/* uniform random number generator (combined type) */

Should that be replaced as well (how)?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:6&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: major | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by glynn):

Replying to [comment:6 neteler]:
> By chance I found
>
> raster/simwe/simlib/random.c
>
> /* uniform random number generator (combined type) */
>
> Should that be replaced as well (how)?

ulec() can be replaced by G_drand48().

seedg() isn't used.

seeds() is always called with fixed values (12345 and 67891). Either call
G_srand48() with a fixed value (if repeatability is important), or call
G_srand48_auto() for a non-deterministic seed (if it isn't), or add a
seed= option to the modules.

FWIW, it seems possible for ulec() to return values slightly greater than
1:

{{{
     ret_val = (double)iz *4.656613e-10;

     return ret_val;
}}}

(2^31^-1) * 4.656613e-10 = 1.0000000267907612

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:7&gt;
GRASS GIS <http://grass.osgeo.org>

#2272: Improve consistency in random generator usage
-------------------------+--------------------------------------------------
Reporter: neteler | Owner: grass-dev@…
     Type: defect | Status: new
Priority: major | Milestone: 7.0.0
Component: Default | Version: svn-releasebranch70
Keywords: random | Platform: Unspecified
      Cpu: Unspecified |
-------------------------+--------------------------------------------------

Comment(by annakrat):

Replying to [comment:7 glynn]:
> Replying to [comment:6 neteler]:
> > By chance I found
> >
> > raster/simwe/simlib/random.c
> >
> > /* uniform random number generator (combined type) */
> >
> > Should that be replaced as well (how)?
>
> ulec() can be replaced by G_drand48().
>
> seedg() isn't used.
>
> seeds() is always called with fixed values (12345 and 67891). Either
call G_srand48() with a fixed value (if repeatability is important), or
call G_srand48_auto() for a non-deterministic seed (if it isn't), or add a
seed= option to the modules.

Done in r63216 and backported. The seed is fixed value.

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2272#comment:8&gt;
GRASS GIS <http://grass.osgeo.org>