[GRASS-dev] Re: [bug #3112] r.watershed hangs

Hardeep Singh Rai wrote:

I was doing watershed analysis for spearfish dataset
(spearfish_grass60data-0.1.tar.gz), and on giving command:

r.watershed elevation=elevation.dem threshhold=3 basin=raiBasin
accumulation=raiAccu it starts processing of 5 steps. After about 20
minutes, it completes all and throw message: "cloing maps", but do
nothing; prompt never comes. I used it with option -m and without it.
With both cases, "top" commands show ram or sed running with 99 usage
of CPU. Even I waited for 12 hours, but it fail to finish. I am using
GRASS 6.0.1 on Ubuntu.

Please suggest some solution.

Hamish wrote:

It's a known bug (#3112). It gets stuck trying to make a huge set of
colormap rules for the new map. If you cancel (^C) you might find the
map was created anyway (???).

Chuck wrote:

The solution is to include a more realistic threshold size.

Yes.

The attached patch shows the problem quite well. As the treshold gets
smaller, the number of areas (cats) grows exponentially. Not only that,
but as the loop within a loop runs, it takes longer and longer to do the
same thing.

The inner loop gets slower I think due to the increased number of color
rules set with G_set_color() that have to be read each time.

#spearfish
g.region rast=elevation.dem
r.watershed elev=elevation.dem basin=tmp_basin thresh=200
d.rast tmp_basin

thresh=200 # 1488 cats
real 0m38.558s
user 0m37.770s
sys 0m0.110s

thresh=100 # 2858 cats
real 0m41.296s
user 0m41.050s
sys 0m0.120s

thresh=50 # 5482 cats
real 0m57.495s
user 0m55.870s
sys 0m0.180s

thresh=37 # 7258 cats
real 1m20.985s
user 1m20.340s
sys 0m0.200s

thresh=25 # 10648 cats
real 3m5.185s
user 3m4.020s
sys 0m0.280s

there are a lot of loops here, but I think the main problem is with so
many calls to G_{get,set}_color():

raster/r.watershed/ram/close_maps2.c
..
        G_init_colors(&colors);
        G_make_random_colors(&colors, 1, max);
        G_set_color((CELL)0, 0, 0, 0, &colors);
        r = 1;
        incr = 0;
        while(incr >= 0) {
            for (gr=130+incr; gr<=255; gr += 20) {
                for (rd=90+incr; rd<=255; rd += 30) {
                    for (bl=90+incr; bl<=255; bl += 40) {
                        flag = 1;
                        while (flag) {
                            G_get_color(r,&red,&green,&blue, &colors);
                            if((blue*.11 + red*.30 + green*.59) < 100) {

                                G_set_color(r, rd, gr, bl, &colors);
                                flag = 0;
                            }
                            if (++r > max) {
                                gr = rd = bl = 300;
                                flag = 0;
                                incr = -1;
                            }
                        }
                    }
                }
            }
            if (incr >= 0) {
                incr += 15;
                if (incr > 120)
                    incr = 7;
            }
        }

Hamish

(attachments)

rwaters_loop.diff (610 Bytes)

Dear Hamish,

Thanks for posting the patch of r.watershed. That snippet of code ensures
that colors allocated to watersheds are as different as possible. Back in
the early '90s, we were so worried about RAM use that we wrote convoluted
code that prevented making new arrays whenever possible.

The simpliest fix would be to insert a conditional after
G_make_random_colors(&colors, 1, max) as shown in the patch below (two lines
added). From a visualization perspective, 1000 basins are too many basins to
matter anyway, so 1000 might be good. If you look at the experiment you
performed, you can see 10000 basins took 3 minutes to run.

The best fix would be to write an algorithm that chooses category color
based on adjacent cell values. Here is a psuedo code solution:

random_colors( Raster map, Color colors) {
    long adjacentColors = new long[ colors.length];
    for( int r = 0; r < map.rows(); r++) {
       for( int c = 0; c < map.cols(); c++) {
          long cellValue = map.getValue( r, c);
          if( c < map.cols() - 1) {
             if( map.getValue( r, c + 1) < cellValue) {
                // add map.getValue( r, c + 1) to adjacentColors[
cellValue] if not already in array
             }
          }
          if( r < map.rows() - 1) {
             if( map.getValue( r + 1, c ) < cellValue) {
                // add map.getValue( r + 1, c) to adjacentColors[
cellValue] if not already in array
             }
          }
          if( (r < map.rows() - 1) && c < map.cols() - 1) {
             if( map.getValue( r + 1, c + 1 ) < cellValue) {
                // add map.getValue( r + 1, c + 1) to adjacentColors[
cellValue] if not already in array
             }
          }
       }
    }
    for( int i = 0; i < colors.length; i++) {
       // if no adjacentColors[i] == null, give any random color
       // otherwise randomly create 100 colors
       // choose color furthest from colors of adjacentColors[i] array
    }
}

Sincerely, chuck

-----Original Message-----
From: Hamish [mailto:hamish_nospam@yahoo.com]
Sent: Wednesday, October 25, 2006 3:01 AM
To: grass5
Cc: hardeep@eml.cc; grass-bugs@intevation.de; c.ehlschlaeger@insightbb.com
Subject: Re: [bug #3112] r.watershed hangs

Hardeep Singh Rai wrote:

I was doing watershed analysis for spearfish dataset
(spearfish_grass60data-0.1.tar.gz), and on giving command:

r.watershed elevation=elevation.dem threshhold=3 basin=raiBasin
accumulation=raiAccu it starts processing of 5 steps. After about 20
minutes, it completes all and throw message: "cloing maps", but do
nothing; prompt never comes. I used it with option -m and without it.
With both cases, "top" commands show ram or sed running with 99 usage
of CPU. Even I waited for 12 hours, but it fail to finish. I am using
GRASS 6.0.1 on Ubuntu.

Please suggest some solution.

Hamish wrote:

It's a known bug (#3112). It gets stuck trying to make a huge set of
colormap rules for the new map. If you cancel (^C) you might find the
map was created anyway (???).

Chuck wrote:

The solution is to include a more realistic threshold size.

Yes.

The attached patch shows the problem quite well. As the treshold gets
smaller, the number of areas (cats) grows exponentially. Not only that,
but as the loop within a loop runs, it takes longer and longer to do the
same thing.

The inner loop gets slower I think due to the increased number of color
rules set with G_set_color() that have to be read each time.

#spearfish
g.region rast=elevation.dem
r.watershed elev=elevation.dem basin=tmp_basin thresh=200
d.rast tmp_basin

thresh=200 # 1488 cats
real 0m38.558s
user 0m37.770s
sys 0m0.110s

thresh=100 # 2858 cats
real 0m41.296s
user 0m41.050s
sys 0m0.120s

thresh=50 # 5482 cats
real 0m57.495s
user 0m55.870s
sys 0m0.180s

thresh=37 # 7258 cats
real 1m20.985s
user 1m20.340s
sys 0m0.200s

thresh=25 # 10648 cats
real 3m5.185s
user 3m4.020s
sys 0m0.280s

there are a lot of loops here, but I think the main problem is with so
many calls to G_{get,set}_color():

raster/r.watershed/ram/close_maps2.c
..
        G_init_colors(&colors);
        G_make_random_colors(&colors, 1, max);
// new code next line
        if( max < 10000) {
        G_set_color((CELL)0, 0, 0, 0, &colors);
        r = 1;
        incr = 0;
        while(incr >= 0) {
            for (gr=130+incr; gr<=255; gr += 20) {
                for (rd=90+incr; rd<=255; rd += 30) {
                    for (bl=90+incr; bl<=255; bl += 40) {
                        flag = 1;
                        while (flag) {
                            G_get_color(r,&red,&green,&blue, &colors);
                            if((blue*.11 + red*.30 + green*.59) < 100) {

                                G_set_color(r, rd, gr, bl, &colors);
                                flag = 0;
                            }
                            if (++r > max) {
                                gr = rd = bl = 300;
                                flag = 0;
                                incr = -1;
                            }
                        }
                    }
                }
            }
            if (incr >= 0) {
                incr += 15;
                if (incr > 120)
                    incr = 7;
            }
        }// new code next line
        }

Hamish

Charles Ehlschlaeger wrote:

Thanks for posting the patch of r.watershed. That snippet of code
ensures that colors allocated to watersheds are as different as
possible. Back in the early '90s, we were so worried about RAM use
that we wrote convoluted code that prevented making new arrays
whenever possible.

IMO this is why GRASS has (in general) scaled so well for today's
massive datasets. Raster overheads are very low.

The simpliest fix would be to insert a conditional after
G_make_random_colors(&colors, 1, max) as shown in the patch below (two
lines added).

thanks, applied to the 6.2 & 6.3(HEAD) branches.

Also took the opportunity to slop on lots of metadata in the output
maps (6.3cvs only).

From a visualization perspective, 1000 basins are too
many basins to matter anyway, so 1000 might be good. If you look at
the experiment you performed, you can see 10000 basins took 3 minutes
to run.

I went with 10000.

The best fix would be to write an algorithm that chooses category
color based on adjacent cell values. Here is a psuedo code solution:

random_colors( Raster map, Color colors) {
    long adjacentColors = new long[ colors.length];
    for( int r = 0; r < map.rows(); r++) {
       for( int c = 0; c < map.cols(); c++) {
          long cellValue = map.getValue( r, c);
          if( c < map.cols() - 1) {
             if( map.getValue( r, c + 1) < cellValue) {
                // add map.getValue( r, c + 1) to adjacentColors[
cellValue] if not already in array
             }
          }
          if( r < map.rows() - 1) {
             if( map.getValue( r + 1, c ) < cellValue) {
                // add map.getValue( r + 1, c) to adjacentColors[
cellValue] if not already in array
             }
          }
          if( (r < map.rows() - 1) && c < map.cols() - 1) {
             if( map.getValue( r + 1, c + 1 ) < cellValue) {
                // add map.getValue( r + 1, c + 1) to adjacentColors[
cellValue] if not already in array
             }
          }
       }
    }
    for( int i = 0; i < colors.length; i++) {
       // if no adjacentColors[i] == null, give any random color
       // otherwise randomly create 100 colors
       // choose color furthest from colors of adjacentColors[i] array
       
    }
}

The random case doesn't do too poorly. (256^3 color possibilities vs.
the four-color theorem)

// choose color furthest from colors of adjacentColors[i] array

color distance in 3D RGB space isn't something I understand how to do
well. (I've wondered for a while how to find the "opposite" RGB color:
think of it as a vector ray from the origin inside a 3D sphere?)

Anyway, I'll have to leave the "better way" for someone who understands
color-space a bit better than I do.

thanks,
Hamish

On Wednesday 25 October 2006 22:59, Hamish wrote:

Charles Ehlschlaeger wrote:
> Thanks for posting the patch of r.watershed. That snippet of code
> ensures that colors allocated to watersheds are as different as
> possible. Back in the early '90s, we were so worried about RAM use
> that we wrote convoluted code that prevented making new arrays
> whenever possible.

IMO this is why GRASS has (in general) scaled so well for today's
massive datasets. Raster overheads are very low.

> The simpliest fix would be to insert a conditional after
> G_make_random_colors(&colors, 1, max) as shown in the patch below (two
> lines added).

thanks, applied to the 6.2 & 6.3(HEAD) branches.

Also took the opportunity to slop on lots of metadata in the output
maps (6.3cvs only).

> From a visualization perspective, 1000 basins are too
> many basins to matter anyway, so 1000 might be good. If you look at
> the experiment you performed, you can see 10000 basins took 3 minutes
> to run.

I went with 10000.

> The best fix would be to write an algorithm that chooses category
> color based on adjacent cell values. Here is a psuedo code solution:
>
> random_colors( Raster map, Color colors) {
> long adjacentColors = new long[ colors.length];
> for( int r = 0; r < map.rows(); r++) {
> for( int c = 0; c < map.cols(); c++) {
> long cellValue = map.getValue( r, c);
> if( c < map.cols() - 1) {
> if( map.getValue( r, c + 1) < cellValue) {
> // add map.getValue( r, c + 1) to adjacentColors[
> cellValue] if not already in array
> }
> }
> if( r < map.rows() - 1) {
> if( map.getValue( r + 1, c ) < cellValue) {
> // add map.getValue( r + 1, c) to adjacentColors[
> cellValue] if not already in array
> }
> }
> if( (r < map.rows() - 1) && c < map.cols() - 1) {
> if( map.getValue( r + 1, c + 1 ) < cellValue) {
> // add map.getValue( r + 1, c + 1) to adjacentColors[
> cellValue] if not already in array
> }
> }
> }
> }
> for( int i = 0; i < colors.length; i++) {
> // if no adjacentColors[i] == null, give any random color
> // otherwise randomly create 100 colors
> // choose color furthest from colors of adjacentColors[i] array
>
> }
> }

The random case doesn't do too poorly. (256^3 color possibilities vs.
the four-color theorem)

> // choose color furthest from colors of adjacentColors[i] array

color distance in 3D RGB space isn't something I understand how to do
well. (I've wondered for a while how to find the "opposite" RGB color:
think of it as a vector ray from the origin inside a 3D sphere?)

Anyway, I'll have to leave the "better way" for someone who understands
color-space a bit better than I do.

thanks,
Hamish

I don't have anything constructive to add to this thread other than - This
small transaction between various parties, referencing events from the 90's,
dealing with current bugs, the speed in which a fix was suggested and the
speed in which it was committed to CVS astounds me!

This would be a great bit of text to bring up in a presentation on GRASS and
the FOSS concept of software development.

Great job everyone!

Dylan

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

Charles Ehlschlaeger wrote:
> The best fix would be to write an algorithm that chooses category
> color based on adjacent cell values. Here is a psuedo code solution:

Hamish wrote:

The random case doesn't do too poorly. (256^3 color possibilities vs.
the four-color theorem)

minor correction:

lib/gis/color_rand.c defines:
MAX_COLORS 1024

(the code makes that 960 +- 64 in practice)

but that 1024 color palette is made up of RGB triplets from 256^3 space.

Hamish