[GRASS-user] r.watershed speed-up

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of <SECTION 2: A * Search> considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). <SECTION 4: Watershed Determination> takes now the longest.

Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical.

The original code was only altered where necessary, only the sorting method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram).

My question is if there is interest in this faster version of r.watershed and if someone wants to test it.
The source code is available on http://markus.metz.giswork.googlepages.com/

Regards,

Markus

Markus,

Thank you very much! I have not tested it yet, but after I do I'll write you with my impressions. I think this could be a very valuable contribution.

Regards,
Tom

Markus Metz wrote:

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of <SECTION 2: A * Search> considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). <SECTION 4: Watershed Determination> takes now the longest.

Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical.

The original code was only altered where necessary, only the sorting method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram).

My question is if there is interest in this faster version of r.watershed and if someone wants to test it.
The source code is available on http://markus.metz.giswork.googlepages.com/

Regards,

Markus

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

--
Thomas E Adams
National Weather Service
Ohio River Forecast Center
1901 South State Route 134
Wilmington, OH 45177

EMAIL: thomas.adams@noaa.gov

VOICE: 937-383-0528
FAX: 937-383-0033

First results of one test I did: excellent!!!

1392776 cells dem with steep slopes but also plain areas

r.watershed.fast: 7 seconds
r.watershed: 4 min and 14 seconds

no difference at all between the two generated network paths...

markus: many thanks..

Ivan

Il giorno lun, 28/07/2008 alle 23.14 +0200, Markus Metz ha scritto:

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m
flag set, r.watershed.ram), I implemented a binary min-heap instead of a
linear array to store points in ascending order relative to elevation.
This improves the speed of <SECTION 2: A * Search> considerably, the
larger the map, the larger the speed improvement. A region with about
1,800,000 cells is processed about 37 times faster than with the
original routine. A region with about 11,000,000 cells is processed
about 170 times faster than with the original routine (46.7sec vs.
2h12m9sec on my system). <SECTION 4: Watershed Determination> takes now
the longest.

Several tests (different region sizes, different DEMs) showed that the
results of the binary heap version are very similar but not 100%
identical to the original r.watershed, because of a slightly different
treatment of cells with equal elevation in a binary min-heap compared to
a linear array. Differences are found in difficult areas (equal
elevation of several neighbouring cells) where there are several
possible solutions for how water could flow. Still, compared with other
hydrology analysis methods, e.g. r.terraflow, the results can be
regarded as identical.

The original code was only altered where necessary, only the sorting
method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index
for each analysed point (in addition to the other indices already needed
by r.watershed.ram).

My question is if there is interest in this faster version of
r.watershed and if someone wants to test it.
The source code is available on http://markus.metz.giswork.googlepages.com/

Regards,

Markus

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

--
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Department of Civil and Environmental Engineering
University of Perugia
Via G. Duranti 93/a
06125
Perugia (Italy)
Socio fondatore GFOSS "Geospatial Free and Open Source Software" http://www.gfoss.it
e-mail: marchesini@unipg.it
        ivan.marchesini@gmail.com
tel: +39(0)755853760
fax (university): +39(0)755853756
fax (home): +39(0)5782830887
jabber: geoivan73@jabber.org

Ivan anticipated me for a bit.
GREAT Markus! I could crunch Sardinia in one shot. I haven't measured
the time but it was less then 2 minutes, while with r.watershed I got
stalled.
Analyzing the differences between the r.watershed and r.watershed.fast
for a narrower region, there was some subtle, sparse differences
(1,-1,7) meaning 45° differences:

The following are the differences values (from r.mapcalc) statistics
(number of pixels for values)

-7 76
-6 34
-5 30
-4 49
-3 43
-2 194
-1 1165
0 813894
1 1218
2 253
3 83
4 45
5 25
6 93
7 250
8 22
9 21
10 5
11 15
12 30
13 18
14 3
15 5
16 4
* 36917172

2008/7/29 ivan marchesini <marchesini@unipg.it>:

First results of one test I did: excellent!!!

1392776 cells dem with steep slopes but also plain areas

r.watershed.fast: 7 seconds
r.watershed: 4 min and 14 seconds

no difference at all between the two generated network paths...

markus: many thanks..

Ivan

Il giorno lun, 28/07/2008 alle 23.14 +0200, Markus Metz ha scritto:

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m
flag set, r.watershed.ram), I implemented a binary min-heap instead of a
linear array to store points in ascending order relative to elevation.
This improves the speed of <SECTION 2: A * Search> considerably, the
larger the map, the larger the speed improvement. A region with about
1,800,000 cells is processed about 37 times faster than with the
original routine. A region with about 11,000,000 cells is processed
about 170 times faster than with the original routine (46.7sec vs.
2h12m9sec on my system). <SECTION 4: Watershed Determination> takes now
the longest.

Several tests (different region sizes, different DEMs) showed that the
results of the binary heap version are very similar but not 100%
identical to the original r.watershed, because of a slightly different
treatment of cells with equal elevation in a binary min-heap compared to
a linear array. Differences are found in difficult areas (equal
elevation of several neighbouring cells) where there are several
possible solutions for how water could flow. Still, compared with other
hydrology analysis methods, e.g. r.terraflow, the results can be
regarded as identical.

The original code was only altered where necessary, only the sorting
method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index
for each analysed point (in addition to the other indices already needed
by r.watershed.ram).

My question is if there is interest in this faster version of
r.watershed and if someone wants to test it.
The source code is available on http://markus.metz.giswork.googlepages.com/

Regards,

Markus

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

--
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Department of Civil and Environmental Engineering
University of Perugia
Via G. Duranti 93/a
06125
Perugia (Italy)
Socio fondatore GFOSS "Geospatial Free and Open Source Software" http://www.gfoss.it
e-mail: marchesini@unipg.it
       ivan.marchesini@gmail.com
tel: +39(0)755853760
fax (university): +39(0)755853756
fax (home): +39(0)5782830887
jabber: geoivan73@jabber.org

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

A correction: as you can see from the stats, the differences are not
only (1,-1,7) as I was saying... Now I have to check how much these
differences affect the results. Anyway, above 99.5 % of values are
equal.

2008/7/29 G. Allegri <giohappy@gmail.com>:

Ivan anticipated me for a bit.
GREAT Markus! I could crunch Sardinia in one shot. I haven't measured
the time but it was less then 2 minutes, while with r.watershed I got
stalled.
Analyzing the differences between the r.watershed and r.watershed.fast
for a narrower region, there was some subtle, sparse differences
(1,-1,7) meaning 45° differences:

The following are the differences values (from r.mapcalc) statistics
(number of pixels for values)

-7 76
-6 34
-5 30
-4 49
-3 43
-2 194
-1 1165
0 813894
1 1218
2 253
3 83
4 45
5 25
6 93
7 250
8 22
9 21
10 5
11 15
12 30
13 18
14 3
15 5
16 4
* 36917172

2008/7/29 ivan marchesini <marchesini@unipg.it>:

First results of one test I did: excellent!!!

1392776 cells dem with steep slopes but also plain areas

r.watershed.fast: 7 seconds
r.watershed: 4 min and 14 seconds

no difference at all between the two generated network paths...

markus: many thanks..

Ivan

Il giorno lun, 28/07/2008 alle 23.14 +0200, Markus Metz ha scritto:

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m
flag set, r.watershed.ram), I implemented a binary min-heap instead of a
linear array to store points in ascending order relative to elevation.
This improves the speed of <SECTION 2: A * Search> considerably, the
larger the map, the larger the speed improvement. A region with about
1,800,000 cells is processed about 37 times faster than with the
original routine. A region with about 11,000,000 cells is processed
about 170 times faster than with the original routine (46.7sec vs.
2h12m9sec on my system). <SECTION 4: Watershed Determination> takes now
the longest.

Several tests (different region sizes, different DEMs) showed that the
results of the binary heap version are very similar but not 100%
identical to the original r.watershed, because of a slightly different
treatment of cells with equal elevation in a binary min-heap compared to
a linear array. Differences are found in difficult areas (equal
elevation of several neighbouring cells) where there are several
possible solutions for how water could flow. Still, compared with other
hydrology analysis methods, e.g. r.terraflow, the results can be
regarded as identical.

The original code was only altered where necessary, only the sorting
method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index
for each analysed point (in addition to the other indices already needed
by r.watershed.ram).

My question is if there is interest in this faster version of
r.watershed and if someone wants to test it.
The source code is available on http://markus.metz.giswork.googlepages.com/

Regards,

Markus

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

--
Ti prego di cercare di non inviarmi files .dwg, .doc, .xls, .ppt.
Preferisco formati liberi.
Please try to avoid to send me .dwg, .doc, .xls, .ppt files.
I prefer free formats.
http://it.wikipedia.org/wiki/Formato_aperto
http://en.wikipedia.org/wiki/Open_format

Ivan Marchesini
Department of Civil and Environmental Engineering
University of Perugia
Via G. Duranti 93/a
06125
Perugia (Italy)
Socio fondatore GFOSS "Geospatial Free and Open Source Software" http://www.gfoss.it
e-mail: marchesini@unipg.it
       ivan.marchesini@gmail.com
tel: +39(0)755853760
fax (university): +39(0)755853756
fax (home): +39(0)5782830887
jabber: geoivan73@jabber.org

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

Yes, there are some differences. Please look at where these differences are located and if they are significantly changing the results. In flat areas, there will be several possibilities about how water could flow. This might also affect the flowpath further upstream. Both the original version and the fast version have a certain degree of randomness on how to find a path through flat terrain, but it's a different type randomness, so to speak. The fast version can not exactly replicate the results, but it is pretty close. If there are only slight shifts in the flowpath and these differences can be justified by comparing them to the input DEM, the results could be useful. I think of particular interest are larger differences in flow accumulation, because they have a big influence on other output like basins and flow direction.

Thanks for testing!

G. Allegri wrote:

A correction: as you can see from the stats, the differences are not
only (1,-1,7) as I was saying... Now I have to check how much these
differences affect the results. Anyway, above 99.5 % of values are
equal.

2008/7/29 G. Allegri <giohappy@gmail.com>:
  

Ivan anticipated me for a bit.
GREAT Markus! I could crunch Sardinia in one shot. I haven't measured
the time but it was less then 2 minutes, while with r.watershed I got
stalled.
Analyzing the differences between the r.watershed and r.watershed.fast
for a narrower region, there was some subtle, sparse differences
(1,-1,7) meaning 45° differences:

The following are the differences values (from r.mapcalc) statistics
(number of pixels for values)

-7 76
-6 34
-5 30
-4 49
-3 43
-2 194
-1 1165
0 813894
1 1218
2 253
3 83
4 45
5 25
6 93
7 250
8 22
9 21
10 5
11 15
12 30
13 18
14 3
15 5
16 4
* 36917172

On 28/07/08 23:14, Markus Metz wrote:

Hello list,

I'm not sure if this list is the right place or rather the developer list.
For the A * Search algorithm in r.watershed (memory version without -m flag set, r.watershed.ram), I implemented a binary min-heap instead of a linear array to store points in ascending order relative to elevation. This improves the speed of <SECTION 2: A * Search> considerably, the larger the map, the larger the speed improvement. A region with about 1,800,000 cells is processed about 37 times faster than with the original routine. A region with about 11,000,000 cells is processed about 170 times faster than with the original routine (46.7sec vs. 2h12m9sec on my system). <SECTION 4: Watershed Determination> takes now the longest.

Several tests (different region sizes, different DEMs) showed that the results of the binary heap version are very similar but not 100% identical to the original r.watershed, because of a slightly different treatment of cells with equal elevation in a binary min-heap compared to a linear array. Differences are found in difficult areas (equal elevation of several neighbouring cells) where there are several possible solutions for how water could flow. Still, compared with other hydrology analysis methods, e.g. r.terraflow, the results can be regarded as identical.

The original code was only altered where necessary, only the sorting method is new, everything else is unchanged.
Memory usage increases a bit, because a binary heap needs its own index for each analysed point (in addition to the other indices already needed by r.watershed.ram).

My question is if there is interest in this faster version of r.watershed and if someone wants to test it.

I think there definitely is interest (see the number of mails about the long running time of r.watershed). The important issue will be the quality of results.

First test in North Carolina demo data set:

g.region rast=elevation

time r.watershed elevation=elevation@PERMANENT accumulation=old_accum drainage=old_dir basin=old_sheds stream=old_streams thresh=500

real 19m2.744s
user 18m41.318s
sys 0m1.884s

time r.watershed.fast elevation=elevation@PERMANENT accumulation=fast_accum drainage=fast_dir basin=fast_sheds stream=fast_streams thresh=500

real 0m18.034s
user 0m17.833s
sys 0m0.196s

Impressive !

Comparison of watersheds is a bit difficult because of different cat numbers and colors...but the two seem fairly similar.

Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 98%. Those which have different directions are overwhelmingly low slope cells.

1833907 cells have the same accumulation value, i.e. 90%, but I guess this is to be expected.

I'll look at the details of the differences later.

Moritz

Moritz

Dear Markus and other r.watershed enthusiasts,
I've thought of a way to make your faster version of r.watershed give
identical results as the older r.watershed. r.watershed.old uses a breadth
first search in section 2. r.watershed.fast is breadth first <when the DEM
values are different>. The trick would be to slightly adjust DEM values by
when they were added to the heap. Here is some pseudo-code
cellsAddedToHeap = 0
minCellIncrement = Double.MIN_VALUE # (java) This constant represents
                                    # the smallest increment a CELL
                                    # can be, if the CELLs are doubles.
# At the time a cell is added to heap, the DEM value placed
# in the heap would be:
DemOfCellToHeap = DEM + (minCellIncrement * cellsAddedToHeap++))

Sincerely, chuck

-----Original Message-----
From: grass-user-bounces@lists.osgeo.org
[mailto:grass-user-bounces@lists.osgeo.org] On Behalf Of Markus Metz
Sent: Tuesday, July 29, 2008 10:44 AM
To: G. Allegri
Cc: grass-user@lists.osgeo.org
Subject: Re: [GRASS-user] r.watershed speed-up

Yes, there are some differences. Please look at where these differences
are located and if they are significantly changing the results. In flat
areas, there will be several possibilities about how water could flow.
This might also affect the flowpath further upstream. Both the original
version and the fast version have a certain degree of randomness on how
to find a path through flat terrain, but it's a different type
randomness, so to speak. The fast version can not exactly replicate the
results, but it is pretty close. If there are only slight shifts in the
flowpath and these differences can be justified by comparing them to the
input DEM, the results could be useful. I think of particular interest
are larger differences in flow accumulation, because they have a big
influence on other output like basins and flow direction.

Thanks for testing!

G. Allegri wrote:

A correction: as you can see from the stats, the differences are not
only (1,-1,7) as I was saying... Now I have to check how much these
differences affect the results. Anyway, above 99.5 % of values are
equal.

2008/7/29 G. Allegri <giohappy@gmail.com>:
  

Ivan anticipated me for a bit.
GREAT Markus! I could crunch Sardinia in one shot. I haven't measured
the time but it was less then 2 minutes, while with r.watershed I got
stalled.
Analyzing the differences between the r.watershed and r.watershed.fast
for a narrower region, there was some subtle, sparse differences
(1,-1,7) meaning 45° differences:

The following are the differences values (from r.mapcalc) statistics
(number of pixels for values)

-7 76
-6 34
-5 30
-4 49
-3 43
-2 194
-1 1165
0 813894
1 1218
2 253
3 83
4 45
5 25
6 93
7 250
8 22
9 21
10 5
11 15
12 30
13 18
14 3
15 5
16 4
* 36917172

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

Dear Chuck,

r.watershed is a much valued tool in GRASS, for me the best watershed analsis tool not only in GRASS, therefore I thought about a a way to keep the results identical too. I am also aware that the closer the results produced by changes in the algorithm are to the results produced by original algorithm, the higher the chances that it will be accepted by the community.

With regard to your suggestion, I would not adjust DEM values, because in larger regions the minimum possible increment is already there in the data, i.e. there are no gaps in the data distribution that can be filled with adjusted values. One theoretical way out would be to read in DEMs as FCELL or DCELL, but then there is the floating point comparison problem. (I tried against better knowledge, it doesn't work). Regarding the breadth first search, where do you see breadth first <when the DEM values are different>? You lost me there. I don't see differences in how points are searched between the two versions, but maybe I have not fully understood the original algorithm. As far as I have understood the original algorithm, the list of astar_pts following astar_pts.nxt is kept in ascending order using elevation. If there are already points with equal elevation, the new point is inserted after all other points with the same elevation (line 91 in original do_astar.c), so that the point inserted first (of several points with equal elevation) will be removed first (line 19 in original do_astar.c). This is still the case in the new algorithm (insertion: line 136, removal: lines 31 and 192) most of the time. If the binary heap becomes fairly large and there are many points with equal elevation, there might be an exception. Please let me know if I got something wrong there!

Another possibility to produce the exact same results like in the original version would be to go recursively down the heap and pick the point added earliest from all points with elevation equal to the root point. This is easy to implement, but it would have slowed down the search algorithm somewhat and I wanted to get something lightening fast.

I have one main argument why it is not a disaster if the results are not 100% identical:
The order in which neighbouring cells are added is in both versions, with respect to the focus cell:
low, up, left, right, upper right corner, lower left corner, lower right corner, upper left corner
This order is always kept, irrespective of the already established flow direction, thus it is a random order and there is not really a reason why the algorithm should stick to that order. I think a rare replacement of that random order (2% difference of flow direction in Moritz Lennert's test) with another random order (binary heap shuffling) is not a disaster and the result is still valid. I did build in a check to make results more similar, but there are still scenarios when this check doesn't catch.

So my main question to you, the original author of r.watershed, is, if a rare violation to the (in my opinion random) order in which neighbours are added to the list would cause the results to be no longer valid.
The other question is if I should provide now a version that really produces identical results, or if I first sort out the problem of how neighbours are (should be) added to and removed from the list. BTW, I tried to change the order of adding neighbours to the list too, taking into account the already established flow direction. It produces very straight lines in flat terrain, which is ok in hydrological terms, but some randomness looked better. Flat terrain in the DEM must not be flat in reality because of problems with DEM resolution and accuracy, randomness produces there more naturally looking results.

Sorry for the long reply!

Regards,

Markus

Charles Ehlschlaeger wrote:

Dear Markus and other r.watershed enthusiasts,
I've thought of a way to make your faster version of r.watershed give
identical results as the older r.watershed. r.watershed.old uses a breadth
first search in section 2. r.watershed.fast is breadth first <when the DEM
values are different>. The trick would be to slightly adjust DEM values by
when they were added to the heap. Here is some pseudo-code
cellsAddedToHeap = 0
minCellIncrement = Double.MIN_VALUE # (java) This constant represents # the smallest increment a CELL # can be, if the CELLs are doubles.
# At the time a cell is added to heap, the DEM value placed # in the heap would be:
DemOfCellToHeap = DEM + (minCellIncrement * cellsAddedToHeap++))

Sincerely, chuck
  

Hello list,

there is now a new version of r.watershed.fast where results are even more similar to r.watershed. They are still not 100% identical to r.watershed, but I can't get it more similar. But it comes with a further speed increase. I repeated the test of Moritz with the same commands on GRASS 6.4.svn, results are below.

Moritz Lennert wrote:

First test in North Carolina demo data set:

g.region rast=elevation

time r.watershed elevation=elevation@PERMANENT accumulation=old_accum drainage=old_dir basin=old_sheds stream=old_streams thresh=500

real 19m2.744s
user 18m41.318s
sys 0m1.884s

time r.watershed.fast elevation=elevation@PERMANENT accumulation=fast_accum drainage=fast_dir basin=fast_sheds stream=fast_streams thresh=500

real 0m18.034s
user 0m17.833s
sys 0m0.196s

Absolute times are not really comparable between systems, but relative differences in time should be similar. The following numbers are calculated with real time. In the test Moritz did, r.watershed took 63x as long as r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of r.watershed.
New version: r.watershed took 127x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.8% of the time of r.watershed.

Of the 2025000 cells in the map, 1991218 show the same direction, i.e. 98%. Those which have different directions are overwhelmingly low slope cells.

New version: 2004480 cells, i.e. 99% of all cells show the same flow direction.

1833907 cells have the same accumulation value, i.e. 90%, but I guess this is to be expected.

New version: 1921510 cells, i.e. 95% of all cells show the same accumulation value.

The idea is that a faster r.watershed can also be used for massive grids, where GRASS users frequently gave up using r.watershed because it would have taken hours or even days. I resampled "elevation" in the North Carolina demo data set from 10m to 3m with r.resamp.rst using default values (after the GRASS book Section 5.3.3, paragraph "Regularized spline with tension (RST) interpolation") to generate a fairly large map and ran the same test on the resampled map.

cells in region : 22,500,000

The results:

Speed:
r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10 hours versus 1 minute...).

Flow direction differences:
22288539 cells, i.e. 99% of all cells show the same flow direction.

Flow accumulation differences:
20963653 cells, i.e. 93% of all cells show the same accumulation value.

Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB
I don't understand why memory usage increases after <SECTION 1a: Initiating Memory> is completed.
Assuming that there is no longer a time constraint but only a memory constraint (although <SECTION 4: Watershed Determination> can take some time on large maps with a large threshold value), the upper region sizes that r.watershed.fast can process in RAM would be *roughly* for
1GB RAM: 14,000,000 cells
2GB RAM: 38,000,000 cells
4GB RAM: 86,000,000 cells
8GB RAM: 181,000,000 cells
after putting 400MB aside for the system and other open applications. Estimate based on Linux 64bit.

If you want to repeat and analyse the tests with the North Carolina demo data set, the new r.watershed.fast is here http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz and the test script is below.

Regards,

Markus

test script:
g.region rast=elevation
time r.watershed elevation=elevation@PERMANENT accumulation=nc_accum_old drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500
time r.watershed.fast elevation=elevation@PERMANENT accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast stream=nc_streams_fast thresh=500
r.mapcalc nc_dir_dif='if(("nc_dir_old" - "nc_dir_fast" != 0),1,0)'
r.mapcalc nc_accum_dif='if(("nc_accum_old" - "nc_accum_fast" != 0),1,0)'
r.stats -c input=nc_dir_dif@PERMANENT
r.stats -c input=nc_accum_dif@PERMANENT

r.resamp.rst input=elevation@PERMANENT ew_res=3 ns_res=3 elev=elevation_rst overlap=3 zmult=1.0 tension=40.
g.region rast=elevation_rst
time r.watershed elevation=elevation_rst@PERMANENT accumulation=nc_rst_accum_old drainage=nc_rst_dir_old basin=nc_rst_sheds_old stream=nc_rst_streams_old thresh=500
time r.watershed.fast elevation=elevation_rst@PERMANENT accumulation=nc_rst_accum_fast drainage=nc_rst_dir_fast basin=nc_rst_sheds_fast stream=nc_rst_streams_fast thresh=500
r.mapcalc nc_rst_dir_dif='if(("nc_rst_dir_old" - "nc_rst_dir_fast" != 0),1,0)'
r.mapcalc nc_rst_accum_dif='if(("nc_rst_accum_old" - "nc_rst_accum_fast" != 0),1,0)'
r.stats -c input=nc_rst_dir_dif@PERMANENT
r.stats -c input=nc_rst_accum_dif@PERMANENT

Great. Things get better and faster!
I've tried on a not so big region. But it was enough:

989861 cells (172286 null cells under MASK, where I have the see)

It has taken about 55 seconds (I don't have time to set up a
profiling), where I asked for drain and visual outputs creation.

The stats are (calculated in OO):

DIFF-VALUES N.PIXELS PERCENTAGE
-7 56 0,0068
-6 30 0,0037
-5 32 0,0039
-4 49 0,0060
-3 38 0,0046
-2 180 0,0220
-1 914 0,1118
0 814650 99,6422
1 910 0,1113
2 229 0,0280
3 79 0,0097
4 42 0,0051
5 21 0,0026
6 70 0,0086
7 152 0,0186
8 22 0,0027
9 21 0,0026
10 5 0,0006
11 15 0,0018
12 30 0,0037
13 18 0,0022
14 3 0,0004
15 5 0,0006
16 4 0,0005

Things has run better then with the previous version, as >99.6% of
values are equal to r.watershed.

Thanks, a lot, for r.watershed.fast! :slight_smile:
Giovanni

2008/8/1 Markus Metz <markus_metz@gmx.de>:

Hello list,

there is now a new version of r.watershed.fast where results are even more
similar to r.watershed. They are still not 100% identical to r.watershed,
but I can't get it more similar. But it comes with a further speed increase.
I repeated the test of Moritz with the same commands on GRASS 6.4.svn,
results are below.

Moritz Lennert wrote:

First test in North Carolina demo data set:

g.region rast=elevation

time r.watershed elevation=elevation@PERMANENT accumulation=old_accum
drainage=old_dir basin=old_sheds stream=old_streams thresh=500

real 19m2.744s
user 18m41.318s
sys 0m1.884s

time r.watershed.fast elevation=elevation@PERMANENT
accumulation=fast_accum drainage=fast_dir basin=fast_sheds
stream=fast_streams thresh=500

real 0m18.034s
user 0m17.833s
sys 0m0.196s

Absolute times are not really comparable between systems, but relative
differences in time should be similar. The following numbers are calculated
with real time. In the test Moritz did, r.watershed took 63x as long as
r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of
r.watershed.
New version: r.watershed took 127x as long as r.watershed.fast, i.e.
r.watershed.fast needed only 0.8% of the time of r.watershed.

Of the 2025000 cells in the map, 1991218 show the same direction, i.e.
98%. Those which have different directions are overwhelmingly low slope
cells.

New version: 2004480 cells, i.e. 99% of all cells show the same flow
direction.

1833907 cells have the same accumulation value, i.e. 90%, but I guess this
is to be expected.

New version: 1921510 cells, i.e. 95% of all cells show the same accumulation
value.

The idea is that a faster r.watershed can also be used for massive grids,
where GRASS users frequently gave up using r.watershed because it would have
taken hours or even days. I resampled "elevation" in the North Carolina demo
data set from 10m to 3m with r.resamp.rst using default values (after the
GRASS book Section 5.3.3, paragraph "Regularized spline with tension (RST)
interpolation") to generate a fairly large map and ran the same test on the
resampled map.

cells in region : 22,500,000

The results:

Speed:
r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast
needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10
hours versus 1 minute...).

Flow direction differences:
22288539 cells, i.e. 99% of all cells show the same flow direction.

Flow accumulation differences:
20963653 cells, i.e. 93% of all cells show the same accumulation value.

Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB
I don't understand why memory usage increases after <SECTION 1a: Initiating
Memory> is completed.
Assuming that there is no longer a time constraint but only a memory
constraint (although <SECTION 4: Watershed Determination> can take some time
on large maps with a large threshold value), the upper region sizes that
r.watershed.fast can process in RAM would be *roughly* for
1GB RAM: 14,000,000 cells
2GB RAM: 38,000,000 cells
4GB RAM: 86,000,000 cells
8GB RAM: 181,000,000 cells
after putting 400MB aside for the system and other open applications.
Estimate based on Linux 64bit.

If you want to repeat and analyse the tests with the North Carolina demo
data set, the new r.watershed.fast is here
http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz
and the test script is below.

Regards,

Markus

test script:
g.region rast=elevation
time r.watershed elevation=elevation@PERMANENT accumulation=nc_accum_old
drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500
time r.watershed.fast elevation=elevation@PERMANENT
accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast
stream=nc_streams_fast thresh=500
r.mapcalc nc_dir_dif='if(("nc_dir_old" - "nc_dir_fast" != 0),1,0)'
r.mapcalc nc_accum_dif='if(("nc_accum_old" - "nc_accum_fast" != 0),1,0)'
r.stats -c input=nc_dir_dif@PERMANENT
r.stats -c input=nc_accum_dif@PERMANENT

r.resamp.rst input=elevation@PERMANENT ew_res=3 ns_res=3 elev=elevation_rst
overlap=3 zmult=1.0 tension=40.
g.region rast=elevation_rst
time r.watershed elevation=elevation_rst@PERMANENT
accumulation=nc_rst_accum_old drainage=nc_rst_dir_old basin=nc_rst_sheds_old
stream=nc_rst_streams_old thresh=500
time r.watershed.fast elevation=elevation_rst@PERMANENT
accumulation=nc_rst_accum_fast drainage=nc_rst_dir_fast
basin=nc_rst_sheds_fast stream=nc_rst_streams_fast thresh=500
r.mapcalc nc_rst_dir_dif='if(("nc_rst_dir_old" - "nc_rst_dir_fast" !=
0),1,0)'
r.mapcalc nc_rst_accum_dif='if(("nc_rst_accum_old" - "nc_rst_accum_fast" !=
0),1,0)'
r.stats -c input=nc_rst_dir_dif@PERMANENT
r.stats -c input=nc_rst_accum_dif@PERMANENT

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

With the "fast" drainage versions I could reproduce water outlet
basins with unsignificant differences. Apart the time saving :slight_smile:

2008/8/1 G. Allegri <giohappy@gmail.com>:

Great. Things get better and faster!
I've tried on a not so big region. But it was enough:

989861 cells (172286 null cells under MASK, where I have the see)

It has taken about 55 seconds (I don't have time to set up a
profiling), where I asked for drain and visual outputs creation.

The stats are (calculated in OO):

DIFF-VALUES N.PIXELS PERCENTAGE
-7 56 0,0068
-6 30 0,0037
-5 32 0,0039
-4 49 0,0060
-3 38 0,0046
-2 180 0,0220
-1 914 0,1118
0 814650 99,6422
1 910 0,1113
2 229 0,0280
3 79 0,0097
4 42 0,0051
5 21 0,0026
6 70 0,0086
7 152 0,0186
8 22 0,0027
9 21 0,0026
10 5 0,0006
11 15 0,0018
12 30 0,0037
13 18 0,0022
14 3 0,0004
15 5 0,0006
16 4 0,0005

Things has run better then with the previous version, as >99.6% of
values are equal to r.watershed.

Thanks, a lot, for r.watershed.fast! :slight_smile:
Giovanni

2008/8/1 Markus Metz <markus_metz@gmx.de>:

Hello list,

there is now a new version of r.watershed.fast where results are even more
similar to r.watershed. They are still not 100% identical to r.watershed,
but I can't get it more similar. But it comes with a further speed increase.
I repeated the test of Moritz with the same commands on GRASS 6.4.svn,
results are below.

Moritz Lennert wrote:

First test in North Carolina demo data set:

g.region rast=elevation

time r.watershed elevation=elevation@PERMANENT accumulation=old_accum
drainage=old_dir basin=old_sheds stream=old_streams thresh=500

real 19m2.744s
user 18m41.318s
sys 0m1.884s

time r.watershed.fast elevation=elevation@PERMANENT
accumulation=fast_accum drainage=fast_dir basin=fast_sheds
stream=fast_streams thresh=500

real 0m18.034s
user 0m17.833s
sys 0m0.196s

Absolute times are not really comparable between systems, but relative
differences in time should be similar. The following numbers are calculated
with real time. In the test Moritz did, r.watershed took 63x as long as
r.watershed.fast, i.e. r.watershed.fast needed only 1.6% of the time of
r.watershed.
New version: r.watershed took 127x as long as r.watershed.fast, i.e.
r.watershed.fast needed only 0.8% of the time of r.watershed.

Of the 2025000 cells in the map, 1991218 show the same direction, i.e.
98%. Those which have different directions are overwhelmingly low slope
cells.

New version: 2004480 cells, i.e. 99% of all cells show the same flow
direction.

1833907 cells have the same accumulation value, i.e. 90%, but I guess this
is to be expected.

New version: 1921510 cells, i.e. 95% of all cells show the same accumulation
value.

The idea is that a faster r.watershed can also be used for massive grids,
where GRASS users frequently gave up using r.watershed because it would have
taken hours or even days. I resampled "elevation" in the North Carolina demo
data set from 10m to 3m with r.resamp.rst using default values (after the
GRASS book Section 5.3.3, paragraph "Regularized spline with tension (RST)
interpolation") to generate a fairly large map and ran the same test on the
resampled map.

cells in region : 22,500,000

The results:

Speed:
r.watershed took 5459x as long as r.watershed.fast, i.e. r.watershed.fast
needed only 0.02% of the time of r.watershed (here 10h2m55s vs. 1m7s, 10
hours versus 1 minute...).

Flow direction differences:
22288539 cells, i.e. 99% of all cells show the same flow direction.

Flow accumulation differences:
20963653 cells, i.e. 93% of all cells show the same accumulation value.

Memory usage of r.watershed and r.watershed.fast: maximum of about 940MB
I don't understand why memory usage increases after <SECTION 1a: Initiating
Memory> is completed.
Assuming that there is no longer a time constraint but only a memory
constraint (although <SECTION 4: Watershed Determination> can take some time
on large maps with a large threshold value), the upper region sizes that
r.watershed.fast can process in RAM would be *roughly* for
1GB RAM: 14,000,000 cells
2GB RAM: 38,000,000 cells
4GB RAM: 86,000,000 cells
8GB RAM: 181,000,000 cells
after putting 400MB aside for the system and other open applications.
Estimate based on Linux 64bit.

If you want to repeat and analyse the tests with the North Carolina demo
data set, the new r.watershed.fast is here
http://markus.metz.giswork.googlepages.com/r.watershed_fast_version.tar.gz
and the test script is below.

Regards,

Markus

test script:
g.region rast=elevation
time r.watershed elevation=elevation@PERMANENT accumulation=nc_accum_old
drainage=nc_dir_old basin=nc_sheds_old stream=nc_streams_old thresh=500
time r.watershed.fast elevation=elevation@PERMANENT
accumulation=nc_accum_fast drainage=nc_dir_fast basin=nc_sheds_fast
stream=nc_streams_fast thresh=500
r.mapcalc nc_dir_dif='if(("nc_dir_old" - "nc_dir_fast" != 0),1,0)'
r.mapcalc nc_accum_dif='if(("nc_accum_old" - "nc_accum_fast" != 0),1,0)'
r.stats -c input=nc_dir_dif@PERMANENT
r.stats -c input=nc_accum_dif@PERMANENT

r.resamp.rst input=elevation@PERMANENT ew_res=3 ns_res=3 elev=elevation_rst
overlap=3 zmult=1.0 tension=40.
g.region rast=elevation_rst
time r.watershed elevation=elevation_rst@PERMANENT
accumulation=nc_rst_accum_old drainage=nc_rst_dir_old basin=nc_rst_sheds_old
stream=nc_rst_streams_old thresh=500
time r.watershed.fast elevation=elevation_rst@PERMANENT
accumulation=nc_rst_accum_fast drainage=nc_rst_dir_fast
basin=nc_rst_sheds_fast stream=nc_rst_streams_fast thresh=500
r.mapcalc nc_rst_dir_dif='if(("nc_rst_dir_old" - "nc_rst_dir_fast" !=
0),1,0)'
r.mapcalc nc_rst_accum_dif='if(("nc_rst_accum_old" - "nc_rst_accum_fast" !=
0),1,0)'
r.stats -c input=nc_rst_dir_dif@PERMANENT
r.stats -c input=nc_rst_accum_dif@PERMANENT

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user