[GRASS-dev] Re: [GRASS GIS] #518: negative flow accumulation with r.watershed SFD or MFD

On Mar 6, 2009, at 3:09 AM, <grass-dev-request@lists.osgeo.org> wrote:

Date: Fri, 06 Mar 2009 08:18:38 -0000
From: "GRASS GIS" <trac@osgeo.org>
Subject: [GRASS-dev] Re: [GRASS GIS] #518: negative flow accumulation
  with r.watershed SFD or MFD
To: undisclosed-recipients:;
Message-ID: <048.30fedfc30b8b5f67edb5ee82855f1ca8@osgeo.org>
Content-Type: text/plain; charset="utf-8"

#518: negative flow accumulation with r.watershed SFD or MFD
--------------------------+-------------------------------------------------
  Reporter:  dylan        |      Owner:  grass-dev@lists.osgeo.org
      Type:  enhancement  |     Status:  new
  Priority:  major        |  Milestone:
 Component:  default      |    Version:  svn-develbranch6
Resolution:               |   Keywords:  r.watershed
  Platform:  Linux        |        Cpu:  x86-32
--------------------------+-------------------------------------------------
Comment (by mmetz):

IMHO, there is still some cleaning up to do for r.watershed. I left some
things in it for backwards compatibility. One such thing is the "visual"
output which I regard as obsolete because "accumulation" output now comes
with a (better I hope) colortable by default.
The "visual" output could be removed and another output option be added,
e.g. called "absacc" that gives absolute accumulation values. That would
however break backwards compatibility; a new flag would not.

There is a good reason *not* to add this option/flag, nicely illustrated
by Dylan creating this ticket. The purpose of negative accumulation values
is to make people wonder what on earth is going on here, then figure out that
not the whole catchment area under study was included and expand the
computational region accordingly to get proper results: only positive
accumulation values for the catchment under study.

IMHO, this is a very poor way to achieve this end--i.e., silently performing all hydrology operations and producing completely bogus values to get you to scratch your head and wonder what is going on when you notice it. I suspect this is a legacy of the age of this module. To make things worse, GRASS's other hydrology modules do not work this way.

A much more direct way is to give a warning for each problematic basin in the output:
WARNING: part of basin XX extends beyond region extent; accumulation values may be too low.

It should be possible to turn off this "feature" of negative accumulation. This is especially important for scripting. For example, we use accumulation values as part of a complex, iterative erosion/deposition model. We are currently running this on complete watersheds created with r.watershed (although an earlier version). Our goal is not to create accumulation maps; in fact, we never see the accumulation maps but use the values for additional modeling. Watershed maps get deleted along the way, unless a flag is set to keep them, because of the very large number of maps created to model decades or centuries of surface process dynamics. The new version seems to be calculating the watershed in a slightly different way, and now a tiny bit must extend off the region because we are getting negative values in some watersheds that were not problematic before. We will check this of course. But we never knew that we were getting negative values until this issue came up in another context. So our model values are completely bogus and we have to run this over again. The slight difference in accumulation would have only a very tiny effect on our results, but entirely negative accumulation values make a big difference. So we'll have to build in taking the absolute value of accumulation (completely negating the goal of the negative values, BTW), but it would still be nice to have an informational text warning about which watersheds might be a problem for users of the model.
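
A minimal sketch of the kind of check a script could run before feeding accumulation values into a model. It is plain Python on a made-up grid rather than a GRASS call; the function name `find_negative_cells` and the sample values are assumptions for illustration only.

```python
def find_negative_cells(accum):
    """Return (row, col) positions of cells with negative accumulation."""
    return [
        (r, c)
        for r, row in enumerate(accum)
        for c, value in enumerate(row)
        if value is not None and value < 0
    ]


# Hypothetical 3x3 accumulation grid; negative cells may receive
# flow from outside the current region.
accum = [
    [1, 1, 1],
    [2, -5, 1],
    [1, -9, 3],
]

negative = find_negative_cells(accum)
if negative:
    # A script can stop or log here instead of silently using the values.
    print(f"WARNING: {len(negative)} cell(s) have negative accumulation: {negative}")
```

In a real workflow the grid would come from the r.watershed accumulation output; how it is read is left out here.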

Michael

Michael Barton wrote:

Comment (by mmetz):

There is a good reason *not* to add this option/flag, nicely illustrated
by Dylan creating this ticket. The purpose of negative accumulation values
is to make people wonder what on earth is going on here, then figure out that
not the whole catchment area under study was included and expand the
computational region accordingly to get proper results: only positive
accumulation values for the catchment under study.

IMHO, this is a very poor way to achieve this end--i.e., silently performing all hydrology operations and producing completely bogus values to get you to scratch your head and wonder what is going on when you notice it. I suspect this is a legacy of the age of this module. To make things worse, GRASS's other hydrology modules do not work this way.

AFAIK, other hydrology modules also calculate what you termed "completely bogus" values, but there is no way to find out what values are bogus if all flow accumulation values are positive. The most conservative solution for cells for which flow accumulation could not be exactly determined would be to set flow accumulation to NULL and not give a minimum estimate.
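
The NULL approach described above could look roughly like this. It is a sketch in plain Python, with `None` standing in for NULL; `mask_uncertain` and the sample grid are hypothetical and not part of r.watershed.

```python
def mask_uncertain(accum):
    """Replace negative accumulation values with None (standing in for NULL)."""
    return [
        [None if value is not None and value < 0 else value for value in row]
        for row in accum
    ]


# Hypothetical accumulation grid with one cell of undetermined accumulation.
accum = [
    [1, 2, 1],
    [-4, 3, 1],
]

masked = mask_uncertain(accum)
print(masked)
```

Cells that were negative carry no estimate at all afterwards, so downstream calculations cannot mistake a minimum estimate for an exact value.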

A much more direct way is to give a warning for each problematic basin in the output:
WARNING: part of basin XX extends beyond region extent; accumulation values may be too low.

IMHO not very practical. When thousands of basins are calculated, you would get flooded with these warnings.

It should be possible to turn off this "feature" of negative accumulation. This is especially important for scripting. For example, we use accumulation values as part of a complex, iterative erosion/deposition model.

Hmm, don't you need to know the exact flow accumulation to calculate erosion/deposition? Are some "at least so much, but probably much more" values really ok? BTW, RUSLE factors as created by r.watershed are also only correct for cells with positive flow accumulation.

We are currently running this on complete watersheds created with r.watershed (although an earlier version). Our goal is not to create accumulation maps and in fact never see the accumulation maps but use the values for additional modeling. Watershed maps get deleted along the way unless a flag is set to keep them because of the very large numbers of maps created to model decades or centuries of surface process dynamics. The new version seems to be calculating the watershed in a slightly different way, and now a tiny bit must extend off the region because we are getting negative values in some watersheds that were not problematic before.

The new SFD version should produce results identical to previous versions. If not, I introduced a bug. The MFD version calculates different, improved basins compared to SFD.

We will check this of course. But we never knew that we were getting negative values until this issue came up in another context. So our model values are completely bogus and we have to run this over again. The slight difference in accumulation would have only a very tiny effect on our results, but entire negative accumulation values make a big difference. So we'll have to build in taking the absolute value of accumulation (completely negating the goal of the negative values, BTW),

Absolute values are still bogus in the sense that they are lower than the real values, sometimes by orders of magnitude.

but it would still be nice to have a text informational warning about which watersheds might be a problem for users of the model.

It could be possible to print out two tables like
basin_no|complete

and

half_basin_no|complete

with complete = 0 meaning that there is a problem and complete = 1 meaning that there is no problem.
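
A rough sketch of how such a table could be derived from a basin map and an accumulation map. This is plain Python on made-up grids, not r.watershed code; `basin_completeness` and the basin numbers are assumptions for illustration. The same function could be applied to a half-basin map to get the second table.

```python
def basin_completeness(basin, accum):
    """Map each basin number to 1 (complete) or 0 (has negative accumulation)."""
    complete = {}
    for basin_row, accum_row in zip(basin, accum):
        for basin_no, value in zip(basin_row, accum_row):
            if basin_no is None:
                continue  # cell belongs to no basin
            complete.setdefault(basin_no, 1)
            if value is not None and value < 0:
                complete[basin_no] = 0
    return dict(sorted(complete.items()))


# Hypothetical basin and accumulation grids of the same shape.
basin = [[2, 2, 4], [2, 4, 4]]
accum = [[1, 2, 1], [-3, 2, 5]]

table = basin_completeness(basin, accum)
print("basin_no|complete")
for basin_no, flag in table.items():
    print(f"{basin_no}|{flag}")
```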

Or introduce a new output option named something like "incomplete_basin_parts" where the incomplete parts are assigned the negative value of the basin they belong to.

Markus M

Thanks for the ideas and response Markus. See below.
On Mar 6, 2009, at 8:48 AM, Markus Metz wrote:

Michael Barton wrote:

Comment (by mmetz):

There is a good reason *not* to add this option/flag, nicely illustrated
by Dylan creating this ticket. The purpose of negative accumulation values
is to make people wonder what on earth is going on here, then figure out that
not the whole catchment area under study was included and expand the
computational region accordingly to get proper results: only positive
accumulation values for the catchment under study.

IMHO, this is a very poor way to achieve this end--i.e., silently performing all hydrology operations and producing completely bogus values to get you to scratch your head and wonder what is going on when you notice it. I suspect this is a legacy of the age of this module. To make things worse, GRASS's other hydrology modules do not work this way.

AFAIK, other hydrology modules also calculate what you termed "completely bogus" values, but there is no way to find out what values are bogus if all flow accumulation values are positive. The most conservative solution for cells for which flow accumulation could not be exactly determined would be to set flow accumulation to NULL and not give a minimum estimate.

A much more direct way is to give a warning for each problematic basin in the output:
WARNING: part of basin XX extends beyond region extent; accumulation values may be too low.

IMHO not very practical. When thousands of basins are calculated, you would get flooded with these warnings.

Are people calculating so many basins that thousands would be along the region extents? Maybe my own work is much different from others, but tens to a few hundred seems more likely. Even 200 warnings could still be a lot if all basins ran off the map, but this should only happen with a limited number of basins. That is, with many small basins, only the ones along the region edges will be affected; with a few large basins, there are far fewer basins to be affected.

It should be possible to turn off this "feature" of negative accumulation. This is especially important for scripting. For example, we use accumulation values as part of a complex, iterative erosion/deposition model.

Hmm, don't you need to know the exact flow accumulation to calculate erosion/deposition? Are some "at least so much, but probably much more" values really ok? BTW, RUSLE factors as created by r.watershed are also only correct for cells with positive flow accumulation.

It depends on how much difference there is. A couple of cells would not make much difference; hundreds would. But I'd still like to know which ones are a problem, of course. I'm not suggesting making the absolute value of accumulation the default (not permitted in 6.5 anyway), but simply making it an option that the user could exercise.

We are currently running this on complete watersheds created with r.watershed (although an earlier version). Our goal is not to create accumulation maps and in fact never see the accumulation maps but use the values for additional modeling. Watershed maps get deleted along the way unless a flag is set to keep them because of the very large numbers of maps created to model decades or centuries of surface process dynamics. The new version seems to be calculating the watershed in a slightly different way, and now a tiny bit must extend off the region because we are getting negative values in some watersheds that were not problematic before.

The new SFD version should produce results identical to previous versions. If not, I introduced a bug. The MFD version calculates different, improved basins compared to SFD.

This is the issue. It happened when we tried MFD. I'm happy for the better calculations and we'll redo our watershed boundaries. But this points out some of the problems.

We will check this of course. But we never knew that we were getting negative values until this issue came up in another context. So our model values are completely bogus and we have to run this over again. The slight difference in accumulation would have only a very tiny effect on our results, but entire negative accumulation values make a big difference. So we'll have to build in taking the absolute value of accumulation (completely negating the goal of the negative values, BTW),

Absolute values are still bogus in the sense that they are lower than the real values, sometimes by orders of magnitude.

but it would still be nice to have a text informational warning about which watersheds might be a problem for users of the model.

It could be possible to print out two tables like
basin_no|complete

and

half_basin_no|complete

with complete = 0 meaning that there is a problem and complete = 1 meaning that there is no problem.

Or introduce a new output option named something like "incomplete_basin_parts" where the incomplete parts are assigned the negative value of the basin they belong to.

If this is better than warnings, that's OK too. I just think that there might be a better way to do this than only by making the accumulation values negative.

Michael

Michael Barton wrote:

A much more direct way is to give a warning for each problematic basin in the output:
WARNING: part of basin XX extends beyond region extent; accumulation values may be too low.

IMHO not very practical. When thousands of basins are calculated, you would get flooded with these warnings.

Are people calculating so many basins that thousands would be along the region extents?

It is technically possible to calculate thousands of basins, therefore the code must consider that. IMHO, the code must consider all technically possible scenarios, you never know what a module is used for, and I would like it to be very generally applicable and not restricted to certain scenarios.

Hmm, don't you need to know the exact flow accumulation to calculate erosion/deposition? Are some "at least so much, but probably much more" values really ok? BTW, RUSLE factors as created by r.watershed are also only correct for cells with positive flow accumulation.

It depends on how much difference there is. A couple of cells would not make much difference; hundreds would. But I'd still like to know which ones are a problem, of course. I'm not suggesting making the absolute value of accumulation the default (not permitted in 6.5 anyway), but simply making it an option that the user could exercise.

Now I'm confused: negative values tell you where there is a problem, but you don't want negative values, only positive values, but then there should be additional, new output telling you where the problems are?

Coming up with my technically possible scenarios: let's assume a basin threshold of 10,000, that gives a max accumulation value for exterior basins of 10,000. 10 cells within an exterior basin have negative flow accumulation, that's only 0.1%. The absolute flow accumulation value is probably very low, whereas the real flow accumulation value can be anything, also >>> 10,000. Certain calculations will be very different. The exterior basin is in this case in reality an interior basin, but this can only be found out by expanding the computational region.
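
The arithmetic of this scenario, spelled out as a small Python sketch. The threshold and the count of negative cells come from the paragraph above; the `reported` and `true_value` numbers are invented purely to show how large the gap can become.

```python
threshold = 10_000      # basin threshold from the scenario above
negative_cells = 10     # cells in the exterior basin with negative accumulation

share = negative_cells / threshold
print(f"affected share of the basin: {share:.1%}")

# Hypothetical values: what abs() would report at a flagged cell versus
# what the accumulation might be once the region is expanded.
reported = 50
true_value = 250_000

factor = true_value / reported
print(f"underestimated by a factor of {factor:,.0f}")
```

The share of affected cells is small, but the error at those cells is not bounded by that share, which is the point of the scenario.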

The new SFD version should produce results identical to previous versions. If not, I introduced a bug. The MFD version calculates different, improved basins compared to SFD.

This is the issue. It happened when we tried MFD. I'm happy for the better calculations and we'll redo our watershed boundaries. But this points out some of the problems.

To be precise, the calculations are only better if MFD is regarded as more accurate than SFD.

Or introduce a new output option named something like "incomplete_basin_parts" where the incomplete parts are assigned the negative value of the basin they belong to.

If this is better than warnings, that's OK too. I just think that there might be a better way to do this than only by making the accumulation values negative.

Maybe Helena can give some tips?

I personally am biased, I like the concept of the original r.watershed with all that information in the output too much:-) See also negative drainage direction, IMHO very useful.

Markus M

There is no question that the default should be kept negative, although checking whether the result
is correct would not hurt - we can look at it with our Panama experiments, others using
r.watershed could provide some helpful feedback too.

But adding a flag to keep values positive actually makes sense to me, if the flag is properly
described (e.g. use positive flow accumulation even for incomplete contributing areas).
A user who selects to run r.watershed with this flag apparently knows that he has
incomplete watersheds and would otherwise be getting negative values that may not be useful
for his application, so there is no need to tell him that he has a problem - he would already know it
and for some reason wants to ignore it. I often found myself running mapcalc abs on the accum
output for various reasons.

BUT for erosion modeling you really want to run r.watershed with negative values - erosion models require
upslope contributing area as a measure of water flow and if the watershed is not complete,
water flow will be underestimated, leading to underestimated erosion rates. So the cells with
incomplete contributing area need to be excluded from the computation of erosion, and here
the negative values actually come in handy.

Helena

On Mar 6, 2009, at 12:35 PM, Markus Metz wrote:

[...]
_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

On Mar 7, 2009, at 8:50 PM, Helena Mitasova wrote:

There is no question that the default should be kept negative, although checking whether the result
is correct would not hurt - we can look at it with our Panama experiments, others using
r.watershed could provide some helpful feedback too.

But adding a flag to keep values positive actually makes sense to me, if the flag is properly
described (e.g. use positive flow accumulation even for incomplete contributing areas).
A user who selects to run r.watershed with this flag apparently knows that he has
incomplete watersheds and would otherwise be getting negative values that may not be useful
for his application, so there is no need to tell him that he has a problem - he would already know it
and for some reason wants to ignore it. I often found myself running mapcalc abs on the accum
output for various reasons.

BUT for erosion modeling you really want to run r.watershed with negative values - erosion models require
upslope contributing area as a measure of water flow and if the watershed is not complete,
water flow will be underestimated, leading to underestimated erosion rates. So the cells with
incomplete contributing area need to be excluded from the computation of erosion, and here
the negative values actually come in handy.

Helena

I agree. This is what I'm suggesting. We cannot change the default behavior for GRASS 6 and may not want to for GRASS 7. But it would be good to have a way to turn this off in some circumstances.

Michael

Michael Barton wrote:

On Mar 7, 2009, at 8:50 PM, Helena Mitasova wrote:

There is no question that the default should be kept negative, although checking whether the result
is correct would not hurt - we can look at it with our Panama experiments, others using
r.watershed could provide some helpful feedback too.

But adding a flag to keep values positive actually makes sense to me, if the flag is properly
described (e.g. use positive flow accumulation even for incomplete contributing areas).

Will do, but only with a warning that incomplete contributing areas cannot be identified with positive flow accumulation only.

I will also add an example to the manual on how to identify what parts of what basins are incomplete:
r.mapcalc "problems = if(flow_acc < 0, basin, null())"
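
For readers who do not use r.mapcalc, the logic of that expression can be sketched in plain Python. `None` stands in for NULL, and `problem_cells` and the sample grids are hypothetical names and data used only for illustration.

```python
def problem_cells(flow_acc, basin):
    """Keep the basin number where accumulation is negative, else None."""
    return [
        [b if a is not None and a < 0 else None for a, b in zip(acc_row, basin_row)]
        for acc_row, basin_row in zip(flow_acc, basin)
    ]


# Hypothetical accumulation and basin grids of the same shape.
flow_acc = [[1, -2, 1], [3, -7, None]]
basin = [[2, 2, 4], [2, 4, 4]]

problems = problem_cells(flow_acc, basin)
print(problems)
```

The result holds a basin number only at cells whose accumulation is negative, so it shows both which basins are incomplete and which parts of them are affected.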

BUT for erosion modeling you really want to run r.watershed with negative values - erosion models require
upslope contributing area as a measure of water flow and if the watershed is not complete,
water flow will be underestimated, leading to underestimated erosion rates. So the cells with
incomplete contributing area need to be excluded from the computation of erosion, and here
the negative values actually come in handy.

"accumulation
Output map: The absolute value of each cell in this output map layer is the amount of overland flow that traverses the cell. This value will be the number of upland cells plus one if no overland flow map is given. If the overland flow map is given, the value will be in overland flow units. Negative numbers indicate that those cells possibly have surface runoff from outside of the current geographic region. Thus, any cells with negative values cannot have their surface runoff and sedimentation yields calculated accurately."

This is the r.watershed manual entry from grass54 through to grass7 (don't have access to pre-54). This description can be updated if it is not clear enough.

Markus M

Thanks!

Michael

On Mar 8, 2009, at 12:08 AM, Markus Metz wrote:

[...]