On Jun 16, 2009, at 3:41 AM, Markus GRASS wrote:
Dwight Needels wrote:
When the rmdangle tool runs into a pair of dangles at the end of a
line where each is shorter than the threshold (a "Y"), it removes one but
leaves the other. This makes sense, because after the first one is
removed the second one is no longer a dangle (it is now the terminal
line segment).
"Line segment" is confusing because it refers to a part of a line, e.g.
a part of line C or A. Line E may have only one segment, whereas line C is
composed of several segments.
Agreed; but I keep using the term unconsciously to distinguish the portion of a line that falls between a pair of nodes. The term "line" is ambiguous because it is also used to refer to an entire polyline. For example, the Vector Introduction says "Note that all lines and boundaries can be polylines (with nodes in between)" rather than something like:
"line: a directed sequence of connected vertices with exactly two endpoints called nodes", and
"polyline: a non-branching series of connected lines or boundaries with a shared node at each connection"
For this discussion I will try to restrict myself to the above definitions, so that the term "line" will never refer to a polyline. This seems to be consistent with your usage. Would it be useful to make this distinction explicit on the Vector Introduction page?
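To make the proposed line/polyline distinction concrete, here is a small Python sketch. The representation (each line stored as a pair of node names, keyed by Id) and the function name `polylines` are hypothetical illustrations, not the GRASS vector API. The sketch chains lines into one polyline only through nodes where exactly two line ends meet, using the topology of the A-E test case discussed below (A, B and D share node ABD; B, C and E share node BCE).

```python
from collections import defaultdict

def polylines(lines):
    """Group line ids into polylines.

    lines: {line_id: (node1, node2)}  (hypothetical representation)
    A polyline continues through a node only where exactly two line ends
    meet; it stops at a free node (one end) or a branch point (three or more).
    """
    touching = defaultdict(list)              # node -> ids of lines ending there
    for lid, (n1, n2) in lines.items():
        touching[n1].append(lid)
        touching[n2].append(lid)
    seen, groups = set(), []
    for start in sorted(lines):
        if start in seen:
            continue
        group, stack = [], [start]
        while stack:                          # flood-fill through 2-line nodes
            lid = stack.pop()
            if lid in seen:
                continue
            seen.add(lid)
            group.append(lid)
            for node in lines[lid]:
                if len(touching[node]) == 2:  # simple connection, not a branch
                    stack.extend(touching[node])
        groups.append(sorted(group))
    return groups

# Topology of the test case (ids 1-5 = lines A-E); node names are made up.
example = {1: ("tA", "ABD"), 2: ("ABD", "BCE"), 3: ("BCE", "tC"),
           4: ("tD", "ABD"), 5: ("BCE", "tE")}
print(polylines(example))      # [[1], [2], [3], [4], [5]]
without_a = {k: v for k, v in example.items() if k != 1}
print(polylines(without_a))    # [[2, 4], [3], [5]]
```

With all five lines present, every line is its own polyline because ABD and BCE are branch points. Once A is deleted, only B and D meet at ABD, so they merge into a single polyline; this is the effect that lets D and B be removed together later in the test case.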
The question is, how does it decide which one to remove?
First come, first served. The rmdangle tool doesn't really decide; it goes
through all lines, and if a line is a dangle and shorter than the threshold,
it gets removed. As far as I can tell, the tool does not look at a pair of
dangles at once.
It looks like A and D were removed first; then B qualified as a dangle and
was also removed. Why line E stayed in place is strange, though; it should
also have been removed with that threshold.
Ah... this is becoming clearer. Since rmdangle only deals with one line at a time, it never knows that there is a choice.
I think I may have finally figured out how a dangle is defined by v.clean tool=rmdangle, working backwards from the behavior. As you suggested, lines are processed in some order, probably by internal Id. The lines A - E in my previous example have Id and cat values of 1 - 5, respectively. I deleted line A and replaced it with a similar length line that now has the highest Id (6) instead of the lowest. Removing dangles left the replacement line A in place while deleting the shorter dangles (consistent with either an Id or a cat processing order).
As far as I can tell, a dangle must have at least one free (unshared) node, so that this tool will never delete a line in the middle of a polyline. The rmdangle tool marches through the Id values until it finds a line with a free node (a terminal line). Starting at the terminus, it adds up all line lengths until it comes to either the other terminus or a branch point. If the total length is less than the threshold, the set of lines is a dangle and the entire polyline up to the branch point is removed. Processing continues through all Id values.
In my test case:
Line A was removed first because its length (a single line from the terminus to branch point ABD) is less than the threshold.
Line B was not removed because it does not have a free node (despite having a length below the threshold).
Line C was not removed because it is longer than the threshold (despite having a free node).
Lines D and B were removed together because, once A was gone, node ABD was no longer a branch point, and the sum of the line lengths from D's terminus to branch point BCE is less than the threshold.
Line E was not removed because, once B was gone, node BCE was no longer a branch point, so the sum of the line lengths from E's terminus to the nearest branch point/terminus (E plus C) is greater than the threshold, even though Line E alone is shorter than the threshold.
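The inferred procedure can be checked against the test case with a small simulation. This is a hypothetical Python model of the behavior described in this message, not the GRASS v.clean source: the processing order (ascending internal Id), the node/length representation, the function names and the line lengths are all assumptions, with lengths chosen only to satisfy the inequalities above (A, B, D and E each shorter than the threshold; C and E + C longer; D + B shorter).

```python
from collections import defaultdict

# Hypothetical model of the behavior inferred above; NOT the GRASS source.
# lines: {line_id: (node1, node2, length)}

def node_degrees(lines):
    """Number of line ends meeting at each node."""
    deg = defaultdict(int)
    for n1, n2, _ in lines.values():
        deg[n1] += 1
        deg[n2] += 1
    return deg

def trace_from_terminus(lines, start_id, deg):
    """Walk from the free node of start_id through nodes shared by exactly
    two lines; stop at a branch point (degree >= 3) or the other terminus."""
    n1, n2, _ = lines[start_id]
    node = n1 if deg[n1] == 1 else n2       # begin at the free end
    path, total, current = [], 0.0, start_id
    while True:
        a, b, length = lines[current]
        path.append(current)
        total += length
        node = b if node == a else a        # step to the far end of this line
        if deg[node] != 2:
            return path, total              # terminus or branch point
        # exactly one other line continues through a degree-2 node
        current = next(i for i, (x, y, _) in lines.items()
                       if i != current and node in (x, y))

def rmdangle_model(lines, threshold):
    """Visit lines in ascending Id (assumed order); remove each traced chain
    whose total length is below threshold (any length if threshold < 0)."""
    lines = dict(lines)                      # leave the caller's dict intact
    removed = []
    for line_id in sorted(lines):
        if line_id not in lines:
            continue                         # already removed with a chain
        deg = node_degrees(lines)
        n1, n2, _ = lines[line_id]
        if deg[n1] != 1 and deg[n2] != 1:
            continue                         # no free node: not a dangle
        path, total = trace_from_terminus(lines, line_id, deg)
        if threshold < 0 or total < threshold:
            for i in path:
                del lines[i]
            removed.append(path)
    return removed

# Test-case topology with made-up lengths (ids 1-5 = lines A-E):
# A, B, D meet at node "ABD"; B, C, E meet at node "BCE".
test_case = {
    1: ("tA", "ABD", 3.0),    # A
    2: ("ABD", "BCE", 2.0),   # B
    3: ("BCE", "tC", 10.0),   # C
    4: ("tD", "ABD", 4.0),    # D
    5: ("BCE", "tE", 3.0),    # E
}
print(rmdangle_model(test_case, 7.0))    # [[1], [4, 2]]: A, then D+B; C, E kept
renumbered = {**{k: v for k, v in test_case.items() if k != 1},
              6: test_case[1]}           # A re-digitized with the highest Id
print(rmdangle_model(renumbered, 7.0))   # [[4], [5]]: D and E; A kept
print(rmdangle_model(test_case, -1))     # [[1], [3], [4, 2, 5]]: every line
```

If the model is right, the three calls reproduce the observations in this thread: with the original Ids, A is removed and then D and B together; giving A the highest Id removes D and E and keeps A; and a negative threshold deletes every chain that reaches a free node, regardless of length.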
This also explains why v.clean tool=rmdangle threshold=-1 removes most, if not all, non-closed lines.
The current definition on the v.clean manual page is...
"rmdangle: removes dangles, threshold ignored if <0", without defining what a dangle is.
I propose that this be changed to something like...
"rmdangle: removes terminal lines or polylines with a length (to the nearest branch point/terminus) less than threshold, threshold ignored if <0"
It may be worth having a note called "What is a dangle?", but regardless it would be good to have a statement that says something like...
"The rmdangle tool processes dangles sequentially by internal Id, which may result in short lines with high Id values remaining after lines with lower Id values have been deleted from the nearest branch point."
Does any of this look incorrect? Can the processing by internal Id be confirmed?
Markus, thanks for all of your help on this. -Dwight