[GRASS-dev] v.prune with Douglas-Peucker

Hello all,

I encountered a few problems while pruning vector boundaries and decided to try to update the algorithm. The biggest problem with the old algorithm was that it didn't work in latlon. From previous mails to to list I got the idea to use the Douglas-Peucker algorithm as implemented in gshhs_dp for pruning lines and boundaries. The algorithm in gshhs was tailored to latlon, but I wanted it to work in any projection and use map units. This is now realized, it works both in latlon and in projections with meters or anything else as map units. The attached prune.c goes to lib/vector/diglib/prune.c The threshold is the deviation of a given point from a straight line connecting the two adjacent points and should be not more than half the map resolution. Using insanely large thresholds can result in duplicate centroids. In my test map with 146000 vertices, 3109 centroids and 4294 areas (areas without centroids are islands, this is wanted), I got 2 duplicate centroids when the threshold was 1000x map resolution, an insane threshold removing over 80% of the vertices. The algorithm itself does not change the first and last point, so there is no more need to pass a boundary without the first and last point to the algorithm. That means that vector/v.clean/prune.c can be modified accordingly.

As before, there is a first routine removing duplicate points. Is there a reason for duplicate points in lines or boundaries? If not, why are they there at all? I converted rasters to vectors extracting areas and typically had above 30% duplicate vertices. Working with large vectors with millions of vertices, they are clogging up the vector. So something for the wish list would be to have line and area vectors without duplicate vertices after map conversions. For the time being, prune should be first run with a threshold of 0.0 as before to remove duplicate centroids. Then lines and boundaries can be pruned beginning with a small threshold and increasing it gently (e.g. next threshold = 2x previous threshold), until the vector is satisfyingly reduced while preserving the desired amount of detail.

So far I tested the new Douglas-Peucker algorithm in different projections and it works fine. But I only tested it on my system (FC6 x86_64, gcc-4.1.1) with grass-6.2.1 and grass-6.3 cvs snapshot from 12.5.2007. If anybody is interested in testing the new algorithm and help make it the new standard for GRASS, please do!

CAUTION: I'm not a programmer, I'm a biologist by training doing GIS and remote sensing work, so I can't guarantee for the code. I tried my best.

Markus (not Neteler but Metz)

(attachments)

prune.c (8.26 KB)

Markus Metz wrote:

I encountered a few problems while pruning vector boundaries and decided
to try to update the algorithm. The biggest problem with the old
algorithm was that it didn't work in latlon. From previous mails to to
list I got the idea to use the Douglas-Peucker algorithm as implemented
in gshhs_dp for pruning lines and boundaries. The algorithm in gshhs was
tailored to latlon, but I wanted it to work in any projection and use
map units. This is now realized, it works both in latlon and in
projections with meters or anything else as map units. The attached
prune.c goes to lib/vector/diglib/prune.c The threshold is the deviation
of a given point from a straight line connecting the two adjacent points
and should be not more than half the map resolution. Using insanely
large thresholds can result in duplicate centroids. In my test map with
146000 vertices, 3109 centroids and 4294 areas (areas without centroids
are islands, this is wanted), I got 2 duplicate centroids when the
threshold was 1000x map resolution, an insane threshold removing over
80% of the vertices. The algorithm itself does not change the first and
last point, so there is no more need to pass a boundary without the
first and last point to the algorithm. That means that
vector/v.clean/prune.c can be modified accordingly.

As before, there is a first routine removing duplicate points. Is there
a reason for duplicate points in lines or boundaries? If not, why are
they there at all? I converted rasters to vectors extracting areas and
typically had above 30% duplicate vertices.

I used to experience doubled vertices in v.build.polylines and v.clean
tool=snap output (these bugs are fixed now in both 6.2.2 CVS and 6.3
CVS) but never in r.to.vect output.

Trying to reproduce your problem I run, in spearfish6 location:

$ g.region rast=soils -a
$ r.to.vect input=soils output=soils_v feature=area

$ g.region rast=streams -a
$ r.thin input=streams output=streams_th
$ r.to.vect input=streams_th output=streams_th_v feature=line

but neither yielded doubled vertices. Using GRASS 6.2.2 and 6.3 current
CVS. Could you provide an exact procedure to reproduce your problem?

Maciek

Maciej Sieczka wrote:

Markus Metz wrote:

I encountered a few problems while pruning vector boundaries and decided
to try to update the algorithm. The biggest problem with the old
algorithm was that it didn't work in latlon. From previous mails to to
list I got the idea to use the Douglas-Peucker algorithm as implemented
in gshhs_dp for pruning lines and boundaries. The algorithm in gshhs was
tailored to latlon, but I wanted it to work in any projection and use
map units. This is now realized, it works both in latlon and in
projections with meters or anything else as map units. The attached
prune.c goes to lib/vector/diglib/prune.c The threshold is the deviation
of a given point from a straight line connecting the two adjacent points
and should be not more than half the map resolution. Using insanely
large thresholds can result in duplicate centroids. In my test map with
146000 vertices, 3109 centroids and 4294 areas (areas without centroids
are islands, this is wanted), I got 2 duplicate centroids when the
threshold was 1000x map resolution, an insane threshold removing over
80% of the vertices. The algorithm itself does not change the first and
last point, so there is no more need to pass a boundary without the
first and last point to the algorithm. That means that
vector/v.clean/prune.c can be modified accordingly.

As before, there is a first routine removing duplicate points. Is there
a reason for duplicate points in lines or boundaries? If not, why are
they there at all? I converted rasters to vectors extracting areas and
typically had above 30% duplicate vertices.
    
I used to experience doubled vertices in v.build.polylines and v.clean
tool=snap output (these bugs are fixed now in both 6.2.2 CVS and 6.3
CVS) but never in r.to.vect output.

Trying to reproduce your problem I run, in spearfish6 location:

$ g.region rast=soils -a
$ r.to.vect input=soils output=soils_v feature=area

use
$ r.to.vect -s input=soils output=soils_vs feature=area
soils_vs has 30337 duplicate vertices
without smoothing corners I also don't get duplicate vertices

$ g.region rast=streams -a
$ r.thin input=streams output=streams_th
$ r.to.vect input=streams_th output=streams_th_v feature=line

this vector doesn't have any duplicate vertices, neither has streams_th_vs created with
$ r.to.vect -s input=streams_th output=streams_th_vs feature=line

but neither yielded doubled vertices. Using GRASS 6.2.2 and 6.3 current
CVS. Could you provide an exact procedure to reproduce your problem?

Maciek

My main issue was that I wanted to offer a new mehtod for line/boundary simplification based on the Douglas-Peucker algorithm. These duplicate vertices are not much of a problem, the only increase the size of a vector which can become a problem when working on large regions with a high resolution. The real problem is that the old pruning method produces funny results and doesn't work in latlon.

Run
$ v.in.region output=region type=area

create new location for spearfish in latlon
import the vector region from spearfish60
set region to match this vector
import soils_vs from spearfish60

prune vector with
$ v.clean input=soils_vs@user1 output=soils_vs_double type=boundary tool=prune thresh=0

this removes duplicate vertices (old and new prune.c)

now prune
$ v.clean input=soils_vs_double@user1 output=soils_vs_old type=boundary tool=prune thresh=0.001

results is nothing pruned. Try higher thresholds, I tried thresh=10, still nothing is pruned. Now try the new Douglas-Peucker algorithm with thresh=0.001 (this is decimal degrees according to the projection) and have a look at the result. I hope you like it.

Sorry for the long answer!

Markus

Markus Metz wrote:

Maciej Sieczka wrote:

Trying to reproduce your problem I run, in spearfish6 location:

$ g.region rast=soils -a
$ r.to.vect input=soils output=soils_v feature=area

use
$ r.to.vect -s input=soils output=soils_vs feature=area
soils_vs has 30337 duplicate vertices

Please report this as a bug.

without smoothing corners I also don't get duplicate vertices

I see.

$ g.region rast=streams -a
$ r.thin input=streams output=streams_th
$ r.to.vect input=streams_th output=streams_th_v feature=line

this vector doesn't have any duplicate vertices, neither has
streams_th_vs created with
$ r.to.vect -s input=streams_th output=streams_th_vs feature=line

but neither yielded doubled vertices. Using GRASS 6.2.2 and 6.3 current
CVS. Could you provide an exact procedure to reproduce your problem?

Maciek

My main issue was that I wanted to offer a new mehtod for line/boundary
simplification based on the Douglas-Peucker algorithm. These duplicate
vertices are not much of a problem, the only increase the size of a
vector which can become a problem when working on large regions with a
high resolution.

This *is* a problem I believe. GRASS vector data model is quite memory
consuming already. Any vector module using more memory than needed is a
double problem for this reason actually. Moreover I don't think
r.to.vect -s is supposed to output doubled vertices.

The real problem is that the old pruning method
produces funny results and doesn't work in latlon.

I've run into this too. Have you browsed the grass-dev archives?

Sorry for the long answer!

No problem. Maybe your effort may contribute to the recently started
GRASS Google SOC project dedicated to vector generalisation tools for
GRASS (see recent archives)?

Maciek

Maciej Sieczka wrote:

Markus Metz wrote:
...
  

The real problem is that the old pruning method
produces funny results and doesn't work in latlon.
    
I've run into this too. Have you browsed the grass-dev archives?

Yes, there the problem has been reported before and it was suggested to use the Douglas-Peucker algorithm. I needed a line simplification tool now, so I implemented this algorithm.

Sorry for the long answer!
    
No problem. Maybe your effort may contribute to the recently started
GRASS Google SOC project dedicated to vector generalisation tools for
GRASS (see recent archives)?
  

I've seen that post.Daniel Bundala wants to write complete GRASS Modules for line generalization and smoothing. Final evaluation deadline is August 31, and I needed a routine now. I would like to contribute to vector generalisation tools for GRASS, but I'm not a programmer, and Daniel might very well come up with better code. Nevertheless, if anybody wants a line/boundary simplification tool, it is available through my first post.

Markus