[GRASS-user] Correcting data errors

Hello again,

This issue "arised" from my other thread ('v.generalize: does it take
forever?'). Since I think it would be bad practice to mix issues on
the same thread, I'm opening this one.

I have a dataset that describes land use in the 'Legal Amazon'
region, the area that is legally recognized as belonging to the
Amazon forest, even if not much forest is left in parts of it at the
moment. The dataset contains information for this region for three
years so far (2008, 2010, 2012), and 2014 is underway.

However, in the process of developing a web interface for this
dataset, I need to generalize that information to reduce the volume
of data served for the current viewport. AFAIK, the only proper way
of doing that is topologically, so you don't end up with gaps and
overlaps between neighbouring polygons.
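For context, the kind of command I have been trying is roughly this
(GRASS 7 syntax; the map name and the threshold are placeholders I
still need to tune):

# simplify the land-use map; because GRASS uses a topological vector
# model, the shared boundary between two adjacent areas is simplified
# only once, so neighbouring polygons cannot drift apart
v.generalize input=landuse output=landuse_simpl method=douglas threshold=50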

PostGIS can do something like this in version 2+, but my problem is
that the data is full of topological errors. I can't really call them
'data errors', since the toolchain used when the dataset was created
didn't enforce these restrictions; I'm going further than was
expected when the process started. But they are topological errors
that I need to deal with.

As far as I can see, there are gaps between polygons,
self-intersections, bridges, and so on. I'm fairly certain there is
at least one occurrence of every type of error known to man. As a
matter of fact, a lot of the polygons fail ST_IsValid as well.
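Just to illustrate, this is the kind of check I used to see how bad
things are (database, table and column names are placeholders for my
setup):

# list the polygons PostGIS considers invalid, and why
psql -d mydb -c "SELECT gid, ST_IsValidReason(geom) FROM landuse WHERE NOT ST_IsValid(geom);"
# ST_MakeValid (PostGIS 2.0+) can repair many of these geometries one
# by one, but it does not fix gaps or overlaps between neighbours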

I've read the manual and experimented with v.clean, trying bpol,
break, rmdupl, etc., but I still don't have the feeling that I know
what I'm doing. I lack experience with this, so:

TL;DR:
What is the recommended way of dealing with the 'errors' introduced
when data is created with zero topological restrictions (and saved as
.shp)?

Thanks again,

F
-=--=-=-
Fábio Augusto Salve Dias
http://sites.google.com/site/fabiodias/

On Sat, Jan 10, 2015 at 7:34 PM, Fábio Dias <fabio.dias@gmail.com> wrote:

TL;DR:
What is the recommended way of dealing with the 'errors' introduced
when data is created with zero topological restrictions (and saved as
.shp)?

For import, try to find a snapping threshold for v.in.ogr that
produces an error-free output. Ideally the output would not only be
error-free, but the number of centroids would match the number of
input polygons (both are reported by v.in.ogr). The min_area option
of v.in.ogr could also help. The bulk of the cleaning should be done
by v.in.ogr. After that, removing small areas with v.clean
tool=rmarea, threshold in square meters, could help. For Terraclass
(and PRODES), which are mainly based on Landsat data, a threshold of
10 square meters could remove artefacts and preserve valid areas (the
Landsat pixel size is about 900 square meters). The threshold needs
to be empirically determined.
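Something along these lines (an untested sketch, GRASS 7 syntax; the
shapefile name, the snap value and the thresholds are placeholders
that have to be adjusted to the data):

# import the shapefile; snap is in map units - start small and
# increase it until v.in.ogr reports no remaining errors and the
# number of centroids matches the number of input polygons;
# min_area can be raised to drop tiny polygons already on import
v.in.ogr input=terraclass_2008.shp output=terraclass_2008 snap=1e-07 min_area=0.0001

# remove leftover sliver areas; for rmarea the threshold is in square meters
v.clean input=terraclass_2008 output=terraclass_2008_clean tool=rmarea threshold=10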

I am not aware of a standard procedure that works for all data sources.

Markus M

On Tue, Jan 13, 2015 at 8:41 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

For import, try to find a snapping threshold for v.in.ogr that
produces an error-free output. Ideally the output would not only be
error-free, but the number of centroids would match the number of
input polygons (both are reported by v.in.ogr). The min_area option
of v.in.ogr could also help. The bulk of the cleaning should be done
by v.in.ogr. After that, removing small areas with v.clean
tool=rmarea, threshold in square meters, could help. For Terraclass
(and PRODES), which are mainly based on Landsat data, a threshold of
10 square meters could remove artefacts and preserve valid areas (the
Landsat pixel size is about 900 square meters). The threshold needs
to be empirically determined.

I am not aware of a standard procedure that works for all data sources.

I have taken the liberty of merging this answer into

http://grasswiki.osgeo.org/wiki/Vector_topology_cleaning

markusN