[GRASS-user] Unable to get rid of duplicate polygons

I didn't find an answer in the archives, so I'm asking here.

I have a shapefile of polygons and some of the polygons are duplicated. I thought I could use v.clean tool=rmdupl to get rid of these polygons. I use v.in.ogr to read it in and I get the following:

WARNING: 8 areas represent more (overlapping) features, because polygons
overlap in input layer(s). Such areas are linked to more than 1
row in attribute table. The number of features for those areas is
stored as category in layer 2

That is correct in that there are 8 duplicate polygons, but the only attribute that differs is the cat that GRASS added. What am I missing? I then tried v.clean tool=bpol,rmdupl and nothing changes; it still has the 8 duplicates. What am I doing wrong?

I am using GRASS 6.3.0 on Fedora Core 14 Linux.

Thanks.

On Sun, Apr 1, 2012 at 9:45 PM, David J. Bakeman <dbakeman@comcast.net> wrote:

That is correct in that there are 8 duplicate polygons but the only
different attribute is the cat which grass added? What am I missing? I
then tried v.clean tool=bpol,rmdupl and nothing changes it still has the 8
duplicates. What am I doing wrong?

I think that you need to add the break tool for v.clean.

I am using grass 6.3.0 on fedora core 14 linux.

Please note that you can upgrade to grass-6.4.0-4.fc14:
http://koji.fedoraproject.org/koji/buildinfo?buildID=263115

Markus

Markus Neteler wrote:

I think that you need to add the break tool for v.clean.
   

Correct, that was actually what I was using: v.clean tool=break,rmdupl

Looking closer, I see that when I run v.clean it doesn't even report the duplicates that v.in.ogr did, but they are still there. The only thing that differs, in coordinates or attributes, is the cat attribute that GRASS added.

   

   

Thanks I'll see if I can upgrade.


David J. Bakeman wrote:

I upgraded to 6.4.0 and the results are exactly the same. The polygons really are identical in every respect, except that they have different values in the cat column. Is there some other GRASS tool for removing this kind of duplicate?


_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

On Sun, Apr 1, 2012 at 11:45 PM, David J. Bakeman <dbakeman@comcast.net> wrote:

I upgraded to the 6.4.0 and the results are exactly the same. The polygons
really are identical in every respect except for they have different values
in the cat column. Is there some other grass tool for removing this kind of
duplicate?

After import with v.in.ogr, there are no duplicate geometries left in
the vector. What you have now is some areas with two categories
assigned to them. Removing the duplicates means in this case removing
one of the two categories, for example with one of the vector
digitizers.

Markus M
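To make the point above concrete, here is a toy Python sketch. It is not GRASS code and does not reflect how GRASS actually builds topology; it only models the outcome described, where two identical input polygons end up as one area carrying two categories. The feature ids and coordinates are made up, and only exactly identical rings are handled (partial overlaps are ignored).

```python
from collections import defaultdict

# Hypothetical input features: (feature id, polygon ring as a tuple of points).
square = ((0, 0), (0, 1), (1, 1), (1, 0), (0, 0))
triangle = ((2, 0), (3, 1), (4, 0), (2, 0))
features = [(1, square), (2, triangle), (3, square)]  # feature 3 duplicates 1


def build_areas(features):
    """Group identical rings into one area that carries every feature's cat."""
    areas = defaultdict(list)
    for cat, ring in features:
        areas[ring].append(cat)
    return dict(areas)


areas = build_areas(features)
shared = {ring: cats for ring, cats in areas.items() if len(cats) > 1}

print(len(areas), "areas;", len(shared), "with more than one category")
for cats in shared.values():
    print("area carries categories", cats)
```

In this picture there is nothing left for a geometry-level duplicate remover to delete: the "duplicate" only survives as the extra category on the shared area.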

Markus Metz wrote:

After import with v.in.ogr, there are no duplicate geometries left in
the vector. What you have now is some areas with two categories
assigned to them. Removing the duplicates means in this case removing
one of the two categories, for example with one of the vector
digitizers.
   

I'm relatively new to GRASS, but that doesn't make sense to me. I started with a shapefile with duplicate features, that is, polygons with exactly the same attributes and geometry (they are identical). What I thought GRASS could do for me was to read it in and delete one of the duplicates without user intervention. After all, it identifies the duplicates, so why can't it delete one?

Are you saying that the duplicate geometry was deleted but it kept both rows, even though they were identical as well? Is there an operation that would identify and delete rows that differ only in the cat attribute?


Hi David,

Deleting duplicates that are only different in the cat column would definitely be possible with SQL, but I don’t know enough SQL to give you a quick command for that. I think the relevant command would be v.extract.

Other than that, GRASS does have many fundamental differences compared to other GIS - for example, that vectors are attached to tables, but not in the sense that one entry in the table necessarily means one vector feature. Features can e.g. be attached to several tables simultaneously, something that is also possible with other GIS but functions differently. A good description of the GRASS vector model is available here:
http://grass.osgeo.org/gdp/html_grass63/vectorintro.html

Hope that helps at least a little bit :)
Daniel


On 2 April 2012 at 21:09, David J. Bakeman <dbakeman@comcast.net> wrote:


I’m relatively new to grass but that doesn’t make sense. I started with a shapefile with duplicate features. That is polygons with the exact same attributes and geometry (they are identical). What I thought grass could do for me was to read it in and delete one of the duplicates without user intervention. After all it identifies the duplicates so why can’t it delete one?

Are you saying that the duplicate geometry was deleted but it kept both rows even though they were identical as well? Is there a operation that would identify and delete rows that differ in only the cat attribute.


Keeping both attributes would make sense in many cases of overlapping polygons, which, I would guess, is more often about partly overlapping polygons due to sloppy digitizing. How is GRASS going to tell what feature is more important?

But if the rows in your attribute table are duplicates except for the cat, you might be able to remove the duplicate rows using a simple SQL statement (I am not sure this works if you use dbf as the database backend), something along the lines of the following (google if this doesn’t work):

DELETE FROM table WHERE cat NOT IN
(SELECT MIN(cat) FROM table GROUP BY XXX);

Here XXX would be a column with a unique value for each mapping unit (if there is no such column, you need to GROUP BY a set of columns that together uniquely define each mapping unit). You can run this with db.execute. Alternatively, you can use the SELECT statement in the advanced SQL query builder in the GRASS Attribute Table Manager to select the duplicates and delete them there.
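As a rough check of the statement above, the following Python sketch runs it against an in-memory SQLite table. The table name `parcels` and the columns `name` and `landuse` are made up for illustration, and the two columns together stand in for XXX. This says nothing about whether the dbf driver accepts the subquery; it only shows what the statement does on a backend that supports it.

```python
import sqlite3

# Hypothetical attribute table; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (cat INTEGER PRIMARY KEY, name TEXT, landuse TEXT)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [
        (1, "A", "forest"),
        (2, "B", "field"),
        (3, "A", "forest"),  # same as cat 1 except for cat
        (4, "C", "urban"),
        (5, "B", "field"),   # same as cat 2 except for cat
    ],
)

keep = "SELECT MIN(cat) FROM parcels GROUP BY name, landuse"

# Preview the rows that would go before deleting anything.
doomed = [
    row[0]
    for row in conn.execute(
        f"SELECT cat FROM parcels WHERE cat NOT IN ({keep}) ORDER BY cat"
    )
]

# The DELETE from the message, with the hypothetical names filled in.
conn.execute(f"DELETE FROM parcels WHERE cat NOT IN ({keep})")
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT cat FROM parcels ORDER BY cat")]
print("deleted cats:", doomed)
print("remaining cats:", remaining)
```

Running the SELECT first is a cheap way to see which rows the DELETE would remove before committing to it.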

If you use dbf as the database backend and the above doesn’t work, you can open the dbf file (which you can find in the ‘GRASS DB / LOCATION / MAPSET / dbf’ folder) in LibreOffice, then select and delete all duplicates, e.g., using a pivot table. Do not use Excel; in my experience it may mess up your dbf file.

In all cases, this is just to remove rows that are identical except for one column… you’ll have to test whether the results make sense in your case and you are not messing up your polygon layer.

There seems to be some confusion here about what v.in.ogr does and does
not do, and about the relation of attributes to geometries.

v.in.ogr does not remove duplicate polygons, it does not even check
for duplicate polygons, and most importantly it keeps (should keep)
all polygons present in the input layer(s) and represents them as
topological areas composed from boundaries. In case of overlapping
polygons, the overlapping parts are marked as such by assigning
multiple categories to them, one for each original polygon.
Additionally, the number of features for those areas is stored as
category in layer X, X being a number reported by v.in.ogr. In order
to get rid of duplicate polygons, one of the categories needs to be
removed from the corresponding area. AFAICT, this needs to be done
manually with one of the vector digitizers.

Deleting a row in the attribute table does not delete the
corresponding geometry or geometries, or to be more precise, the
corresponding category value from geometries. Likewise, deleting a
geometry does not necessarily delete the corresponding entry (entries)
in the attribute table.

Markus M
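The last point, that table rows and categories on geometries are deleted independently, can be illustrated with a small Python toy model. It is not GRASS code; the area ids, category values and attribute rows are invented for the example.

```python
# Toy model (not GRASS code): categories live on the geometry,
# attribute rows live in a separate table keyed by cat.
area_cats = {"area_1": {1, 3}, "area_2": {2}}  # hypothetical area ids
table = {1: {"name": "A"}, 2: {"name": "B"}, 3: {"name": "A"}}

# Step 1: delete the duplicate row. The geometry is untouched.
del table[3]
orphans = sorted(
    cat for cats in area_cats.values() for cat in cats if cat not in table
)
print("categories left without a row:", orphans)

# Step 2: the category itself has to be removed from the area separately.
area_cats["area_1"].discard(3)
multi = [area for area, cats in area_cats.items() if len(cats) > 1]
print("areas still carrying several categories:", multi)
```

After step 1 alone the area still carries category 3, just with no row behind it, which is why cleaning the table by itself does not remove the duplicate.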

On Mon, Apr 2, 2012 at 10:11 PM, Paulo van Breugel
<p.vanbreugel@gmail.com> wrote:

Keeping both attributes would make sense in many cases of overlapping
polygons, which, I would guess, is more often about partly overlapping
polygons due to sloppy digitizing. How is GRASS going to tell what feature
is more important?

But, in case of duplicate rows in your attribute table, except for the cat,
you might be able to remove the duplicate rows in the attribute table using
a simple SQL statement (not sure this works if you use dbf as database
backend), something along the lines of (google if this doesn't work).

DELETE FROM table WHERE cat NOT IN
(SELECT MIN(cat) FROM table GROUP BY XXX);

Whereby XXX would be column with unique values mapping unit (if there is not
such column, you need to GROUP on a set of columns that together uniquely
define each mapping unit). You can do this in the db.execute. Alternatively,
you can use the SELECT statement in the advanced SQL query builder in the
GRASS Attribute Table Manager to select the duplicates and delete them
there.

If you use the dbf as database backend and the above doesn't work, you can
open the dbf file (which you can find in the 'GRASS DB / LOCATION/ MAPSET
/dbf' folder) in Libreoffice and select all duplicates and delete, e.g.,
using a pivot table. Do not use excel, in my experience that may mess up
your dbf file.

In all cases, this is just to remove rows that are identical except for one
column... you'll have to test whether the results make sense in your case
and you are not messing up your polygon layer.



Ah, that is of course true. It would be nice if one had the option to remove the category value from the geometries when deleting the category value in the table (I guess there would be disadvantages too, perhaps in the implementation).

But where I got confused is that when I import a vector layer with overlapping polygons, I indeed get the message that “The number of features for those areas is stored as category in layer 2”, as you mention. However, in the attribute table manager I do not see a second layer. Am I overlooking something, or is this a bug?

Paulo

On 04/04/12 14:06, Paulo van Breugel wrote:

  Ah, that is of course true. It would be nice if one had the option to
remove the category value from the geometries when deleting the category
value in the table (I guess there would be disadvantages too, perhaps
the implementation).

But where I got confused is that when I import a vector layer with
overlapping polygons, I indeed get the message that "The number of
features for those areas is stored as category in layer 2" as you
mention below. However, when in the attribute table manager I do not see
a second layer. Am I overlooking something or is this some bug?

AFAIK, no table is attached to layer 2. However, there are category values in that layer for those polygons which are the result of overlaps.

Category values are identifiers of features that are stored within a vector layer. One can _optionally_ link an attribute table to that layer using the category values as keys.

Moritz
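A toy Python sketch of that idea, with invented area ids, categories and rows (this is a mental model only, not GRASS code): every layer can hold category values, but only a layer with a linked table can turn a category into an attribute row.

```python
# Toy model (not GRASS code). Layer numbers map area ids to category values.
categories = {
    1: {"area_1": [1, 3], "area_2": [2]},  # one cat per original polygon
    2: {"area_1": [2]},                    # number of features in the overlap
}
# Only layer 1 has an attribute table linked; layer 2 has none.
tables = {1: {1: {"name": "A"}, 2: {"name": "B"}, 3: {"name": "A"}}}


def attributes(layer, cat):
    """Return the row for cat in the given layer, or None if no table is linked."""
    table = tables.get(layer)
    return None if table is None else table.get(cat)


print(attributes(1, 3))
print(attributes(2, 2))
```

That matches what Paulo saw: the layer 2 categories exist on the geometry, but with no table linked there is nothing for the attribute table manager to show.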