[GRASS-user] Help: Completely confused about multi-layered vectors trying to import TIGER/Line files

I have been trying to wrap my brain around "multi-layered" GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here with
a solid understanding of this stuff can help me.

I'm trying to figure out how to import TIGER/Line data and actually get the
attributes of areas pulled in. This is trouble.

The v.in.ogr documentation has an example of how to do it:

v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP output=t35001_all \
                     type=boundary,centroid snap=-1

which does indeed import the CompleteChain layer and PIP (Polygon Internal
Point) layers --- it puts the boundaries in layer 1 and the centroids in
layer 2, and if I do a
  d.vect t35001_all layer=2
I can see the areas just fine.

Thing is, TIGER data has almost no usable attributes in the PIP layer itself
--- this is nothing more than a centroid with an attribute "POLYID", telling
nothing about the area. To get that information, one needs additional tables
linked to the POLYID field. The TIGER data tries to be a normal form database,
so there are many tables with no geometry that relate back to linking fields
in the attribute tables that are attached to geometry.

There's another table in the TIGER data called "Polygon" that has more
attributes related to the census, such as congressional district and such.
Actual names and types of areas (parks, military installations, etc) are
linked through two tables, AreaLandmarks and Landmarks --- the first links
POLYID to a LAND attribute (an integer) and the second links the LAND
attribute to an actual name --- there is a many-to-one relationship between
AreaLandmarks and Landmarks. The latter table actually contains the feature
class code that tells you what type of landmark the polygon represents. Not
every polygon has a matching record in AreaLandmarks, only those that actually
represent landmarks. The other polygons are just administrative regions or
other artificial areas. On top of that, there appear to be some records of
Landmarks that do NOT have any connection to records in AreaLandmarks -- these
are point landmarks, and so the Landmarks layer of the TIGER data also has
point geometry. Where there is a relationship between Landmarks and
AreaLandmarks I suppose the Landmark point would be usable as a label point.

Attaching the "Polygon" records to the centroids seems an easy problem, as
every centroid has a Polygon record with attributes, and all that's needed
is a table join on the POLYID field between the table already attached to the
centroids and the Polygon table imported with db.in.ogr. I can do that.

The AreaLandmarks are another matter --- only a small fraction of the
polygons have associated AreaLandmarks. For example, in the TIGER/Line data
for Bernalillo County, New Mexico, there are 18597 PIP and Polygon records,
but only 1292 AreaLandmarks records. This makes it seem to me that
AreaLandmarks would be a prime candidate for a third "layer" in the
"t35001_all" vector. My problem is selecting the geometric objects to tie to
the records of the AreaLandmarks table.

I have tried a number of naive things that didn't work. First, I tried to
add a "cat" field to the AreaLandmarks table, then use an SQL UPDATE query
to copy the cat column of the table associated with the PIP records (which
gets called "t35001_all_2" by the v.in.ogr import) that have POLYIDs that
match the ones in the AreaLandmarks:

   echo "UPDATE t35001_AreaLandmarks SET cat=(SELECT cat FROM t35001_all_2 WHERE t35001_all_2.POLYID=t35001_AreaLandmarks.POLYID)" | db.execute

and then adding AreaLandmarks as a third database connection. That SQL
update worked fine in the sense that the AreaLandmarks table got the right
cat values out of the layer 2 table, but the approach didn't work, because
as far as I understand it the categories of objects in layer 2 haven't
actually been assigned to geometric objects for layer 3 yet, so linking this
way is meaningless at this step. Attaching the table gives me some warnings,
and apparently attaches no database records to any geometry at all.

So then I tried creating a new reclassified vector with

  v.reclass in=t35001_all out=foo col=POLYID layer=2 type=centroid

setting the categories of layer 2 in the new vector to be equal to the POLYID
of layer 2 in the old vector without copying the table, then doing:

  echo "UPDATE t35001_AreaLandmarks SET cat=POLYID" | db.execute

and attaching that table to layer 2 of the reclassified vector. This sorta
kinda worked, in that the AreaLandmarks records were attached to features
properly, but left a very large fraction of the features without database
connections. So I could display vector "foo" and use d.what.vect to query
my new attributes, but there were 17305 polygons out of 18597 that I could
click and get a warning of a missing database entry --- I would rather have
only the polygons that have an AreaLandmarks record show up at all when
displaying this layer.

Since I'm rambling now, I'll just cut to the chase and ask my question:

Given a GRASS vector with two attached database tables, one of which (layer
2, attaching attributes to centrois) has a key field POLYID, how can I create
a new layer using a third, much smaller table, that also has a POLYID field,
such that the new layer only contains that subset of centroids where the
third layer table's POLYID matches layer 2's POLYID for that centroid?

At some later point I'll worry about the Landmarks point layer and its
associated attributes, after I've figured out what the story is with
connecting the AreaLandmarks to the subset of centroids.

I am convinced that if I truly understood how layers work in GRASS then my
question's answer would be self-evident, but I also know that "if A then B"
is a true statement when both A and B are false. I have been completely
unable to piece together a thorough (or even marginal) understanding from man
pages and even the third edition "Open Source GIS: A GRASS GIS Approach"
book. Any help anyone can give me would be greatly appreciated.

--
Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you're good and crazy, oooh, oooh,
oooh, the sky is the limit!" --- The Tick

On Thu, Feb 28, 2008 at 7:39 AM, Tom Russo <russo@bogodyn.org> wrote:

I have been trying to wrap my brain around "multi-layered" GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here with
a solid understanding of this stuff can help me.

I'm trying to figure out how to import TIGER/Line data and actually get the
attributes of areas pulled in. This is trouble.

The v.in.ogr documentation has an example of how to do it:

v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP output=t35001_all \
                     type=boundary,centroid snap=-1

which does indeed import the CompleteChain layer and PIP (Polygon Internal
Point) layers --- it puts the boundaries in layer 1 and the centroids in
layer 2, and if I do a
  d.vect t35001_all layer=2
I can see the areas just fine.

Thing is, TIGER data has almost no usable attributes in the PIP layer itself
--- this is nothing more than a centroid with an attribute "POLYID", telling
nothing about the area. To get that information, one needs additional tables
linked to the POLYID field. The TIGER data tries to be a normal form database,
so there are many tables with no geometry that relate back to linking fields
in the attribute tables that are attached to geometry.

There's another table in the TIGER data called "Polygon" that has more
attributes related to the census, such as congressional district and such.
Actual names and types of areas (parks, military installations, etc) are
linked through two tables, AreaLandmarks and Landmarks --- the first links
POLYID to a LAND attribute (an integer) and the second links the LAND
attribute to an actual name --- there is a many-to-one relationship between
AreaLandmarks and Landmarks. The latter table actually contains the feature
class code that tells you what type of landmark the polygon represents. Not
every polygon has a matching record in AreaLandmarks, only those that actually
represent landmarks. The other polygons are just administrative regions or
other artificial areas. On top of that, there appear to be some records of
Landmarks that do NOT have any connection to records in AreaLandmarks -- these
are point landmarks, and so the Landmarks layer of the TIGER data also has
point geometry. Where there is a relationship between Landmarks and
AreaLandmarks I suppose the Landmark point would be usable as a label point.

Attaching the "Polygon" records to the centroids seems an easy problem, as
every centroid has a Polygon record with attributes, and all that's needed
is a table join on the POLYID field between the table already attached to the
centroids and the Polygon table imported with db.in.ogr. I can do that.

The AreaLandmarks are another matter --- only a small fraction of the
polygons have associated AreaLandmarks. For example, in the TIGER/Line data
for Bernalillo County, New Mexico, there are 18597 PIP and Polygon records,
but only 1292 AreaLandmarks records. This makes it seem to me that
AreaLandmarks would be a prime candidate for a third "layer" in the
"t35001_all" vector. My problem is selecting the geometric objects to tie to
the records of the AreaLandmarks table.

I have tried a number of naive things that didn't work. First, I tried to
add a "cat" field to the AreaLandmarks table, then use an SQL UPDATE query
to copy the cat column of the table associated with the PIP records (which
gets called "t35001_all_2" by the v.in.ogr import) that have POLYIDs that
match the ones in the AreaLandmarks:

   echo "UPDATE t35001_AreaLandmarks SET cat=(SELECT cat FROM t35001_all_2 WHERE t35001_all_2.POLYID=t35001_AreaLandmarks.POLYID)" | db.execute

and then adding AreaLandmarks as a third database connection. That SQL
update worked fine in the sense that the AreaLandmarks table got the right
cat values out of the layer 2 table, but the approach didn't work, because
as far as I understand it the categories of objects in layer 2 haven't
actually been assigned to geometric objects for layer 3 yet, so linking this
way is meaningless at this step. Attaching the table gives me some warnings,
and apparently attaches no database records to any geometry at all.

So then I tried creating a new reclassified vector with

  v.reclass in=t35001_all out=foo col=POLYID layer=2 type=centroid

setting the categories of layer 2 in the new vector to be equal to the POLYID
of layer 2 in the old vector without copying the table, then doing:

  echo "UPDATE t35001_AreaLandmarks SET cat=POLYID" | db.execute

and attaching that table to layer 2 of the reclassified vector. This sorta
kinda worked, in that the AreaLandmarks records were attached to features
properly, but left a very large fraction of the features without database
connections. So I could display vector "foo" and use d.what.vect to query
my new attributes, but there were 17305 polygons out of 18597 that I could
click and get a warning of a missing database entry --- I would rather have
only the polygons that have an AreaLandmarks record show up at all when
displaying this layer.

Hi Tom,

TIGER data (in its current format) is a beast to work with-- in any
GIS. I have found that the simplest way to get "normal" looking data
out of the TIGER files was with the tutorial on processing these data
using PostGIS [1]. There are excellent instructions on the GeoServer
page [1] that make use of basic PostGIS functionality, along with some
self-contained java utilities. Following those instructions I was able
to get *most* of the point, line, and polygon data (with meaningful
attributes) extracted to simple tables in PostGIS. It would be a
simple matter to export to GRASS from there.

I think that you can get 2000 TIGER data, by county, from our favorite
'leading GIS vendor' for free [2]. That format should be simpler to
use.

There is also tiger2shp [3] which is now distributed for free (Windows).

Finally, it looks like the US Census will be moving away from the
current data format, and will be henceforth distributing their data as
shapefiles [4]. Hurray!

1. http://geoserver.org/display/GEOSDOC/Loading+TIGER+data
2. http://www.esri.com/data/download/census2000_tigerline/index.html
3. http://tnatlas.geog.utk.edu/downloadfree.htm
4. http://www.census.gov/geo/www/tiger/future/future_tl.html

Since I'm rambling now, I'll just cut to the chase and ask my question:

  Given a GRASS vector with two attached database tables, one of which (layer
  2, attaching attributes to centrois) has a key field POLYID, how can I create
  a new layer using a third, much smaller table, that also has a POLYID field,
  such that the new layer only contains that subset of centroids where the
  third layer table's POLYID matches layer 2's POLYID for that centroid?

At some later point I'll worry about the Landmarks point layer and its
associated attributes, after I've figured out what the story is with
connecting the AreaLandmarks to the subset of centroids.

I am convinced that if I truly understood how layers work in GRASS then my
question's answer would be self-evident, but I also know that "if A then B"
is a true statement when both A and B are false. I have been completely
unable to piece together a thorough (or even marginal) understanding from man
pages and even the third edition "Open Source GIS: A GRASS GIS Approach"
book. Any help anyone can give me would be greatly appreciated.

I cannot offer any insight into the GRASS vector/layers system right
now, but there are several good threads from a couple of months ago on
this very topic (vector/layers/confusion).

Good luck,

Dylan

--
Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
  one trick, rational thinking, but when you're good and crazy, oooh, oooh,
  oooh, the sky is the limit!" --- The Tick
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

On Thursday 28 February 2008, Tom Russo wrote:

On Thu, Feb 28, 2008 at 07:57:48AM -0800, we recorded a bogon-computron

collision of the <dylan.beaudette@gmail.com> flavor, containing:

> TIGER data (in its current format) is a beast to work with-- in any
> GIS. I have found that the simplest way to get "normal" looking data
> out of the TIGER files was with the tutorial on processing these data
> using PostGIS [1]. There are excellent instructions on the GeoServer
> page [1] that make use of basic PostGIS functionality, along with some
> self-contained java utilities. Following those instructions I was able
> to get *most* of the point, line, and polygon data (with meaningful
> attributes) extracted to simple tables in PostGIS. It would be a
> simple matter to export to GRASS from there.

I will take a look at that.

I can also do some of this processing using OGR and Python (I once had the
displeasure of working on such a script to make polygon shapefiles out
of TIGER/Line files), but my goal here is not so much "what tools external
to GRASS can I use to recast the problem" but "how can I learn GRASS tools
better?"

Understandable-- although I would look for a better data set from which to
learn about layers. That TIGER data is a beast.

> I think that you can get 2000 TIGER data, by county, from our favorite
> 'leading GIS vendor' for free [2]. That format should be simpler to
> use.

Yeah, but that won't teach me anything about layers in GRASS.

Understandably.

> There is also tiger2shp [3] which is now distributed for free (Windows).
>
> Finally, it looks like the US Census will be moving away from the
> current data format, and will be henceforth distributing their data as
> shapefiles [4]. Hurray!

Yeah, saw that some time ago. I had concerns about the initial set of
attributes they said they would be distributing in the data, but they have
changed that and it looks usable. Those will be available in March,
according to http://www.census.gov/geo/www/tiger/tgrshp.html

> I cannot offer any insight into the GRASS vector/layers system right
> now, but there are several good threads from a couple of months ago on
> this very topic (vector/layers/confusion).

I'll see if I can search the archives and find those.

Here is a good starting point:

http://www.nabble.com/forum/Search.jtp?forum=1200&local=y&query=layer

If there aren't any good examples, it might be nice to work up something for
the new NC sample dataset -- so that there is a common framework for
teaching/learning about vector layers. In the last 5 years I have seen the
topic come up several times.

Cheers,
Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

On 28/02/08 16:39, Tom Russo wrote:

I have been trying to wrap my brain around "multi-layered" GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here with a solid understanding of this stuff can help me.

[...]

Since I'm rambling now, I'll just cut to the chase and ask my question:

Given a GRASS vector with two attached database tables, one of which (layer
2, attaching attributes to centrois) has a key field POLYID, how can I create a new layer using a third, much smaller table, that also has a POLYID field, such that the new layer only contains that subset of centroids where the third layer table's POLYID matches layer 2's POLYID for that centroid?

I think at this stage your best bet is probably to do something like this:

v.category OldMap option=add layer=3 out=NewMap

#find a link between the category numbers created and the original
# categories, this should work:

v.build OldMap option=cdump > TempFile

# You will have to edit TempFile to only take into account layer 2 and
# then find a way to import the results into your database

v.db.connect -o NewMap layer=3 table=NewTable

#to remove all centroids you do not want:
v.edit tool=delete
or
v.extract

There might be an easier way, though...

I think that you also might want to file a few wishes in the tracker, such as:

- extend v.category with an option to use a column linked to an existing layer as the cat for a new layer
- allow v.extract to write the results to another layer of the same file instead of a different file
- extend v.patch to allow to patch different maps into different layers of the output map
- create a v.copy.layers to copy elements of one layer to another

Moritz

On Thu, Feb 28, 2008 at 06:54:09PM +0100, we recorded a bogon-computron collision of the <mlennert@club.worldonline.be> flavor, containing:

On 28/02/08 16:39, Tom Russo wrote:

I have been trying to wrap my brain around "multi-layered" GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here
with a solid understanding of this stuff can help me.

[...]

Since I'm rambling now, I'll just cut to the chase and ask my question:

Given a GRASS vector with two attached database tables, one of which (layer
2, attaching attributes to centrois) has a key field POLYID, how can I
create a new layer using a third, much smaller table, that also has a
POLYID field, such that the new layer only contains that subset of
centroids where the third layer table's POLYID matches layer 2's POLYID
for that centroid?

I think at this stage your best bet is probably to do something like this:

v.category OldMap option=add layer=3 out=NewMap

#find a link between the category numbers created and the original
# categories, this should work:

v.build OldMap option=cdump > TempFile

# You will have to edit TempFile to only take into account layer 2 and
# then find a way to import the results into your database

With the potential for tens of thousands of records, anything
involving manually editing a dump of categories seems an unworkable solution.

v.db.connect -o NewMap layer=3 table=NewTable

#to remove all centroids you do not want:
v.edit tool=delete
or
v.extract

There might be an easier way, though...

I sure hope so.

I think that you also might want to file a few wishes in the tracker, such
as:

- extend v.category with an option to use a column linked to an existing
layer as the cat for a new layer
- allow v.extract to write the results to another layer of the same file
instead of a different file
- extend v.patch to allow to patch different maps into different layers of
the output map
- create a v.copy.layers to copy elements of one layer to another

Once I actually understand the steps of what I really need to do, then
perhaps I could intelligently formulate a real wish for the tracker.
At this point I'm simply adrift in a sea of v. and db. commands and
SQL queries with no compass.

--
Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you're good and crazy, oooh, oooh,
oooh, the sky is the limit!" --- The Tick

On 28/02/08 20:12, Tom Russo wrote:

On Thu, Feb 28, 2008 at 06:54:09PM +0100, we recorded a bogon-computron collision of the <mlennert@club.worldonline.be> flavor, containing:

On 28/02/08 16:39, Tom Russo wrote:

I have been trying to wrap my brain around "multi-layered" GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here with a solid understanding of this stuff can help me.

[...]

Since I'm rambling now, I'll just cut to the chase and ask my question:

Given a GRASS vector with two attached database tables, one of which (layer
2, attaching attributes to centrois) has a key field POLYID, how can I create a new layer using a third, much smaller table, that also has a POLYID field, such that the new layer only contains that subset of centroids where the third layer table's POLYID matches layer 2's POLYID for that centroid?

I think at this stage your best bet is probably to do something like this:

v.category OldMap option=add layer=3 out=NewMap

#find a link between the category numbers created and the original
# categories, this should work:

v.build OldMap option=cdump > TempFile

# You will have to edit TempFile to only take into account layer 2 and
# then find a way to import the results into your database

With the potential for tens of thousands of records, anything
involving manually editing a dump of categories seems an unworkable solution.

With awk / sed magic this should not be too complicated to automate...

Moritz

If it’s helpful, you can also extract attributes to individual .shp files using ogr2ogr -where. For example

ogr2ogr -where ‘cfcc = B11’ output.shp input_file.rt1 CompleteChain

(B11 is the TigerLine attribute code for main line railroads)

The individual layers can then be brought into GRASS.

There is a good explanation of this in the book Mapping Hacks (it’s Hack #68).

Dave

On Feb 28, 2008, at 10:57 AM, Dylan Beaudette wrote:

On Thu, Feb 28, 2008 at 7:39 AM, Tom Russo <russo@bogodyn.org> wrote:

I have been trying to wrap my brain around “multi-layered” GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here with
a solid understanding of this stuff can help me.

I’m trying to figure out how to import TIGER/Line data and actually get the
attributes of areas pulled in. This is trouble.

The v.in.ogr documentation has an example of how to do it:

v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP output=t35001_all
type=boundary,centroid snap=-1

which does indeed import the CompleteChain layer and PIP (Polygon Internal
Point) layers — it puts the boundaries in layer 1 and the centroids in
layer 2, and if I do a
d.vect t35001_all layer=2
I can see the areas just fine.

Thing is, TIGER data has almost no usable attributes in the PIP layer itself
— this is nothing more than a centroid with an attribute “POLYID”, telling
nothing about the area. To get that information, one needs additional tables
linked to the POLYID field. The TIGER data tries to be a normal form database,
so there are many tables with no geometry that relate back to linking fields
in the attribute tables that are attached to geometry.

There’s another table in the TIGER data called “Polygon” that has more
attributes related to the census, such as congressional district and such.
Actual names and types of areas (parks, military installations, etc) are
linked through two tables, AreaLandmarks and Landmarks — the first links
POLYID to a LAND attribute (an integer) and the second links the LAND
attribute to an actual name — there is a many-to-one relationship between
AreaLandmarks and Landmarks. The latter table actually contains the feature
class code that tells you what type of landmark the polygon represents. Not
every polygon has a matching record in AreaLandmarks, only those that actually
represent landmarks. The other polygons are just administrative regions or
other artificial areas. On top of that, there appear to be some records of
Landmarks that do NOT have any connection to records in AreaLandmarks – these
are point landmarks, and so the Landmarks layer of the TIGER data also has
point geometry. Where there is a relationship between Landmarks and
AreaLandmarks I suppose the Landmark point would be usable as a label point.

Attaching the “Polygon” records to the centroids seems an easy problem, as
every centroid has a Polygon record with attributes, and all that’s needed
is a table join on the POLYID field between the table already attached to the
centroids and the Polygon table imported with db.in.ogr. I can do that.

The AreaLandmarks are another matter — only a small fraction of the
polygons have associated AreaLandmarks. For example, in the TIGER/Line data
for Bernalillo County, New Mexico, there are 18597 PIP and Polygon records,
but only 1292 AreaLandmarks records. This makes it seem to me that
AreaLandmarks would be a prime candidate for a third “layer” in the
“t35001_all” vector. My problem is selecting the geometric objects to tie to
the records of the AreaLandmarks table.

I have tried a number of naive things that didn’t work. First, I tried to
add a “cat” field to the AreaLandmarks table, then use an SQL UPDATE query
to copy the cat column of the table associated with the PIP records (which
gets called “t35001_all_2” by the v.in.ogr import) that have POLYIDs that
match the ones in the AreaLandmarks:

echo “UPDATE t35001_AreaLandmarks SET cat=(SELECT cat FROM t35001_all_2 WHERE t35001_all_2.POLYID=t35001_AreaLandmarks.POLYID)” | db.execute

and then adding AreaLandmarks as a third database connection. That SQL
update worked fine in the sense that the AreaLandmarks table got the right
cat values out of the layer 2 table, but the approach didn’t work, because
as far as I understand it the categories of objects in layer 2 haven’t
actually been assigned to geometric objects for layer 3 yet, so linking this
way is meaningless at this step. Attaching the table gives me some warnings,
and apparently attaches no database records to any geometry at all.

So then I tried creating a new reclassified vector with

v.reclass in=t35001_all out=foo col=POLYID layer=2 type=centroid

setting the categories of layer 2 in the new vector to be equal to the POLYID
of layer 2 in the old vector without copying the table, then doing:

echo “UPDATE t35001_AreaLandmarks SET cat=POLYID” | db.execute

and attaching that table to layer 2 of the reclassified vector. This sorta
kinda worked, in that the AreaLandmarks records were attached to features
properly, but left a very large fraction of the features without database
connections. So I could display vector “foo” and use d.what.vect to query
my new attributes, but there were 17305 polygons out of 18597 that I could
click and get a warning of a missing database entry — I would rather have
only the polygons that have an AreaLandmarks record show up at all when
displaying this layer.

Hi Tom,

TIGER data (in its current format) is a beast to work with-- in any
GIS. I have found that the simplest way to get “normal” looking data
out of the TIGER files was with the tutorial on processing these data
using PostGIS [1]. There are excellent instructions on the GeoServer
page [1] that make use of basic PostGIS functionality, along with some
self-contained java utilities. Following those instructions I was able
to get most of the point, line, and polygon data (with meaningful
attributes) extracted to simple tables in PostGIS. It would be a
simple matter to export to GRASS from there.

I think that you can get 2000 TIGER data, by county, from our favorite
‘leading GIS vendor’ for free [2]. That format should be simpler to
use.

There is also tiger2shp [3] which is now distributed for free (Windows).

Finally, it looks like the US Census will be moving away from the
current data format, and will be henceforth distributing their data as
shapefiles [4]. Hurray!

  1. http://geoserver.org/display/GEOSDOC/Loading+TIGER+data
  2. http://www.esri.com/data/download/census2000_tigerline/index.html
  3. http://tnatlas.geog.utk.edu/downloadfree.htm
  4. http://www.census.gov/geo/www/tiger/future/future_tl.html

Since I’m rambling now, I’ll just cut to the chase and ask my question:

Given a GRASS vector with two attached database tables, one of which (layer
2, attaching attributes to centrois) has a key field POLYID, how can I create
a new layer using a third, much smaller table, that also has a POLYID field,
such that the new layer only contains that subset of centroids where the
third layer table’s POLYID matches layer 2’s POLYID for that centroid?

At some later point I’ll worry about the Landmarks point layer and its
associated attributes, after I’ve figured out what the story is with
connecting the AreaLandmarks to the subset of centroids.

I am convinced that if I truly understood how layers work in GRASS then my
question’s answer would be self-evident, but I also know that “if A then B”
is a true statement when both A and B are false. I have been completely
unable to piece together a thorough (or even marginal) understanding from man
pages and even the third edition “Open Source GIS: A GRASS GIS Approach”
book. Any help anyone can give me would be greatly appreciated.

I cannot offer any insight into the GRASS vector/layers system right
now, but there are several good threads from a couple of months ago on
this very topic (vector/layers/confusion).

Good luck,

Dylan


Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
“And, isn’t sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you’re good and crazy, oooh, oooh,
oooh, the sky is the limit!” — The Tick


grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user


grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

On Fri, Feb 29, 2008 at 09:37:36AM -0500, we recorded a bogon-computron collision of the <dkindem@chartermi.net> flavor, containing:

If it's helpful, you can also extract attributes to individual .shp files
using ogr2ogr -where. For example

ogr2ogr -where 'cfcc = B11' output.shp input_file.rt1 CompleteChain

(B11 is the TigerLine attribute code for main line railroads)

The individual layers can then be brought into GRASS.

There is a good explanation of this in the book Mapping Hacks (it's Hack
#68).

This is fine for linear features, which are all stored in CompleteChain with
their attributes attached.

The problem is for area features, for which NO attributes like CFCC are
available until the AreaLandmarks and Polygon tables are merged with the
PIP records. That's what I'm on about. You can't do

  ogr2ogr -where "CFCC=D83" output.shp input_file.rt1 Landmarks

and get polygons of the national parks --- you would at best get a table
of attributes with no geometry.

It seems that the only thing to do is a table join, which requires the
use of something other than the DBF driver, as Michael points out (I have
been using sqlite for that reason). Perhaps doing a table join of
AreaLandmarks to PIP using the POLYID key, then selecting with v.extract only
those features that have non-null values in AreaLandmarks attributes will get
me what I want. I will try that next.

On Feb 28, 2008, at 10:57 AM, Dylan Beaudette wrote:

On Thu, Feb 28, 2008 at 7:39 AM, Tom Russo <russo@bogodyn.org> wrote:

I have been trying to wrap my brain around "multi-layered" GRASS vectors
and
have only succeeded in wrapping my brain into knots. Perhaps someone
here with
a solid understanding of this stuff can help me.

I'm trying to figure out how to import TIGER/Line data and actually get
the
attributes of areas pulled in. This is trouble.

The v.in.ogr documentation has an example of how to do it:

v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP output=t35001_all
\
                     type=boundary,centroid snap=-1

which does indeed import the CompleteChain layer and PIP (Polygon
Internal
Point) layers --- it puts the boundaries in layer 1 and the centroids in
layer 2, and if I do a
  d.vect t35001_all layer=2
I can see the areas just fine.

Thing is, TIGER data has almost no usable attributes in the PIP layer
itself
--- this is nothing more than a centroid with an attribute "POLYID",
telling
nothing about the area. To get that information, one needs additional
tables
linked to the POLYID field. The TIGER data tries to be a normal form
database,
so there are many tables with no geometry that relate back to linking
fields
in the attribute tables that are attached to geometry.

There's another table in the TIGER data called "Polygon" that has more
attributes related to the census, such as congressional district and
such.
Actual names and types of areas (parks, military installations, etc)
are
linked through two tables, AreaLandmarks and Landmarks --- the first
links
POLYID to a LAND attribute (an integer) and the second links the LAND
attribute to an actual name --- there is a many-to-one relationship
between
AreaLandmarks and Landmarks. The latter table actually contains the
feature
class code that tells you what type of landmark the polygon represents.
Not
every polygon has a matching record in AreaLandmarks, only those that
actually
represent landmarks. The other polygons are just administrative regions
or
other artificial areas. On top of that, there appear to be some records
of
Landmarks that do NOT have any connection to records in AreaLandmarks --
these
are point landmarks, and so the Landmarks layer of the TIGER data also
has
point geometry. Where there is a relationship between Landmarks and
AreaLandmarks I suppose the Landmark point would be usable as a label
point.

Attaching the "Polygon" records to the centroids seems an easy problem,
as
every centroid has a Polygon record with attributes, and all that's
needed
is a table join on the POLYID field between the table already attached
to the
centroids and the Polygon table imported with db.in.ogr. I can do that.

The AreaLandmarks are another matter --- only a small fraction of the
polygons have associated AreaLandmarks. For example, in the TIGER/Line
data
for Bernalillo County, New Mexico, there are 18597 PIP and Polygon
records,
but only 1292 AreaLandmarks records. This makes it seem to me that
AreaLandmarks would be a prime candidate for a third "layer" in the
"t35001_all" vector. My problem is selecting the geometric objects to
tie to
the records of the AreaLandmarks table.

I have tried a number of naive things that didn't work. First, I tried
to
add a "cat" field to the AreaLandmarks table, then use an SQL UPDATE
query
to copy the cat column of the table associated with the PIP records
(which
gets called "t35001_all_2" by the v.in.ogr import) that have POLYIDs
that
match the ones in the AreaLandmarks:

   echo "UPDATE t35001_AreaLandmarks SET cat=(SELECT cat FROM
t35001_all_2 WHERE t35001_all_2.POLYID=t35001_AreaLandmarks.POLYID)" |
db.execute

and then adding AreaLandmarks as a third database connection. That SQL
update worked fine in the sense that the AreaLandmarks table got the
right
cat values out of the layer 2 table, but the approach didn't work,
because
as far as I understand it the categories of objects in layer 2 haven't
actually been assigned to geometric objects for layer 3 yet, so linking
this
way is meaningless at this step. Attaching the table gives me some
warnings,
and apparently attaches no database records to any geometry at all.

So then I tried creating a new reclassified vector with

  v.reclass in=t35001_all out=foo col=POLYID layer=2 type=centroid

setting the categories of layer 2 in the new vector to be equal to the
POLYID
of layer 2 in the old vector without copying the table, then doing:

  echo "UPDATE t35001_AreaLandmarks SET cat=POLYID" | db.execute

and attaching that table to layer 2 of the reclassified vector. This
sorta
kinda worked, in that the AreaLandmarks records were attached to
features
properly, but left a very large fraction of the features without
database
connections. So I could display vector "foo" and use d.what.vect to
query
my new attributes, but there were 17305 polygons out of 18597 that I
could
click and get a warning of a missing database entry --- I would rather
have
only the polygons that have an AreaLandmarks record show up at all when
displaying this layer.

Hi Tom,

TIGER data (in its current format) is a beast to work with-- in any
GIS. I have found that the simplest way to get "normal" looking data
out of the TIGER files was with the tutorial on processing these data
using PostGIS [1]. There are excellent instructions on the GeoServer
page [1] that make use of basic PostGIS functionality, along with some
self-contained java utilities. Following those instructions I was able
to get *most* of the point, line, and polygon data (with meaningful
attributes) extracted to simple tables in PostGIS. It would be a
simple matter to export to GRASS from there.

I think that you can get 2000 TIGER data, by county, from our favorite
'leading GIS vendor' for free [2]. That format should be simpler to
use.

There is also tiger2shp [3] which is now distributed for free (Windows).

Finally, it looks like the US Census will be moving away from the
current data format, and will be henceforth distributing their data as
shapefiles [4]. Hurray!

1. http://geoserver.org/display/GEOSDOC/Loading+TIGER+data
2. http://www.esri.com/data/download/census2000_tigerline/index.html
3. http://tnatlas.geog.utk.edu/downloadfree.htm
4. http://www.census.gov/geo/www/tiger/future/future_tl.html

Since I'm rambling now, I'll just cut to the chase and ask my question:

  Given a GRASS vector with two attached database tables, one of which
(layer
  2, attaching attributes to centrois) has a key field POLYID, how can I
create
  a new layer using a third, much smaller table, that also has a POLYID
field,
  such that the new layer only contains that subset of centroids where
the
  third layer table's POLYID matches layer 2's POLYID for that centroid?

At some later point I'll worry about the Landmarks point layer and its
associated attributes, after I've figured out what the story is with
connecting the AreaLandmarks to the subset of centroids.

I am convinced that if I truly understood how layers work in GRASS then
my
question's answer would be self-evident, but I also know that "if A then
B"
is a true statement when both A and B are false. I have been completely
unable to piece together a thorough (or even marginal) understanding
from man
pages and even the third edition "Open Source GIS: A GRASS GIS Approach"
book. Any help anyone can give me would be greatly appreciated.

I cannot offer any insight into the GRASS vector/layers system right
now, but there are several good threads from a couple of months ago on
this very topic (vector/layers/confusion).

Good luck,

Dylan

--
Tom Russo KM5VY SAR502 DM64ux
http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1
http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you
get is
  one trick, rational thinking, but when you're good and crazy, oooh,
oooh,
  oooh, the sky is the limit!" --- The Tick
_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

_______________________________________________
grass-user mailing list
grass-user@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-user

--
Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you're good and crazy, oooh, oooh,
oooh, the sky is the limit!" --- The Tick

On Fri, Feb 29, 2008 at 08:45:42AM -0700, we recorded a bogon-computron collision of the <russo@bogodyn.org> flavor, containing:
[line after line of prose detailing problems importing TIGER polygons]

It seems that the only thing to do is a table join, which requires the
use of something other than the DBF driver, as Michael points out (I have
been using sqlite for that reason). Perhaps doing a table join of
AreaLandmarks to PIP using the POLYID key, then selecting with v.extract only
those features that have non-null values in AreaLandmarks attributes will get
me what I want. I will try that next.

Yes, this worked, after a fashion.

Turns out that v.db.join wouldn't work for me --- there is a bug (I just added
it to the tracker) when trying to use that script on a vector with more than
one layer. And even with the bug fixed it doesn't work on the TIGER data
because in fact there are MODULE and FILE and CENID fields that are shared
between tables that conflict with a simple table join. v.db.join complains
angrily when it encounters duplicate fields that aren't the one on which the
join is being done, and exits without doing the job.

HOWEVER, I was able to force things to work in a ham-fisted way, by
cribbing off of the steps that v.db.join is taking --- it's just a
shell script that does a bunch of v.db.addcol and SQL update queries.
Here's how I did it for the sake of completing the record:

#Import the tiger lines and centroids:
v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP type=boundary,centroid output=t35001_all snap=-1

#Import the AreaLandmarks table:
db.in.ogr dsn=~/TIGER/BC_TGR/ db_table=AreaLandmarks output=t35001_AreaLandmarks
#And the landmarks:
db.in.ogr dsn=~/TIGER/BC_TGR/ db_table=Landmarks output=t35001_Landmarks

#Copy the file so I don't have to do that import again when I mess up:
g.copy vect=t35001_all,test35001_all

# Manually join the tables, copying the technique from the v.db.join script:
#First get the column names and types:
COLLIST=`db.describe -c table=t35001_AreaLandmarks | grep '^Column ' | cut -d':' -f2`
COLTYPES="`db.describe -c table=t35001_AreaLandmarks | grep '^Column ' | cut -d':' -f3 | tr -s ' ' '_'`"
#Now do v.db.addcol and SQL UPDATE commands to join the tables, skipping
# common columns we already have in the original table:
for col in $COLLIST
do
  if [ $col != "FILE" -a $col != "MODULE" -a $col != "CENID" -a $col != POLYID ]
  then
    v.db.addcol map=test35001_all layer=2 col="$col `echo $COLTYPES | cut -d' ' -f$i | tr -s '_' ' '`"
    echo "UPDATE test35001_all_2 SET $col=(SELECT $col FROM t35001_AreaLandmarks WHERE test35001_all_2.POLYID=t35001_AreaLandmarks.POLYID)" | db.execute
  fi
  i=`echo $i+1|bc`
done

# Now extract only those areas that have non-null LAND values:
v.extract input=test35001_all layer=2 output=t35001_areal where="LAND IS NOT NULL" type=area

# and join the Landmarks table the same way:
COLLIST=`db.describe -c table=t35001_Landmarks | grep '^Column ' | cut -d':' -f2`
COLTYPES="`db.describe -c table=t35001_Landmarks | grep '^Column ' | cut -d':' -f3 | tr -s ' ' '_'`"
for col in $COLLIST
do
  if [ $col != "FILE" -a $col != "MODULE" -a $col != "LAND"]
  then
    v.db.addcol map=t35001_areal layer=2 col="$col `echo $COLTYPES | cut -d' ' -f$i | tr -s '_' ' '`"
    echo "UPDATE t35001_areal SET $col=(SELECT $col FROM t35001_Landmarks WHERE t35001_areal.LAND=t35001_Landmarks.LAND)" | db.execute
  fi
  i=`echo $i+1|bc`
done

#Display it:
d.vect t35001_areal fcolor=red layer=2
# Query it:
d.what.vect

This shows me only those areas that have landmark attributes, and gives me
the names and CFCC (census feature class codes) for them.

Next step is the comparatively simple act of doing the table join between
the centroids PIP table and the Polygon table in the original vector layer
so that those attributes are attached. To do this I probably want to script
the join in a more general way --- v.db.join won't cut it because it'll
die every time it encounters identical fields between tables that aren't the
ones being merged on.

In fact, to be robust, my script for handling TIGER files should probably
make sure that MODULE, FILE, CENID etc. actually match when merging on POLYID
instead of just assuming so.

The fact that v.db.join actually just does a bunch of v.db.addcol and
"UPDATE table SET column=(SELECT....)" SQL queries in a "for" loop
makes me wonder why it's not legal to use it on DBF files. I'm pretty
sure that both operations are valid on DBF. No SQL JOIN operation is
really being applied.

None of this really helps me understand GRASS layers better --- my
approach to getting the TIGER area landmarks here skips the whole
layer issue and just makes a new vector, thereby copying geometry and
adding tables instead of just re-using it in a clever way. I'd still
like to know how to do that, but I'll take what I've gotten.

--
Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you're good and crazy, oooh, oooh,
oooh, the sky is the limit!" --- The Tick

On Fri, Feb 29, 2008 at 10:28:26AM -0700, we recorded a bogon-computron collision of the <russo@bogodyn.org> flavor, containing:

On Fri, Feb 29, 2008 at 08:45:42AM -0700, we recorded a bogon-computron collision of the <russo@bogodyn.org> flavor, containing:
[line after line of prose detailing problems importing TIGER polygons]

[followed by line after line detailing how I got around it, with a mistake]

Yes, this worked, after a fashion.

[...]

HOWEVER, I was able to force things to work in a ham-fisted way, by
cribbing off of the steps that v.db.join is taking --- it's just a
shell script that does a bunch of v.db.addcol and SQL update queries.
Here's how I did it for the sake of completing the record:

[...]

Right before I posted I had done a bunch of steps that got me what I
wanted, exactly. Then I deleted the vector I created and redid the process
step by step for the sake of my post to make sure I got them all
right. Turns out that there was a mistake somewhere in the attempt to
repeat it, and I don't know what it is. I will have to painstakingly go
through my history file to see where I diverged in my approaches.

The problem comes in the v.extract command:
v.extract input=test35001_all layer=2 output=t35001_areal where="LAND IS NOT NULL" type=area

For some reason, while this DOES select only those features of
test35001_all for which LAND is non-null and put them into
t35001_areal, it oddly does NOT copy the contents of all of the table
columns that we just added. Those columns are there, but they're all
null. So the final d.what.vect on the extracted layer gives none of
the attributes I just went through the trouble of adding. It did the
first time I tried, so I must have made a mistake in the process I
outlined in my prior post.

When I really, truly have it down I'll post again, and shut up until then.

--
Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you're good and crazy, oooh, oooh,
oooh, the sky is the limit!" --- The Tick