[GRASS-dev] Re: [GRASS-user] Re: [GRASS-user] Help: Completely confused about multi-layered vectors trying to import TIGER/Line files

Tom,

It sounds to me like you will need to import your basic vector shapes and then use SQL joins to link them up to the various tables you need. Unless there has been a recent update to the contrary (I saw some things on the dev list, but don’t remember the outcome), you cannot do that with the normal DBF files that GRASS uses. You’ll need to make SQLite or PostgreSQL your default, import the TIGER tables into one of these systems and then do the joins.

Michael


Michael Barton, Professor
Professor of Anthropology
Director of Graduate Studies
School of Human Diversity & Social Change
Center for Social Dynamics & Complexity
Arizona State University
Tempe, AZ 85287-2402
USA

voice: 480-965-6262; fax: 480-965-7671
www: http://www.public.asu.edu/~cmbarton

On Feb 28, 2008, at 12:08 PM, Tom Russo wrote:

On Thu, Feb 28, 2008 at 10:38:00AM -0700, we recorded a bogon-computron collision of the <michael.barton@asu.edu> flavor, containing:

On Feb 28, 2008, at 8:57 AM, grass-user-request@lists.osgeo.org wrote:

Date: Thu, 28 Feb 2008 08:39:29 -0700
From: Tom Russo <russo@bogodyn.org>
Subject: [GRASS-user] Help: Completely confused about multi-layered
vectors trying to import TIGER/Line files
To: grass-user@lists.osgeo.org
Message-ID: <20080228153929.GA37583@bogodyn.org>
Content-Type: text/plain; charset=us-ascii

I have been trying to wrap my brain around “multi-layered” GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here with
a solid understanding of this stuff can help me.

I’m trying to figure out how to import TIGER/Line data and actually get the
attributes of areas pulled in. This is trouble.

Michael:

Thank you for answering, but your answer has either highlighted how
poorly I expressed my question, or thrown into sharper relief how
confused I am about this. Some of what you say below was already
clear to me, but there’s a big gap between “Each vector file (and
object) can have more than one key field to link it to an attribute
table,” (which I knew), “Each key (AKA ‘cat in layer #’) can link to
a line/record in an attribute table (which also must have an
identical integer key field, that doesn’t HAVE to be called “cat”, but
often is).” (which I also knew), and the thing I really want to know, and
it is the latter that I think I haven’t explained well.

The ‘layers’ you mention here are 2 very different beasts.

First OGR. The underlying concept is that some data (e.g., CAD) come in a
file that has multiple ‘layers’ of vectors that may (or may not) have
different associated data. I don’t know TIGER files, so I don’t know if
they come this way or not.

I’ll clarify, then, because that’s not exactly how TIGER is laid out.
There are a number of vectors, and each is related to one or more
tables of attributes, but OGR doesn’t make the connection itself; there
are simply common attributes between tables that one is left to associate
oneself.

The TIGER data comes in a number of files, each containing a series of
records. Each file has a different record type. There is a record
type that defines nodes in “Complete Chains”, a record type for “shape
points” that define the vertices (between the nodes) of the chains, a
record type for Polygon Internal Points (centroids), a record for
polygon attributes, a record for linking chains to polygons (with
left/right polygon ids) etc.

When unpacked into a directory, OGR views the collection as a set of
“layers” (I HATE that this word is used in so many different ways). A quick
“ogrinfo” shows:

INFO: Open of `/users/russo/TIGER/BC_TGR' using driver `TIGER' successful.

Layer name: CompleteChain
Geometry: Line String
Feature Count: 58942
Extent: (-107.196170, 34.869024) - (-106.149575, 35.219639)
Layer SRS WKT: […]
MODULE: String (8.0)
TLID: Integer (10.0) <-- This is a Line ID to link to other tables
[… tons more attributes for linear features…]

Layer name: AltName <-- table of alternate feature names in addition
to the one in CompleteChain
Geometry: None
Feature Count: 6026
Layer SRS WKT:[…]
MODULE: String (8.0)
TLID: Integer (10.0) <-- this one could be used to relate the
alternate names back to linear features
RTSQ: Integer (3.0)
FEAT: IntegerList (8.0) <-- and this one links to the next table,
which actually has the names

Layer name: FeatureIds
Geometry: None
Feature Count: 10235
Layer SRS WKT: […]
MODULE: String (8.0)
FILE: Integer (5.0)
FEAT: Integer (8.0) <-- linking column for AltName table
FEDIRP: String (2.0)
FENAME: String (30.0)
FETYPE: String (4.0)
FEDIRS: String (2.0)

Layer name: ZipCodes
Geometry: None
Feature Count: 1827
Layer SRS WKT:[…]
MODULE: String (8.0)
TLID: Integer (10.0) <---- links back to CompleteChain
RTSQ: Integer (3.0)
[…]

Layer name: Landmarks
Geometry: Point
Feature Count: 448
Extent: (-107.119811, 34.889113) - (-106.232580, 35.205106)
Layer SRS WKT:
GEOGCS[“NAD83”,
DATUM[“North_American_Datum_1983”,
SPHEROID[“GRS 1980”,6378137,298.257222101]],
PRIMEM[“Greenwich”,0],
UNIT[“degree”,0.0174532925199433]]
MODULE: String (8.0)
FILE: Integer (5.0)
LAND: Integer (10.0) <------ linking column to AreaLandmarks
SOURCE: String (1.0)
CFCC: String (3.0)
LANAME: String (30.0)
LALONG: Integer (10.0)
LALAT: Integer (9.0)
FILLER: String (1.0)

Layer name: AreaLandmarks
Geometry: None
Feature Count: 1292
Layer SRS WKT:
GEOGCS[“NAD83”,
DATUM[“North_American_Datum_1983”,
SPHEROID[“GRS 1980”,6378137,298.257222101]],
PRIMEM[“Greenwich”,0],
UNIT[“degree”,0.0174532925199433]]
MODULE: String (8.0)
FILE: String (5.0)
STATE: Integer (2.0)
COUNTY: Integer (3.0)
CENID: String (5.0)
POLYID: Integer (10.0) <----- Linking column to PIP
LAND: Integer (10.0) <----- Linking column to Landmarks

Layer name: Polygon
Geometry: None
Feature Count: 18597
Layer SRS WKT:
GEOGCS[“NAD83”,
DATUM[“North_American_Datum_1983”,
SPHEROID[“GRS 1980”,6378137,298.257222101]],
PRIMEM[“Greenwich”,0],
UNIT[“degree”,0.0174532925199433]]
MODULE: String (8.0)
FILE: Integer (5.0)
CENID: String (5.0)
POLYID: Integer (10.0) <------ Linking column to PIP
[tons more attributes]

[… a whole lot more “Geometry: none” tables irrelevant to the point…]

Layer name: PIP
Geometry: Point
Feature Count: 18597
Extent: (-107.188495, 34.870089) - (-106.149778, 35.218201)
Layer SRS WKT:
GEOGCS[“NAD83”,
DATUM[“North_American_Datum_1983”,
SPHEROID[“GRS 1980”,6378137,298.257222101]],
PRIMEM[“Greenwich”,0],
UNIT[“degree”,0.0174532925199433]]
MODULE: String (8.0)
FILE: Integer (5.0)
CENID: String (5.0)
POLYID: Integer (10.0) <---- linking column to a bunch of others.
POLYLONG: Integer (10.0)
POLYLAT: Integer (9.0)
WATER: Integer (1.0)

This is an intertwined MESS of data, and none of the intertwining is done
through OGR.
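
To make the intertwining easier to see, the linking columns annotated in the ogrinfo listing above can be written down as a small graph. This is only a sketch: the table and column names are taken from the listing, while the helper names `neighbours` and `join_path` are made up for illustration, and the listing omits many tables, so the graph is incomplete.

```python
from collections import deque

# Key relationships as annotated in the ogrinfo listing.
# Each entry: (table_a, table_b, shared_column).
LINKS = [
    ("CompleteChain", "AltName", "TLID"),
    ("AltName", "FeatureIds", "FEAT"),
    ("CompleteChain", "ZipCodes", "TLID"),
    ("Landmarks", "AreaLandmarks", "LAND"),
    ("AreaLandmarks", "PIP", "POLYID"),
    ("Polygon", "PIP", "POLYID"),
]


def neighbours(table):
    """Return {other_table: shared_column} for tables directly linked to `table`."""
    out = {}
    for a, b, col in LINKS:
        if a == table:
            out[b] = col
        elif b == table:
            out[a] = col
    return out


def join_path(start, goal):
    """Breadth-first search for a chain of tables leading from `start` to `goal`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for nxt in neighbours(route[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None  # no chain of shared columns connects the two tables


print(join_path("PIP", "Landmarks"))
```

With the links listed here, getting from PIP to Landmarks goes through AreaLandmarks, which is the chain discussed further down.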

By issuing the original v.in.ogr command:

v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP output=t56015_all type=boundary,centroid snap=-1

(as taken directly from the v.in.ogr man page) I pulled in the linear
features (CompleteChain, which includes all the boundaries and
non-boundary features) and centroids (PolygonInternalPoint, PIP) with
their associated attributes from their own tables. But as I
mentioned, TIGER is more of a database in normal form, so there are
all sorts of interlinked tables with common keys. v.in.ogr (and OGR
itself) does not follow the links, so it’s up to me to get them linked
up somehow.

Now GRASS layers. A disclaimer from me: I think that “layer” is a confusing
term to use here.

No argument here. I hate that the word “layer” is used in about three
incompatible ways: to denote a vector coverage (as it’s used in most
GIS literature), as one of a set of tables linked to a vector coverage
(in GRASS), and as either a table or a vector element of a collection
of tables and vectors (in OGR).

Each vector file (and
object) can have more than one key field to link it to an attribute table.
These key fields are called “cat” (short for category) and are always
integer. So, a vector can have different integer keys attached to a single
object. But instead of calling these cat1, cat2, etc, they are called
‘cat in layer 1’, ‘cat in layer 2’, etc. Each key (AKA ‘cat in layer #’) can
link to a line/record in an attribute table (which also must have an
identical integer key field, that doesn’t HAVE to be called “cat”, but
often is).
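
As a rough mental model of the description above (not GRASS's actual storage format), one object can be pictured as carrying one integer cat per layer, with each layer pointing at its own table. Every number and name in this sketch is invented for illustration except POLYID 18749, which appears in the TIGER records quoted in this thread.

```python
# One vector object can carry a separate integer category per layer.
# The cat values here are hypothetical.
feature_cats = {1: 7, 2: 42}  # layer number -> cat in that layer

# One attribute table per layer, keyed by that layer's cat.
tables = {
    1: {7: {"note": "row reached through the cat in layer 1"}},
    2: {42: {"POLYID": 18749}},
}


def attributes(cats, layer):
    """Look up the attribute row linked to an object's cat in `layer`."""
    cat = cats.get(layer)
    if cat is None:
        return None  # the object has no category in this layer
    return tables[layer].get(cat)


print(attributes(feature_cats, 2))
print(attributes(feature_cats, 3))
```

The second lookup returns nothing because this object was never given a cat in layer 3, which is the situation the rest of the thread is about.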

I understand that part. What I am not understanding is how to get the right
categories to attach to the right elements of these extra database columns.

Here’s a concrete example. The TIGER/Line file for this can be
downloaded (sometime before 2 days are up) from this temporary FTP
site: ftp://ftp.swcp.com/pub/tmp/russo/TGR35001.ZIP. The file unzips
to all the various records files, and if unpacked into its own
directory can be imported into a latitude/longitude GRASS location
with the sort of v.in.ogr command I gave above.

This TIGER/Line collection has a table with no associated geometry,
Landmarks, that has an entry (from ogrinfo -al output):

OGRFeature(Landmarks):15
MODULE (String) = TGR35001
FILE (Integer) = 35001
LAND (Integer) = 15
SOURCE (String) = J
CFCC (String) = D10
LANAME (String) = Kirtland Air Force Base
LALONG (Integer) = (null)
LALAT (Integer) = (null)
FILLER (String) = (null)

There are a number of rows in the AreaLandmarks table that relate back to
this single record through the LAND attribute:

OGRFeature(AreaLandmarks):154
MODULE (String) = TGR35001
FILE (String) = 35001
STATE (Integer) = 35
COUNTY (Integer) = 1
CENID (String) = c4588
POLYID (Integer) = 18750
LAND (Integer) = 15

OGRFeature(AreaLandmarks):155
MODULE (String) = TGR35001
FILE (String) = 35001
STATE (Integer) = 35
COUNTY (Integer) = 1
CENID (String) = c4588
POLYID (Integer) = 18749
LAND (Integer) = 15
[lots more]

that relate back to PIP records through the POLYID field. Those PIP records
are:

OGRFeature(PIP):18594
MODULE (String) = TGR35001
FILE (Integer) = 35001
CENID (String) = c4588
POLYID (Integer) = 18750
POLYLONG (Integer) = -106551831
POLYLAT (Integer) = 35060558
WATER (Integer) = (null)
POINT (-106.551831000000007 35.060558)

OGRFeature(PIP):18593
MODULE (String) = TGR35001
FILE (Integer) = 35001
CENID (String) = c4588
POLYID (Integer) = 18749
POLYLONG (Integer) = -106546870
POLYLAT (Integer) = 35049120
WATER (Integer) = (null)
POINT (-106.546869999999998 35.049120000000002)

[etc.]

and these PIP records are properly attached to centroids in my GRASS vector:

v.info -c layer=2 map=t35001_all
Displaying column types/names for database connection of layer 2:
INTEGER|cat
TEXT|MODULE
INTEGER|FILE
TEXT|CENID
INTEGER|POLYID
INTEGER|POLYLONG
INTEGER|POLYLAT
INTEGER|WATER

so somewhere there is a centroid with some category number that has
POLYID 18749, which ultimately could be associated with AreaLandmarks
feature 155 and thence (through LAND attribute 15) to Landmarks feature 15 and
the name “Kirtland Air Force Base”.
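
The chain just described (centroid cat, then POLYID, then LAND, then name) can be sketched as a plain SQL join, here run through Python's sqlite3 module against an in-memory database. The POLYID, LAND, CFCC and LANAME values are the ones from the records above; the cat values 101 to 103, the extra polygon with POLYID 1, and the lower-case table names are assumptions made up for the example, not what GRASS would actually create.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# pip stands in for the table attached to layer 2 (cat values are hypothetical).
con.executescript("""
CREATE TABLE pip (cat INTEGER, POLYID INTEGER);
CREATE TABLE arealandmarks (POLYID INTEGER, LAND INTEGER);
CREATE TABLE landmarks (LAND INTEGER, CFCC TEXT, LANAME TEXT);
INSERT INTO pip VALUES (101, 18749), (102, 18750), (103, 1);
INSERT INTO arealandmarks VALUES (18750, 15), (18749, 15);
INSERT INTO landmarks VALUES (15, 'D10', 'Kirtland Air Force Base');
""")

# Follow POLYID from the centroid table to AreaLandmarks, then LAND to Landmarks.
rows = con.execute("""
SELECT p.cat, p.POLYID, l.LANAME
FROM pip AS p
JOIN arealandmarks AS a ON a.POLYID = p.POLYID
JOIN landmarks AS l ON l.LAND = a.LAND
ORDER BY p.cat
""").fetchall()

for row in rows:
    print(row)
```

The made-up polygon with POLYID 1 drops out of the result because nothing in AreaLandmarks refers to it.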

What I want to accomplish is to produce something that I can display
and query that represents the collection of AreaLandmarks, which is a
subset of the areas initially imported. I should be able to do a
“d.vect somevector layer=somelayer” and see only those polygons that
have AreaLandmarks attributes, and be able to use d.what.vect to click
on those polygons and get the attributes (presumably I’d do a table
join between the AreaLandmarks table and Landmarks table so that
things like the landmark’s name and feature type are all in one table
not two).

My assumption is that the key concept I am missing is that there must
be a way to select, based on records of AreaLandmarks, a subset of
vector elements from the full imported collection of areas (whose
POLYID attribute is already stored in the table attached to Layer 2 of the
vector), assign them new categories for a layer 3, relate those new
categories to rows of the AreaLandmarks table, and finally attach the
AreaLandmarks table to the new layer through its category values.
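
The table side of that plan might look something like the following sketch, again using sqlite3 with made-up cat values and table names (`pip`, `layer3` and the numbers 101 to 103 are all hypothetical). It only builds the new table and the mapping from existing layer 2 cats to new layer 3 cats; writing those new cats onto the vector elements themselves is a GRASS step that is not shown here. It also assumes each polygon appears at most once in AreaLandmarks.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pip (cat INTEGER, POLYID INTEGER);
CREATE TABLE arealandmarks (POLYID INTEGER, LAND INTEGER);
CREATE TABLE layer3 (cat INTEGER PRIMARY KEY, POLYID INTEGER, LAND INTEGER);
INSERT INTO pip VALUES (101, 18749), (102, 18750), (103, 1);
INSERT INTO arealandmarks VALUES (18750, 15), (18749, 15);
""")

# Centroids whose POLYID appears in AreaLandmarks.
matches = con.execute("""
SELECT p.cat, a.POLYID, a.LAND
FROM pip AS p
JOIN arealandmarks AS a ON a.POLYID = p.POLYID
ORDER BY a.POLYID
""").fetchall()

# Hand out fresh categories 1, 2, ... and remember which old cat gets which.
cat2_to_cat3 = {}
for new_cat, (cat2, polyid, land) in enumerate(matches, start=1):
    con.execute("INSERT INTO layer3 VALUES (?, ?, ?)", (new_cat, polyid, land))
    cat2_to_cat3[cat2] = new_cat

layer3_rows = con.execute("SELECT * FROM layer3 ORDER BY cat").fetchall()
print(cat2_to_cat3)
print(layer3_rows)
```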

So my question is how do I do that?

I imagine there’s some way to do an extraction with v.extract and a
where clause to create a vector of only those areas with POLYID
attributes that appear in the AreaLandmarks table… I hadn’t thought
about that yet. I’m not sure I can craft the WHERE clause for
v.extract that would reference a table that isn’t attached to the
vector yet, though.
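
In plain SQL, a condition of that kind can be written as a subquery, provided both tables sit in the same database. The sketch below shows this with sqlite3 and the same hypothetical cat values as before; whether v.extract would pass such a WHERE clause through to the database driver unchanged is an assumption that is not verified here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pip (cat INTEGER, POLYID INTEGER);
CREATE TABLE arealandmarks (POLYID INTEGER, LAND INTEGER);
INSERT INTO pip VALUES (101, 18749), (102, 18750), (103, 1);
INSERT INTO arealandmarks VALUES (18750, 15), (18749, 15);
""")

# The condition refers to a second table that is not the one being filtered.
where = "POLYID IN (SELECT POLYID FROM arealandmarks)"

selected = [
    row[0]
    for row in con.execute("SELECT cat FROM pip WHERE " + where + " ORDER BY cat")
]
print(selected)
```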

However, once you get the data into GRASS, it is
possible to “upload” data from one attribute table (linked to layer 2,
for example) into another attribute table (linked to layer 1, for
example).
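
One way to read “upload” at the database level is copying a column from one table into another through a shared key. The sketch below does that with an SQL UPDATE in sqlite3; it is only an SQL-side analogy with hypothetical cat values and table names, and is not necessarily the GRASS mechanism meant above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pip (cat INTEGER, POLYID INTEGER);
CREATE TABLE arealandmarks (POLYID INTEGER, LAND INTEGER);
CREATE TABLE landmarks (LAND INTEGER, LANAME TEXT);
INSERT INTO pip VALUES (101, 18749), (102, 18750), (103, 1);
INSERT INTO arealandmarks VALUES (18750, 15), (18749, 15);
INSERT INTO landmarks VALUES (15, 'Kirtland Air Force Base');
""")

# Add an empty column to the target table, then fill it via the shared keys.
con.execute("ALTER TABLE pip ADD COLUMN LANAME TEXT")
con.execute("""
UPDATE pip SET LANAME = (
    SELECT l.LANAME
    FROM arealandmarks AS a
    JOIN landmarks AS l ON l.LAND = a.LAND
    WHERE a.POLYID = pip.POLYID
    LIMIT 1
)
""")

uploaded = con.execute("SELECT cat, LANAME FROM pip ORDER BY cat").fetchall()
print(uploaded)
```

Rows with no matching landmark are left NULL rather than removed.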

I’m sure it’s possible, but I still don’t understand how to do it in this case.


Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
“And, isn’t sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you’re good and crazy, oooh, oooh,
oooh, the sky is the limit!” — The Tick