[GRASS-user] Re: [GRASS-user] Help: Completely confused about multi-layered vectors trying to import TIGER/Line files

On Feb 28, 2008, at 8:57 AM, grass-user-request@lists.osgeo.org wrote:

Date: Thu, 28 Feb 2008 08:39:29 -0700
From: Tom Russo <russo@bogodyn.org>
Subject: [GRASS-user] Help: Completely confused about multi-layered
  vectors trying to import TIGER/Line files
To: grass-user@lists.osgeo.org
Message-ID: <20080228153929.GA37583@bogodyn.org>
Content-Type: text/plain; charset=us-ascii

I have been trying to wrap my brain around "multi-layered" GRASS vectors and
have only succeeded in wrapping my brain into knots. Perhaps someone here with
a solid understanding of this stuff can help me.

I'm trying to figure out how to import TIGER/Line data and actually get the
attributes of areas pulled in. This is trouble.

The v.in.ogr documentation has an example of how to do it:

v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP output=t35001_all \
                    type=boundary,centroid snap=-1

which does indeed import the CompleteChain layer and PIP (Polygon Internal
Point) layers --- it puts the boundaries in layer 1 and the centroids in
layer 2, and if I do a
d.vect t35001_all layer=2
I can see the areas just fine.

Tom,

I'll focus on the first part of your question in the hopes that it will clarify the rest of it.

The 'layers' you mention here are 2 very different beasts.

First OGR. The underlying concept is that some data (e.g., CAD) come in a file that has multiple 'layers' of vectors that may (or may not) have different associated data. I don't know TIGER files, so I don't know if they come this way or not. However, OGR tends to treat all vector data in this way. So, for example, if you have a shapefile, it treats the DIRECTORY that holds the shapefile as the 'dsn' and the shapefile (minus the .shp) as the data 'layer'. In older versions of OGR, that was the only way to import stuff. Now, you can just put the shapefile (with .shp) in the dsn field and OGR will work it out. But the concept holds. So in this case, you need to know which 'layer' you want to import from your TIGER data, whether they come in a multi-layered file or as multiple files. How the layer is linked with data is another matter and can involve GRASS 'layers'.

Now GRASS layers. A disclaimer from me: I think that "layer" is a confusing term to use here. I think that "key" or "key field" or something along that line would be much more understandable for people accustomed to database terminology. In fact, that is what GRASS layers are. Each vector file (and object) can have more than one key field to link it to an attribute table. These key fields are called "cat" (short for category) and are always integer. So, a vector can have different integer keys attached to a single object. But instead of calling these cat1, cat2, etc, they are called 'cat in layer 1', 'cat in layer 2', etc. Each key (AKA 'cat in layer #') can link to a line/record in an attribute table (which also must have an identical integer key field, that doesn't HAVE to be called "cat", but often is).
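
As a rough illustration of the "one integer key per layer" idea described above, here is a minimal Python sketch. It is a toy model, not GRASS code: the feature names, cat values and table contents are all made up for the example.

```python
# Toy model of GRASS "layers": each feature carries one integer key (cat)
# per layer, and each layer's cat indexes a different attribute table.
# All feature ids, cat values and table contents here are hypothetical.

# feature id -> {layer number: cat}
feature_cats = {
    "area_A": {1: 101, 2: 7},
    "area_B": {1: 102},  # no cat in layer 2, so no layer-2 attributes
}

# layer number -> attribute table, keyed by the integer cat column
tables = {
    1: {101: {"name": "block 1"}, 102: {"name": "block 2"}},
    2: {7: {"landmark": "park"}},
}


def attributes(feature, layer):
    """Return the attribute row linked to `feature` through `layer`, or None."""
    cat = feature_cats[feature].get(layer)
    if cat is None:
        return None
    return tables[layer].get(cat)


print(attributes("area_A", 1))
print(attributes("area_A", 2))
print(attributes("area_B", 2))
```

The point of the sketch is only that the same object can be looked up in different tables depending on which layer's key is used, and that an object without a cat in a given layer simply has no row there.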

The result for a hypothetical TIGER file of census blocks is that a block area (=polygon) can be linked to one attribute table via the key (="cat") in layer 1, a second attribute table via a key in layer 2, and so on. If there are multiple attribute tables linked to the layers in your TIGER file, OGR will try to put each table into a separate GRASS attribute table and link the proper record of each table to its associated vector object with a key in a GRASS layer for each table. I don't know how OGR parses the vector objects and attributes of a TIGER file internally. However, once you get the data into GRASS, it is possible to "upload" data from one attribute table (linked to layer 2, for example) into another attribute table (linked to layer 1, for example).

I hope this is helpful.

Michael

On Thu, Feb 28, 2008 at 10:38:00AM -0700, we recorded a bogon-computron collision of the <michael.barton@asu.edu> flavor, containing:

Michael:

Thank you for answering, but your answer has either highlighted how
poorly I expressed my question, or thrown into sharper relief how
confused I am about this. Some of what you say below was already
clear to me, but there's a big gap between "Each vector file (and
object) can have more than one key field to link it to an attribute
table," (which I knew), "Each key (AKA 'cat in layer #') can link to
a line/record in an attribute table (which also must have an
identical integer key field, that doesn't HAVE to be called "cat", but
often is)." (which I also knew), and the thing I really want to know --- and
it is the latter that I think I haven't explained well.

The 'layers' you mention here are 2 very different beasts.

First OGR. The underlying concept is that some data (e.g., CAD) come in a
file that has multiple 'layers' of vectors that may (or may not) have
different associated data. I don't know TIGER files, so I don't know if
they come this way or not.

I'll clarify, then, because that's not exactly how TIGER is laid out.
There are a number of vectors, and each is related to one or more
tables of attributes, but OGR doesn't make the connection itself --- there
are simply common attributes between tables that one is left to associate
oneself.

The TIGER data comes in a number of files, each containing a series of
records. Each file has a different record type. There is a record
type that defines nodes in "Complete Chains", a record type for "shape
points" that define the vertices (between the nodes) of the chains, a
record type for Polygon Internal Points (centroids), a record for
polygon attributes, a record for linking chains to polygons (with
left/right polygon ids) etc.

When unpacked into a directory, OGR views the collection as a set of
"layers" (I HATE that this word is used in so many different ways). A quick
"ogrinfo" shows:

INFO: Open of `/users/russo/TIGER/BC_TGR'
      using driver `TIGER' successful.

Layer name: CompleteChain
Geometry: Line String
Feature Count: 58942
Extent: (-107.196170, 34.869024) - (-106.149575, 35.219639)
Layer SRS WKT: [...]
MODULE: String (8.0)
TLID: Integer (10.0) <- This is a Line ID to link to other tables
[... tons more attributes for linear features...]

Layer name: AltName <--- table of alternate feature names in addition
                                 to the one in CompleteChain
Geometry: None
Feature Count: 6026
Layer SRS WKT:[...]
MODULE: String (8.0)
TLID: Integer (10.0) <--- this one could be used to relate the
                                   alternate names back to linear features
RTSQ: Integer (3.0)
FEAT: IntegerList (8.0) <--- and this one links to the next table,
                                   which actually has the names

Layer name: FeatureIds
Geometry: None
Feature Count: 10235
Layer SRS WKT: [...]
MODULE: String (8.0)
FILE: Integer (5.0)
FEAT: Integer (8.0) <--- linking column for AltName table
FEDIRP: String (2.0)
FENAME: String (30.0)
FETYPE: String (4.0)
FEDIRS: String (2.0)

Layer name: ZipCodes
Geometry: None
Feature Count: 1827
Layer SRS WKT:[...]
MODULE: String (8.0)
TLID: Integer (10.0) <---- links back to CompleteChain
RTSQ: Integer (3.0)
[...]

Layer name: Landmarks
Geometry: Point
Feature Count: 448
Extent: (-107.119811, 34.889113) - (-106.232580, 35.205106)
Layer SRS WKT:
GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433]]
MODULE: String (8.0)
FILE: Integer (5.0)
LAND: Integer (10.0) <------ linking column to AreaLandmarks
SOURCE: String (1.0)
CFCC: String (3.0)
LANAME: String (30.0)
LALONG: Integer (10.0)
LALAT: Integer (9.0)
FILLER: String (1.0)

Layer name: AreaLandmarks
Geometry: None
Feature Count: 1292
Layer SRS WKT:
GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433]]
MODULE: String (8.0)
FILE: String (5.0)
STATE: Integer (2.0)
COUNTY: Integer (3.0)
CENID: String (5.0)
POLYID: Integer (10.0) <----- Linking column to PIP
LAND: Integer (10.0) <----- Linking column to Landmarks

Layer name: Polygon
Geometry: None
Feature Count: 18597
Layer SRS WKT:
GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433]]
MODULE: String (8.0)
FILE: Integer (5.0)
CENID: String (5.0)
POLYID: Integer (10.0) <------ Linking column to PIP
[tons more attributes]

[... a whole lot more "Geometry: none" tables irrelevant to the point...]

Layer name: PIP
Geometry: Point
Feature Count: 18597
Extent: (-107.188495, 34.870089) - (-106.149778, 35.218201)
Layer SRS WKT:
GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433]]
MODULE: String (8.0)
FILE: Integer (5.0)
CENID: String (5.0)
POLYID: Integer (10.0) <---- linking column to a bunch of others.
POLYLONG: Integer (10.0)
POLYLAT: Integer (9.0)
WATER: Integer (1.0)

This is an intertwined MESS of data, and none of the intertwining is done
through OGR.

By issuing the original v.in.ogr command:

  v.in.ogr dsn=~/TIGER/BC_TGR layer=CompleteChain,PIP output=t35001_all \
                     type=boundary,centroid snap=-1

(as taken directly from the v.in.ogr man page) I pulled in the linear
features (CompleteChain, which includes all the boundaries and
non-boundary features) and centroids (PolygonInternalPoint, PIP) with
their associated attributes *from their own tables*. But as I
mentioned, TIGER is more of a database in normal form, so there are
all sorts of interlinked tables with common keys. v.in.ogr (and OGR
itself) does not follow the links, so it's up to me to get them linked
up somehow.

Now GRASS layers. A disclaimer from me: I think that "layer" is a confusing
term to use here.

No argument here. I hate that the word "layer" is used in about three
incompatible ways: to denote a vector coverage (as it's used in most
GIS literature), as one of a set of tables linked to a vector coverage
(in GRASS), and as either a table or a vector element of a collection
of tables and vectors (in OGR).

Each vector file (and
object) can have more than one key field to link it to an attribute table.
These key fields are called "cat" (short for category) and are always
integer. So, a vector can have different integer keys attached to a single
object. But instead of calling these cat1, cat2, etc, they are called '
cat in layer 1', 'cat in layer 2', etc. Each key (AKA 'cat in layer #') can
link to a line/record in an attribute table (which also must have an
identical integer key field, that doesn't HAVE to be called "cat", but
often is).

I understand that part. What I am not understanding is how to get the right
categories to attach to the right rows of these extra database tables.

Here's a concrete example. The TIGER/Line file for this can be
downloaded (sometime before 2 days are up) from this temporary FTP
site: ftp://ftp.swcp.com/pub/tmp/russo/TGR35001.ZIP. The file unzips
to all the various records files, and if unpacked into its own
directory can be imported into a latitude/longitude GRASS location
with the sort of v.in.ogr command I gave above.

This TIGER/Line collection has a table with no associated geometry,
Landmarks, that has an entry (from ogrinfo -al output):

OGRFeature(Landmarks):15
  MODULE (String) = TGR35001
  FILE (Integer) = 35001
  LAND (Integer) = 15
  SOURCE (String) = J
  CFCC (String) = D10
  LANAME (String) = Kirtland Air Force Base
  LALONG (Integer) = (null)
  LALAT (Integer) = (null)
  FILLER (String) = (null)

There are a number of rows in the AreaLandmarks table that relate back to
this single record through the LAND attribute:

OGRFeature(AreaLandmarks):154
  MODULE (String) = TGR35001
  FILE (String) = 35001
  STATE (Integer) = 35
  COUNTY (Integer) = 1
  CENID (String) = c4588
  POLYID (Integer) = 18750
  LAND (Integer) = 15

OGRFeature(AreaLandmarks):155
  MODULE (String) = TGR35001
  FILE (String) = 35001
  STATE (Integer) = 35
  COUNTY (Integer) = 1
  CENID (String) = c4588
  POLYID (Integer) = 18749
  LAND (Integer) = 15
[lots more]

that relate back to PIP records through the POLYID field. Those PIP records
are:

OGRFeature(PIP):18594
  MODULE (String) = TGR35001
  FILE (Integer) = 35001
  CENID (String) = c4588
  POLYID (Integer) = 18750
  POLYLONG (Integer) = -106551831
  POLYLAT (Integer) = 35060558
  WATER (Integer) = (null)
  POINT (-106.551831000000007 35.060558)

OGRFeature(PIP):18593
  MODULE (String) = TGR35001
  FILE (Integer) = 35001
  CENID (String) = c4588
  POLYID (Integer) = 18749
  POLYLONG (Integer) = -106546870
  POLYLAT (Integer) = 35049120
  WATER (Integer) = (null)
  POINT (-106.546869999999998 35.049120000000002)

[etc.]

and these PIP records are properly attached to centroids in my GRASS vector:

> v.info -c layer=2 map=t35001_all
Displaying column types/names for database connection of layer 2:
INTEGER|cat
TEXT|MODULE
INTEGER|FILE
TEXT|CENID
INTEGER|POLYID
INTEGER|POLYLONG
INTEGER|POLYLAT
INTEGER|WATER

so somewhere there is a centroid with some category number that has
POLYID 18749, which ultimately could be associated with AreaLandmarks
feature 155 and thence (through LAND attribute 15) to Landmarks feature 15 and
the name "Kirtland Air Force Base".
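
The chain of links just described (centroid category, then POLYID, then LAND, then the landmark name) can be sketched as an ordinary SQL join. The following is a minimal illustration using Python's built-in sqlite3 module, not a GRASS workflow. The POLYID, LAND, CFCC and LANAME values are the ones shown in the ogrinfo output above; the cat values and the extra POLYID 12345 row are hypothetical, since the real categories are not given in this thread.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pip (cat INTEGER PRIMARY KEY, POLYID INTEGER);
CREATE TABLE arealandmarks (POLYID INTEGER, LAND INTEGER);
CREATE TABLE landmarks (LAND INTEGER PRIMARY KEY, CFCC TEXT, LANAME TEXT);
-- cat values and POLYID 12345 are hypothetical; the rest is from ogrinfo
INSERT INTO pip VALUES (1001, 18749), (1002, 18750), (1003, 12345);
INSERT INTO arealandmarks VALUES (18750, 15), (18749, 15);
INSERT INTO landmarks VALUES (15, 'D10', 'Kirtland Air Force Base');
""")

# Follow cat -> POLYID -> LAND -> LANAME in a single query.
rows = con.execute("""
    SELECT p.cat, p.POLYID, l.LANAME
    FROM pip AS p
    JOIN arealandmarks AS a ON a.POLYID = p.POLYID
    JOIN landmarks AS l ON l.LAND = a.LAND
    ORDER BY p.cat
""").fetchall()

for row in rows:
    print(row)
```

Only the two centroids whose POLYID appears in arealandmarks come back; the hypothetical third one drops out of the inner join, which is the "subset of the areas" behaviour wanted here.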

What I *want* to accomplish is to produce something that I can display
and query that represents the collection of AreaLandmarks, which is a
subset of the areas initially imported. I should be able to do a
"d.vect somevector layer=somelayer" and see only those polygons that
have AreaLandmarks attributes, and be able to use d.what.vect to click
on those polygons and get the attributes (presumably I'd do a table
join between the AreaLandmarks table and Landmarks table so that
things like the landmark's name and feature type are all in one table
not two).

My assumption is that the key concept I am missing is that there must
be a way to select, based on records of AreaLandmarks, a subset of
vector elements from the full imported collection of areas (whose
POLYID attribute is already stored in the table attached to Layer 2 of the
vector), assign them new categories for a layer 3, relate those new
categories to rows of the AreaLandmarks table, and finally attach the
AreaLandmarks table to the new layer through its category values.
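
To make the bookkeeping in that assumption concrete, here is a small Python sketch of the steps: pick the centroids whose POLYID appears in AreaLandmarks, hand each a new category for a third layer, and key the AreaLandmarks rows by that new category. It models the idea only and calls no GRASS module. The layer-2 cat values and POLYID 12345 are hypothetical, and it assumes each POLYID appears in at most one AreaLandmarks row, which the thread does not confirm.

```python
# Hypothetical layer-2 table: centroid cat -> POLYID
# (POLYIDs 18749 and 18750 are from the thread; 12345 and the cats are made up).
layer2 = {1001: 18749, 1002: 18750, 1003: 12345}

# AreaLandmarks rows as (POLYID, LAND), values taken from the ogrinfo output.
area_landmarks = [(18750, 15), (18749, 15)]

# Reverse lookup: POLYID -> layer-2 cat of the centroid carrying it.
polyid_to_cat2 = {polyid: cat for cat, polyid in layer2.items()}

layer3_cats = {}   # layer-2 cat of a centroid -> its new layer-3 cat
layer3_table = {}  # new layer-3 cat -> AreaLandmarks attributes

for new_cat, (polyid, land) in enumerate(area_landmarks, start=1):
    cat2 = polyid_to_cat2.get(polyid)
    if cat2 is None:
        continue  # row refers to a polygon that was not imported
    layer3_cats[cat2] = new_cat
    layer3_table[new_cat] = {"POLYID": polyid, "LAND": land}

print(layer3_cats)
print(layer3_table)
```

Centroids that never receive a layer-3 category (the hypothetical cat 1003 here) would simply have nothing to show when only layer 3 is displayed or queried.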

So my question is how do I do that?

I imagine there's some way to do an extraction with v.extract and a
where clause to create a vector of only those areas with POLYID
attributes that appear in the AreaLandmarks table... I hadn't thought
about that yet. I'm not sure I can craft the WHERE clause for
v.extract that would reference a table that isn't attached to the
vector yet, though.
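
For what it is worth, the condition itself is easy to state in SQL as a subquery. The sketch below uses Python's sqlite3 module only to show that the filter is expressible; it assumes both tables live in the same database, and whether v.extract would accept such a clause depends on the database driver behind the vector's attribute table, which this thread does not settle. The cat values and POLYID 12345 are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pip (cat INTEGER PRIMARY KEY, POLYID INTEGER);
CREATE TABLE arealandmarks (POLYID INTEGER, LAND INTEGER);
-- cat values and POLYID 12345 are hypothetical
INSERT INTO pip VALUES (1001, 18749), (1002, 18750), (1003, 12345);
INSERT INTO arealandmarks VALUES (18750, 15), (18749, 15);
""")

# The kind of condition one would hope to pass as a where clause.
where = "POLYID IN (SELECT POLYID FROM arealandmarks)"

cats = [r[0] for r in con.execute(
    f"SELECT cat FROM pip WHERE {where} ORDER BY cat")]
print(cats)
```

If the backend cannot evaluate subqueries, the same list of categories could be computed outside the database and passed on as an explicit list instead.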

However, once you get the data into GRASS, it is
possible to "upload" data from one attribute table (linked to layer 2,
for example) into another attribute table (linked to layer 1, for
example).

I'm sure it's possible, but I still don't understand how to do it in this case.

--
Tom Russo KM5VY SAR502 DM64ux http://www.swcp.com/~russo/
Tijeras, NM QRPL#1592 K2#398 SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
one trick, rational thinking, but when you're good and crazy, oooh, oooh,
oooh, the sky is the limit!" --- The Tick