[GRASS5] Re: Vector format proposal

aaime wrote:

Hi everybody,
this evening I found time to read the GRASS 5.1 vector format proposal. It
sounds good, but I disagree with the attribute management proposal. In this
mail I will try to explain why.
If I understand correctly, your proposal is to let different entities have
a different number of attributes, and to let each entity be linked to
different databases. That sounds very flexible, but also very complicated to
handle in the long run.

Different entities have different tables because they _are_ different.
We could even think of multiple layers in one map, and so obviously
there, the vector entities would have completely different properties.
This is not currently a proposal, but clearly lines or sites will not
usually be the same as areas. Though as I understand it, two types of
entities _could_ be linked to the same table.

I have nothing against complexity per se; complex problems often have
complex solutions. But in this case I don't see the need for such a solution
(in the problem domain, I mean). What are the requirements that have led you
to think that such flexibility is necessary?

An example - we can imagine a map where a river network (lines) often
provides the boundary of a polygon layer, eg. vegetation cover types
(area). When digitising you might want to align an area boundary with a line
already there, between two points. Creating the new boundary would
simply mean copying part of the existing line, but storing the copy as a
BOUNDARY rather than a LINE. That would need a new function in v.digit -
but it needs a lot of new stuff anyway.
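
As a rough illustration of that idea, here is a minimal Python sketch. It is not
v.digit code: the Feature class, the copy_as_boundary function and the use of
vertex indices to mark the "two points" are all assumptions made for the
example. It only shows that the copy shares geometry with the river but is
stored under a different type.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    ftype: str    # type label from the discussion, e.g. "LINE" or "BOUNDARY"
    coords: list  # list of (x, y) vertices


def copy_as_boundary(line, start, end):
    """Copy vertices start..end (inclusive) of a LINE into a new BOUNDARY."""
    if line.ftype != "LINE":
        raise ValueError("source feature must be a LINE")
    if not (0 <= start < end < len(line.coords)):
        raise ValueError("need 0 <= start < end < number of vertices")
    # Slice into a new list so the boundary does not alias the river's vertices.
    return Feature("BOUNDARY", list(line.coords[start:end + 1]))


river = Feature("LINE", [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)])
edge = copy_as_boundary(river, 1, 3)
print(edge.ftype, edge.coords)
```

The original river line is left untouched, so the line map and the area map
can be edited independently afterwards.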

In my opinion it may stem from the fact that a vector file can store areas,
points and lines all together. I don't like this much; I would prefer to
store different entities in different files, so that we know that a single
file contains only areas, for example. This would make it easier to implement
overlay commands and to enforce stricter control over topology creation (eg:
no self-intersecting lines, no intersections between different lines).

Normally a vector file does indeed store just one principal type of
entity: area files would contain BOUNDARY (now CENTROID also), line
files just LINE and site files just SITE (formerly a type of LINE). But
they can contain various types. It is just a convention that you have
one type only in a file. BTW this is also the case with shapefiles and
MapInfo files.

About line crossings. LINE features can cross in a network, so line maps
might allow crossing, or not, and should be able to build in either
case. Area maps must conform to 2-dimensional manifold (or later -
surface manifold) conditions, so boundaries don't cross.
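
To make the "boundaries don't cross" condition concrete, here is a small Python
sketch of one possible check. It is an illustration only, not how the GRASS
build step works; the function names are made up for the example. It reports a
crossing only when two segments intersect at a point interior to both, so
segments that merely meet at a shared node are accepted. Collinear overlaps are
not detected by this simple test.

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segment ab and segment cd cross at a point interior to both."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def has_crossing(polylines):
    """Brute-force check of every pair of segments taken from all polylines."""
    segs = [(pl[i], pl[i + 1]) for pl in polylines for i in range(len(pl) - 1)]
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            if segments_cross(*segs[i], *segs[j]):
                return True
    return False


# Two boundaries forming a closed square: they only meet at nodes.
square = [[(0, 0), (1, 0), (1, 1)], [(1, 1), (0, 1), (0, 0)]]
# Two lines crossing in the middle: fine for a network, not for an area map.
cross = [[(0, 0), (2, 2)], [(0, 2), (2, 0)]]
print(has_crossing(square), has_crossing(cross))
```

A line map that allows crossings would simply skip this check, while an area
map would refuse to build when it returns True.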

In my previous proposal I said that a simple index is sufficient to link
entities to a table, which can contain as many attributes as the user wants,
but with the same number and the same types for each entity in a single file.
This is quite a different approach, but I think it may be good for three
reasons:
* it's simple, and simple software is easier to build and maintain, and more
stable;

But it is also simple in a way to have just one type of file, and have
each confine itself to a particular type by convention. However you may
have a point about just one link (category index) per item [any thoughts
Radim?]

* you always know to which table a file is linked, no need to interpret an
association file like the one proposed;

I don't agree. Maintaining links by internal logic is better than
relying on system level resources (like file names or filesystem
locations). This is similar in my view to the way current commercial GIS
apps bind vector handling functions too much to the properties of the
graphical display system. Really, though the user interface can provide
many useful features, it should be a `dumb terminal'. All analytic work
should be done within an engine that is dedicated to vector map
processing. The way X or win32 functions work should never contaminate the
way vector data is processed or stored. And this applies to the
attribute layer as well.

* because everybody else does it this way (I'm thinking about Autodesk,
ESRI and Intergraph), so it's a proven road and it's familiar to users of
commercial packages.

I can't see that this is relevant: the user won't see what is in the
background. Anyway if you look at the Arc/Info binary file structure, or
its export equivalent, this is complex in file structure compared to
GRASS, and this is the only other fully topologically structured vector
engine that is commonly used and well documented.

Many aspects of commercial apps are not worth emulating; I would say
they are _disproven_. Digitising with ArcView for example involves
manipulating a geometric database by means of polygon operations through
callbacks from a vector based visual display. My long experience of this
came happily to an end when I started using GRASS, because its
digitising capabilities are so much more robust. Not necessarily faster
- but how much time do you spend correcting GRASS vector databases
compared to geometric forms originating from shapefiles or MapInfo?

BTW - the vector engine in these apps may not be simple. Remember that
as polygons are processed, a whole series of checks and corrections have
to be made on the fly, and I know how difficult this can be as I have
spent much of the last year trying to find ways of correcting such maps.
Small errors in such maps tend to generate more errors when polygon ops
are applied, and so the whole thing is intrinsically unstable, and
errors can go a long time without being detected (because the format is
not topologically aware). You don't get any of this with GRASS, which
produces stable, immutable and persistent data because of the way its
linework is processed and stored.

So if we are going to try to create a vector engine for display, editing
and storing vector data that emulates the properties of ArcView (say),
these are the problems we are likely to run into.

David

----------------------------------------
If you want to unsubscribe from GRASS Development Team mailing list write to:
minordomo@geog.uni-hannover.de with
subject 'unsubscribe grass5'

Well, here's my two cents on data models.

I fully agree with David regarding graphic vs. topological
representations of "entities". I've seen plenty of shapefiles with
screwed up geometry to know this is a real problem. But, I'll grant the
topological "coverage" concept is a bit more complex to manage vs. the
graphic representation.

In regards to attribute tables: I would think a one to one mapping from
each map entity to an attribute table via a unique integer key is the
best solution. So, for each AREA, LINE, SITE, LABEL, NODE, etc there is
a table. It may be that we want to maintain some separation between
internal numberings and what the user uses for a key. Otherwise we
can't change a numbering after an edit/rebuild cycle because the linkage
with the attributes will be broken. A simple two integer table is all
that is needed, the user is only presented with setting the "user_id"
whereas the other is maintained by the system. Example:

AREA_ID | USER_ID
      1 | 1
      2 | 1
      3 | 2
      4 | 1
...

So, when v.digit asks for a "category number" it sets the "USER_ID" with
whatever the user wants, but the "AREA_ID" is incremented as the next
largest value (for that table). A rebuild might renumber the "AREA_ID"
to make them sequential, but the "USER_ID" is never touched. Thus,
linkage to external tables always works. I think I'm getting redundant
here :wink:
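
A minimal Python sketch of this two-integer table, under stated assumptions:
the dictionary layout, the function names and the landcover table are invented
for the illustration and are not part of any GRASS library. It shows the system
id being renumbered by a rebuild while the user key keeps pointing at the same
external record.

```python
def add_area(id_map, user_id):
    """Register a new area: the system picks the next largest AREA_ID."""
    area_id = max(id_map, default=0) + 1
    id_map[area_id] = user_id
    return area_id


def rebuild(id_map):
    """Renumber AREA_IDs sequentially from 1; USER_IDs are never touched."""
    return {new: id_map[old] for new, old in enumerate(sorted(id_map), start=1)}


# AREA_ID -> USER_ID, the same values as the table above.
areas = {1: 1, 2: 1, 3: 2, 4: 1}
# Hypothetical external attribute table keyed by USER_ID.
landcover = {1: "forest", 2: "water"}

del areas[2]              # an edit removes one area, leaving a gap in AREA_ID
areas = rebuild(areas)    # the rebuild closes the gap
print(areas)
print(landcover[areas[2]])  # the area formerly numbered 3 still joins to "water"
```

The join to landcover only ever goes through USER_ID, which is why the
renumbering is harmless.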

--
Eric G. Miller <egm2@jps.net>

----------------------------------------

Eric G. Miller wrote:

> So, when v.digit asks for a "category number" it sets the "USER_ID" with
> whatever the user wants, but the "AREA_ID" is incremented as the next
> largest value (for that table). A rebuild might renumber the "AREA_ID"
> to make them sequential, but the "USER_ID" is never touched. Thus,
> linkage to external tables always works.

USER_ID is the category in GRASS, and AREA_ID is the sequential number
assigned during the build process. The link to the table cannot be broken;
the library is written that way.

Radim

----------------------------------------

Eric G. Miller wrote:

On Tue, May 15, 2001 at 08:48:27AM +0200, Radim Blazek wrote:
> USER_ID is category in grass and AREA_ID is sequential number assigned
> during build process. Link to table may not be broken. Lib is written in
> this way.

Okay,
   Forgive me if I'm being dense. But, how do you join an external table
to select area(s) with category "2" when there are also lines and points
with category "2". Do we join all then filter out the lines and points?

So we have something like:

1 : 2
2 : 2
3 : 2

Yes, that was my idea, and that is how grass50 works
(v.reclass has a type= option for type selection).

As I mentioned in another mail, it may be useful to have several types
(lines, points, areas) linked to one table, so I would prefer not to
distinguish types in the db connection.
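
A small Python sketch of "join all, then filter by type". The record layout
and the select function are assumptions for illustration; this is not the
v.reclass implementation. The first three elements mirror the "1 : 2, 2 : 2,
3 : 2" example, read as element id : category.

```python
# Elements of mixed type sharing category 2, plus one unrelated area.
elements = [
    {"id": 1, "type": "area", "cat": 2},
    {"id": 2, "type": "line", "cat": 2},
    {"id": 3, "type": "point", "cat": 2},
    {"id": 4, "type": "area", "cat": 5},
]
# Hypothetical external table keyed by category.
table = {2: {"name": "water"}, 5: {"name": "forest"}}


def select(elements, table, cat, types=None):
    """Join elements to the table on category, then optionally filter by type."""
    joined = [
        {**e, **table[e["cat"]]}
        for e in elements
        if e["cat"] == cat and e["cat"] in table
    ]
    if types is not None:
        joined = [row for row in joined if row["type"] in types]
    return joined


everything = select(elements, table, 2)                # all types at once
only_areas = select(elements, table, 2, types={"area"})
print([row["id"] for row in everything], [row["id"] for row in only_areas])
```

The db connection itself knows nothing about types; the type filter is applied
afterwards, which is what lets several types share one table.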

I see the following schemes:
(following the vector proposal, I will use 'field' to distinguish
categories)
- all elements (of various types) are the same logical feature (water for
   example: lakes, rivers, springs - area, line, point with many attributes
   for water quality), all categories are of one field (1 for example)
   and linked to the same table.
   We can work with all types at the same time or select one type.
- map contains just one type
- map contains several features, each of one type (for example lakes as
   areas with field=1 and rivers as lines with field=2), each feature linked
   to its own table. The selection is done by the field= option

Note: using field numbers is not user friendly, and I expect there will be
        some place where names are assigned to fields, so that running a
        module would look like:
v.xx input=wat output=riv1 field=rivers where="q > 10 and n > 0.03"
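
How such a call might resolve, as a hedged Python sketch: v.xx is a placeholder
name in the mail, and everything below (record layout, the name-to-field
lookup, the table contents, the depth column) is invented for illustration.
It uses an in-memory SQLite database from the Python standard library to stand
in for the attribute tables.

```python
import sqlite3

# Hypothetical element records: (element id, type, field, category).
elements = [
    (1, "area", 1, 10),
    (2, "area", 1, 11),
    (3, "line", 2, 10),
    (4, "line", 2, 11),
    (5, "line", 2, 12),
]
field_names = {"lakes": 1, "rivers": 2}   # assumed name -> field number lookup
field_tables = {1: "lakes", 2: "rivers"}  # assumed field number -> table name

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lakes (cat INTEGER PRIMARY KEY, depth REAL)")
db.execute("CREATE TABLE rivers (cat INTEGER PRIMARY KEY, q REAL, n REAL)")
db.executemany("INSERT INTO lakes VALUES (?, ?)", [(10, 4.0), (11, 12.5)])
db.executemany(
    "INSERT INTO rivers VALUES (?, ?, ?)",
    [(10, 25.0, 0.035), (11, 8.0, 0.05), (12, 40.0, 0.02)],
)


def select_elements(field, where):
    """Return ids of elements in the named field whose table row matches where."""
    number = field_names[field]
    table = field_tables[number]
    # The where text is pasted in verbatim: acceptable in a sketch,
    # unsafe for untrusted input.
    query = f"SELECT cat FROM {table} WHERE {where}"
    cats = {row[0] for row in db.execute(query)}
    return [eid for eid, _type, fld, cat in elements if fld == number and cat in cats]


print(select_elements("rivers", "q > 10 and n > 0.03"))
```

Note that the lake with category 10 is not selected even though a river shares
that category number, because the field keeps the two sets of categories apart.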

Radim
