[GRASS-dev] Re: [tlaronde: ticket #542: vector for GRASS 7]

Forwarded to the list upon request from T Laronde.

On Fri, Dec 4, 2009 at 3:27 PM, <tlaronde@polynum.com> wrote:

Hello,

I stumbled upon the "new ideas" for vector of GRASS 7 and wanted to give
hints. I'm not subscribed to grass-dev (and don't want to) so I'm not
sure if the message will be accepted or not by the program managing the
list.

FWIW, here are my comments. You can do whatever you feel with them.

Since it is clear that the ones commenting---and they say it...---don't
understand the vectorial stuff and are not even developers, it would be,
IMO, a very bad idea to let people haphazardly add a mess in this
area...

Cheers,
--
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C

---------- Forwarded message ----------
From: tlaronde
To: grass-dev@lists.osgeo.org
Date: Fri, 4 Dec 2009 15:15:27 +0100
Subject: ticket #542: vector for GRASS 7
[Please CC me on replies, since I'm not subscribed.]

FWIW, having given a look at GPL GRASS plans for 7.0, I stumbled upon
ticket #542 about "new ideas" for vector.

May I suggest: don't do that! It seems that, since Radim Blazek
left, no one has really grasped the vector "philosophy"---at least if
you want GRASS to continue to be topological.

IMO, the first thing you should do is to create a lexicon, that is a
list of normalised words for GRASS and their meaning; this will avoid
the confusion between "lines" and "arcs" for example. Particularly,
when, for GRASS, two things are clearly distinct, even if in everyday
speech they are fuzzy, you must impose two distinct words. Example:
arcs are the primitives of the vector definition (level 0). "Lines" are
geometrical objects deduced from primitives, as are "points", "faces"
and, perhaps, "volumes". Nodes are not geometrical objects: they are
topological creations. Centroids are not geometrical objects either, but
just a topological means to _store_ attribute information: this is
attributed to a _geometrical_ object (hence: a "point", a "line", a
"face" or a "volume") if the point is equal to a point, along a line, on
a face or inside a volume. Note: I use special words, distinct words for
the topological relations so that in the _code_ saying "equal" means
point; saying "along" means line; saying "on" means face and so on.
Always distinct words for distinct meanings, and when possible, the
first letter, or the first two letters, are sufficient to identify the
complete word.
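
A minimal Python sketch of such a vocabulary, under stated assumptions: the names `Relation` and `relation_from_prefix` are hypothetical and come from neither GRASS nor KerGIS. It only illustrates one distinct word per relation, each recoverable from its first letter or two.

```python
from enum import Enum


class Relation(Enum):
    """Relation of an attribute point to a geometrical object (hypothetical names)."""
    EQUAL = "point"    # the attribute point is equal to a point
    ALONG = "line"     # the attribute point lies along a line
    ON = "face"        # the attribute point lies on a face
    INSIDE = "volume"  # the attribute point lies inside a volume


def relation_from_prefix(prefix: str) -> Relation:
    """Resolve a relation from an unambiguous leading abbreviation."""
    matches = [r for r in Relation if r.name.lower().startswith(prefix.lower())]
    if len(matches) != 1:
        raise ValueError(f"ambiguous or unknown prefix: {prefix!r}")
    return matches[0]
```

Because the four words start with different letters, `relation_from_prefix("a")` can only mean "along", hence a line.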

The level 0 is just the storage of the _arcs_. Even geometrical elements
"points" are stored as _arcs_ (two vertices identical).

The level 0 is needed for the definition of the primitives and for the
construction of the topology. It is not needed for a huge part of the
applications when you do not need _metrics_. Topology gives _order_ ;
coordinate system, definitions, arcs give you _metrics_. So improving
the reading of the _arcs_ is only needed from time to time.

The topology gives you direct access (offsets) to the metrics if needed.
So there is already information to speed things up.

May I say that the main problem is not to "improve" your current (new)
vector handling, but to clean up the problems introduced. Specifically,
merging the Sites as Points in the vector was, IMO, an error. There is a
need for the grid (also called: raster but this is a bad name),
singularities (Sites) and topological vector (arcs).

Secondly, the introduction of 3D in vector is not perhaps a great idea
if you remember what a GIS is mainly for. For the coordinate systems,
there is a surface of reference: a geoid, simplified as an ellipsoid.
The grid is a means, along with a connection of Sites, to describe a more
complex surface relative to the geoid (via the ellipsoid).
Generally, in cartography, the vector lines are drawn on the surface.
So the 3D is given by the grid or the triangulation of Sites
underneath. Imagine a sheet of corrugated iron as a surface. You can draw
two "straight lines" (viewed from above); viewed from the side, they
are not straight at all. But since they are linked to the surface,
you can describe them by a starting and an ending vertex; you do
not need, at every moment, to drag a whole Bezier description. Only
when you need the 3D do you combine the surface with the straight
lines. And if you change the description of the surface, you do
not need to change the description of the vector: it is "just"
drawn on the linked surface, whatever it is. You need 3D only for
metrics; and if the vector objects are not on the same surface,
you describe them in distinct layers.
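
The corrugated iron example can be sketched in Python. This is only an illustration under assumptions: `surface_z` is a made-up sinusoidal surface and `draped_length` is a hypothetical helper, not anything from GRASS or KerGIS. The line is stored as two planimetric vertices, and the surface is consulted only when a 3D metric is wanted.

```python
import math


def surface_z(x: float, y: float) -> float:
    """Hypothetical corrugated surface: ridges run parallel to the y axis."""
    return 0.1 * math.sin(2.0 * math.pi * x)


def draped_length(start, end, surface, samples: int = 1000) -> float:
    """Approximate the 3D length of a planimetric segment draped on a surface."""
    (x0, y0), (x1, y1) = start, end
    length = 0.0
    prev = (x0, y0, surface(x0, y0))
    for i in range(1, samples + 1):
        t = i / samples
        x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        cur = (x, y, surface(x, y))
        length += math.dist(prev, cur)  # 3D distance between consecutive samples
        prev = cur
    return length


# Both lines are "straight" seen from above and have planimetric length 5.
across = draped_length((0.0, 0.0), (5.0, 0.0), surface_z)       # crosses the ridges
along_ridge = draped_length((0.0, 0.0), (0.0, 5.0), surface_z)  # follows one ridge
```

Swapping `surface_z` for another surface changes the 3D lengths without touching the two stored vertices, which is the point made above.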

Just to give you the incentive to think twice about the vectorial stuff.
It is really an interesting part, and the CERL GRASS strength is the
topological stuff. Take the time to understand this deeply first, before
making "something" just because "new" is an advertising word.

HTH
--
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C

thanks for providing your comments Thierry.
I agree with a fair amount of what you say, specifically education is
more work but far better than reinvention (poorly). there certainly is
a lot of hidden wisdom in the grass code for us to learn from.

a few minor points:

avoid the confusion between "lines" and "arcs" for example.

please, "polyline" is much better than arc. for one thing the name-
space is well and truly taken, for another it's the term the rendering
library uses, and for a last point from a mathematical standpoint an
arc to me is a pure curve not a broken line approximating a curved line.
(arguing semantics will always be a chore of picking the best from
among imperfect terms)

merging the Sites as Points in the vector was, IMO, an error.

both have their strengths and their weaknesses, but regardless of that
it is done now and with minor extra-topological ugliness we can store
massive point datasets in the current framework. it's not ideal, and
we've lost users to eg PostGIS because of it, but this is not the
trickiest problem we face so we'll see what future solutions introduce
themselves.

Secondly, the introduction of 3D in vector is not perhaps a great idea
if you remember what a GIS is mainly for.

important: s/is/has been/
don't limit yourself by what others have done in the past.
to me the interesting thing about grass is not what it can do, but
what it could easily be extended to do.

I work in the water which is inherently a 3D environment, I think I can
speak for all our geologic, atmospheric, archaeological, and so on
refugees from other GIS when I say that the 3D support is a big reason
we use GRASS.

to me, the biggest software gap we have from a vector perspective besides
the LiDAR problem is to get a mature non-encumbered reimplementation of
the Triangle library out there to the world. (see nnbathy threads)

regards,
Hamish

Thierry,

thanks for your feedback - I think Hamish has explained it well, so I just concur with what he has to say:

On Dec 5, 2009, at 5:25 AM, Hamish wrote:

thanks for providing your comments Thierry.
I agree with a fair amount of what you say, specifically education is
more work but far better than reinvention (poorly). there certainly is
a lot of hidden wisdom in the grass code for us to learn from.

a few minor points:

avoid the confusion between "lines" and "arcs" for example.

please, "polyline" is much better than arc. for one thing the name-
space is well and truly taken, for another it's the term the rendering
library uses, and for a last point from a mathematical standpoint an
arc to me is a pure curve not a broken line approximating a curved line.
(arguing semantics will always be a chore of picking the best from
among imperfect terms)

merging the Sites as Points in the vector was, IMO, an error.

it was not done the way it should have been done, but the old sites
format would have been difficult to maintain and expand.

both have their strengths and their weaknesses, but regardless of that
it is done now and with minor extra-topological ugliness we can store
massive point datasets in the current framework. it's not ideal, and
we've lost users to eg PostGIS because of it, but this is not the
trickiest problem we face so we'll see what future solutions introduce
themselves.

Secondly, the introduction of 3D in vector is not perhaps a great idea
if you remember what a GIS is mainly for.

important: s/is/has been/
don't limit yourself by what others have done in the past.
to me the interesting thing about grass is not what it can do, but
what it could easily be extended to do.

I work in the water which is inherently a 3D environment, I think I can
speak for all our geologic, atmospheric, archaeological, and so on
refugees from other GIS when I say that the 3D support is a big reason
we use GRASS.

same here - and it is not just the 3D points, we use 3D vector for buildings,
roads, rivers - it has been very useful. GRASS has been used for 3D, even 4D,
modeling for years.

Helena

to me, the biggest software gap we have from a vector perspective besides
the LiDAR problem is to get a mature non-encumbered reimplementation of
the Triangle library out there to the world. (see nnbathy threads)

regards,
Hamish


Markus Neteler wrote:

Forwarded to the list upon request from T Laronde.

On Fri, Dec 4, 2009 at 3:27 PM, <tlaronde@polynum.com> wrote:
  

IMO, the first thing you should do is to create a lexicon, that is a
list of normalised words for GRASS and their meaning; this will avoid
the confusion between "lines" and "arcs" for example. Particularly,
when, for GRASS, two things are clearly distinct, even if in everyday
speech they are fuzzy, you must impose two distinct words. Example:
arcs are the primitives of the vector definition (level 0). "Lines" are
geometrical objects deduced from primitives, as are "points", "faces"
and, perhaps, "volumes". Nodes are not geometrical objects: they are
topological creations. Centroids are not geometrical objects either, but
just a topological means to _store_ attribute information: this is
attributed to a _geometrical_ object (hence: a "point", a "line", a
"face" or a "volume") if the point is equal to a point, along a line, on
a face or inside a volume. Note: I use special words, distinct words for
the topological relations so that in the _code_ saying "equal" means
point; saying "along" means line; saying "on" means face and so on.
Always distinct words for distinct meanings, and when possible, the
first letter, or the first two letters, are sufficient to identify the
complete word.
    

Regarding the creation of a lexicon, I too think that this would help both developers and users to better understand the grass vector architecture. The general objective of the proposed changes is to more clearly distinguish between different types of geometry features present in a grass vector. That starts with distinguishing between primitive features that are always present and features that only exist in topology.

In grass, primitive features that are always present are points, lines, boundaries, centroids, faces, kernels. These primitives could also be called arcs or polylines. Hamish argued against the term arcs, I would argue against the term polylines, because points, centroids and kernels are by no means polylines, they have only one vertex. Also a line consisting of two vertices only is not a polyline in the strict sense. I don't feel too strongly about it, it's just that "lines" as a term for all the stuff in the coor file is a bit confusing because a line can be a point, line, boundary, centroid, face, or kernel.

Derived features only present in topology, i.e. not available when opening a grass vector on level 1, are nodes, areas, isles, volumes.
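
The split between primitives and derived features can be written down as a small Python sketch. The feature names come from the two paragraphs above; the function `available_features` and the use of plain strings are assumptions for illustration and do not mirror the grass vector library API.

```python
# Feature kinds as listed above; plain strings are an illustrative choice.
PRIMITIVES = frozenset({"point", "line", "boundary", "centroid", "face", "kernel"})
DERIVED = frozenset({"node", "area", "isle", "volume"})


def available_features(level: int) -> frozenset:
    """Return the feature kinds visible at a given open level (1 or 2)."""
    if level == 1:
        return PRIMITIVES            # no topology: primitives only
    if level == 2:
        return PRIMITIVES | DERIVED  # topology adds the derived features
    raise ValueError(f"unsupported level: {level}")
```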

The description of centroids given by Thierry sounds much more like the description of categories. In grass, categories, not centroids, are used to store attribute information. Only primitives (points, lines, boundaries, centroids, faces, kernels) can have one or more, also shared, categories; derived features do not have categories. Thus areas also do not have categories; in this case attributes are linked to a centroid which in turn is linked to an area. Centroids are not topological creations, they are also available on level 1.
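
A rough Python sketch of that chain (area, then centroid, then category, then attributes), assuming a simple in-memory model: the classes `Primitive` and `Area`, the table `ATTRIBUTES` and the function `area_attributes` are hypothetical and are not the grass data structures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Primitive:
    kind: str                                     # "point", "line", "centroid", ...
    categories: set = field(default_factory=set)  # one or more, possibly shared


@dataclass
class Area:
    # A derived feature: it carries no categories of its own,
    # only a link to the centroid attached to it (if any).
    centroid: Optional[Primitive] = None


# Hypothetical attribute table keyed by category number.
ATTRIBUTES = {7: {"landuse": "forest"}}


def area_attributes(area: Area, table: dict) -> list:
    """Follow area -> centroid -> categories -> attribute rows."""
    if area.centroid is None:
        return []
    return [table[c] for c in sorted(area.centroid.categories) if c in table]
```

An area without a centroid simply yields no attribute rows in this model.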

The level 0 is just the storage of the _arcs_. Even geometrical elements
"points" are stored as _arcs_ (two vertices identical).
    

In grass, points, centroids and kernels are stored with one vertex only. BTW, level 0 in grass means "try to open on level 2 (with topology), if that fails, issue a warning and open on level 1 (no topology)", IOW level 0 in grass is the highest level supported by the given vector. Even if other vector topology implementations use different terminology, I would prefer to use the current grass terminology in this discussion to avoid confusion.
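
The level 0 behaviour described above can be sketched in Python as follows. This is an assumption-laden illustration: `open_vector` and the `has_topology` callback are hypothetical stand-ins, not the grass vector library calls, and the error raised for an explicit level 2 request without topology is my assumption.

```python
import warnings


def open_vector(name: str, requested_level: int, has_topology) -> int:
    """Return the level a vector is actually opened on.

    has_topology is a hypothetical callable standing in for
    "can the topology of this vector be read?".
    """
    if requested_level == 1:
        return 1
    if requested_level == 2:
        if not has_topology(name):
            # Assumption: an explicit level 2 request fails without topology.
            raise RuntimeError(f"{name}: topology not available")
        return 2
    if requested_level == 0:
        # Level 0: try level 2 first, fall back to level 1 with a warning.
        if has_topology(name):
            return 2
        warnings.warn(f"{name}: topology not available, opening on level 1")
        return 1
    raise ValueError(f"unsupported level: {requested_level}")
```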

May I say that the main problem is not to "improve" your current (new)
vector handling, but to clean up the problems introduced. Specifically,
merging the Sites as Points in the vector was, IMO, an error. There is a
need for the grid (also called: raster but this is a bad name),
singularities (Sites) and topological vector (arcs).
    

The core construction of topology will not be changed by the proposed changes. One advantage of the proposed changes would be that massive point datasets (that was the purpose of Sites I think) would be stored with the bare minimum of topology, that is, there would be no topology in the strict vectorial sense, but other support structures that in grass come with topology will be available, namely the category index and the spatial index, both of which are IMO useful also for point datasets.
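
To make the idea concrete, here is a small Python sketch of a point store that has a category index and a spatial index but no topology. Everything in it is an assumption for illustration: the class `PointStore` is hypothetical, and a uniform grid is swapped in as the spatial index purely for brevity; it says nothing about how grass implements its indices.

```python
from collections import defaultdict


class PointStore:
    """Points with a category index and a grid spatial index, no topology."""

    def __init__(self, cell_size: float = 100.0):
        self.cell_size = cell_size
        self.points = []                  # list of (x, y, cat)
        self.by_cat = defaultdict(list)   # category index: cat -> point ids
        self.by_cell = defaultdict(list)  # spatial index: grid cell -> point ids

    def _cell(self, x: float, y: float) -> tuple:
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, x: float, y: float, cat: int) -> int:
        idx = len(self.points)
        self.points.append((x, y, cat))
        self.by_cat[cat].append(idx)
        self.by_cell[self._cell(x, y)].append(idx)
        return idx

    def in_box(self, xmin: float, ymin: float, xmax: float, ymax: float) -> list:
        """Return ids of points inside the box, visiting only overlapping cells."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for idx in self.by_cell.get((cx, cy), []):
                    x, y, _ = self.points[idx]
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(idx)
        return sorted(hits)
```

Both lookups work without any nodes, areas or isles being built.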

As for 3D or 4D, the current grass vector structure is not ready for full 3D support. While changing grass vector structure, a 3D framework can be added and then activated only at a later stage or activated in several steps without breaking backwards compatibility (3D topology algorithms are still missing and it will probably take some time until we get kernels and holes attached to volumes, equivalent to centroids and isles attached to areas).

Just a try to clarify what grass is currently doing,

Markus M

I am pretty sure my description is correct with regard to grass6 vector design.

The algorithms used to build and maintain topology will not change from grass6 to grass7 (e.g. building areas from boundaries and centroids), and the general design of grass topology will not change, it will merely be stored much more efficiently. All topological features and functions will be preserved, the API will not change much (apart from some new functions that will hopefully do the same job faster), vector modules will be largely compatible. I expect less conversion work than for raster modules. IMO, the current grass6 vector design is stable and well tested and I would change the inner workings only if there is a bug.

Correct me if I'm wrong, but it seems you disapprove of the current grass6 vector design and would prefer to have the vector design of grass4 (as in KerGIS, right?) back?

Markus M

t laronde wrote:

Hello,

I will only answer for some parts.

Except if Radim has radically changed the structure (and from a cursory
look some years ago, from the topology point of view, it was not the
case; but perhaps he had drastically changed things with the arcs
storage?), your description is not topologically correct. If your
description of the actual state is correct, then this is a mess
that has failed to capture the very nature of topology.

For example for this:

On Mon, Dec 07, 2009 at 09:37:55AM +0100, Markus Metz wrote:
  

In grass, primitive features that are always present are points, lines, boundaries, centroids, faces, kernels. These primitives could also be called arcs or polylines. Hamish argued against the term arcs, I would argue against the term polylines, because points, centroids and kernels are by no means polylines, they have only one vertex. Also a line consisting of two vertices only is not a polyline in the strict sense. I don't feel too strongly about it, it's just that "lines" as a term for all the stuff in the coor file is a bit confusing because a line can be a point, line, boundary, centroid, face, or kernel.
    
If you are right about the new vector engine, that is, if the _means_ to
add features that are not geometrical properties (that are arbitrary
data), that is, to _attribute_ (hence the correct name: attributes)
arbitrary information to geometrical elements, if this _means_ has
been merged into the arcs file, it is an error.

There are no separate centroids and kernels (and what is the name for the
attribute of a geometrical line?), there is only a _topological means_
to link external attributes to geometrical elements. It is (was at
least, and still is for KerGIS) a means used to _store_ the information
about the link, just to be able to rebuild if needed from scratch, when
the geometrical elements do not exist (from arcs). But during run
time, an attribute can be linked to a geometrical element, or the
attribute can be changed without going through the "attribute insertion
point" (centroids, kernels and attribute point for a line are one
and only one thing; it is just the type of the geometrical element
they are used to link to that changes).

Yes, this insertion point generally appears as the insertion point of
the label when drawing (on display or hardcopy). This is generally the
means, in the GUI, to add an attribute to a geometrical feature. But
that's all.

If you think about the stuff, you see that it's not the only means and it
has weaknesses. The first is that you need to search. The second is, if
you take a surface with overhangs, if you use only the planimetric
coordinates (x,y), the insertion points will match several objects (say
faces). One solution is to use a third coordinate, but the problem is
that the complexity increases (and the elevation is generally only
"half" a coordinate; it is not treated the same as x, y). Or you use the
index of the geometrical element to link. In this case, the attribute is
independent from the description of the geometry (since the building is
deterministic, rebuilding from the same geom file [coord] will lead to
the same indices). So in KerGIS, I use a separate file for linking
attributes, with _both_ the indices and an insertion point.
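
A minimal Python sketch of a link record that keeps both pieces of information. The record layout and the names `AttributeLink`, `find_link` and `relink` are hypothetical; this is not the KerGIS file format, only an illustration of the index-first, insertion-point-as-fallback idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeLink:
    element_index: int      # index of the geometrical element after a build
    insertion_point: tuple  # (x, y), kept so the link can be rebuilt from scratch
    attribute_id: int       # key into the external attribute data


def find_link(links, element_index: int):
    """Direct lookup by element index: no geometric search needed."""
    by_index = {link.element_index: link for link in links}
    return by_index.get(element_index)


def relink(links, locate):
    """Recompute indices from insertion points.

    locate is a hypothetical callable (x, y) -> element index, standing in
    for the geometric search done when rebuilding from the arcs.
    """
    return [
        AttributeLink(locate(link.insertion_point), link.insertion_point, link.attribute_id)
        for link in links
    ]
```

The index covers the normal case without searching, and the insertion point is only consulted when the indices have to be recomputed.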

The core construction of topology will not be changed by the proposed changes. One advantage of the proposed changes would be that massive point datasets (that was the purpose of Sites I think) would be stored with the bare minimum of topology, that is, there would be no topology in the strict vectorial sense, but other support structures that in grass come with topology will be available, namely the category index and the spatial index, both of which are IMO useful also for point datasets.
    
If the support for Sites in vector means to handle them as a special
case without topology, this means that they have strictly nothing to do
there. The simple fact that you (GRASS) need to handle them in a special
way is because singularities, whether totally disconnected points
(sites, historical buildings etc. that do not really have a link to the
other data except for "insertion points": the building is on this face)
or predefined connected points (mesh, triangulation), are special
and should not have been merged to start with.

Beauty is consistency. To put different logic in the vector stuff simply
because there is alien information is a highway to hell.

There is never a discovery---for the discoverer at least---that is
unconnected to what exists. You find the next step; and from this the
next one, etc. There is a path. The topological stuff needs to be
profoundly understood first to see the strengths, the weaknesses
(strengths are more numerous than weaknesses for the GIS) and what can
be done, the questions arising and so on, to start wandering in the
vicinity. Start by studying CERL GRASS topology version. Then compare
with Radim's. And you will see what can be kept, what should be dropped
etc.

Before trying to "optimize" the current state, verify first that you are
not trying to "optimize" mistakes. When things are logical and work the
way they should, from the consistency standpoint, you can start thinking
about a way to speed things up. But this is a general solution, almost never hacks
(at least for a software like that; hardware drivers are another story.
Perhaps...).

Just my nickels,