[GRASS-dev] [GRASS GIS] #542: grass7 vector libraries modifications

#542: grass7 vector libraries modifications
-------------------------+--------------------------------------------------
Reporter: mmetz | Owner: grass-dev@lists.osgeo.org
     Type: enhancement | Status: new
Priority: minor | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Keywords: | Platform: All
      Cpu: All |
-------------------------+--------------------------------------------------
I want to suggest some more profound changes to the vector model for
grass7. These changes would affect topology, spatial index and maybe
category index, but not the coor file. That means that there will be
limited forward/backward compatibility: topology would need to be rebuilt
before vectors can be accessed. Vector modules would not need to be
rewritten, but more efficient library functions could be made available.

My general idea/complaint is that the current topology layout is not
tailored towards vector object types; instead several (very) different
types (points, lines, boundaries, centroids, faces, kernels) are stored in
the same structure. Working with one particular type is a bit inefficient
because the desired type has to be selected out of everything stored in
this universal structure every single time. I am sure that a lot of time
and space can be saved with a redesigned topology layout and vector
libraries that make use of it. As an example, what I want to get rid of is

{{{
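/* line ids start at 1 */
for (line = 1; line <= nlines; line++) {
    if (!Vect_line_alive(Map, line))
        continue;
    /* the primitive is read from the coor file even if its type is unwanted */
    type = Vect_read_line(Map, points, cats, line);
    if (!(type & otype))
        continue;
    /* process line */
}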
}}}

In the worst case, the whole coor file is read just to get the few
centroids in it. This cannot always be avoided, but such a loop could
often be replaced with e.g.

{{{
for (centroid = 1; centroid <= ncentroids; centroid++) {
    /* process centroid */
}
}}}
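
A minimal sketch of what could make such a per-type loop possible, assuming topology kept one list of coor file offsets per primitive type. The names `type_list` and `type_list_offset` are hypothetical and not part of the current vector library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-type list: coor file offsets of all primitives
 * of one type (e.g. all centroids). Not an existing GRASS structure. */
struct type_list {
    size_t count;        /* number of primitives of this type */
    const long *offset;  /* coor file offset of each primitive */
};

/* Return the coor file offset of the i-th primitive of this type.
 * Ids are 1-based, as for lines in the current library.
 * Returns -1 if i is out of range. */
long type_list_offset(const struct type_list *list, size_t i)
{
    assert(list != NULL);

    if (i < 1 || i > list->count)
        return -1;

    return list->offset[i - 1];
}
```

A loop over `1..count` on the centroid list would then only seek to the centroids in the coor file, instead of reading every primitive and discarding the unwanted types.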

The current implementation has some consequences that I am not sure are
actually desired. E.g. when cleaning a vector with tool=snap
(snapping vertices of lines and boundaries), lines and boundaries may be
snapped together at the same time: a boundary may be snapped to a line and
vice versa. Maybe this is sometimes desired, but maybe this should be
avoided? Another example is removing duplicates: currently it is possible
to do that for points and centroids together, and if there are a point and
a centroid with identical coordinates, one of them is deleted (random
selection).

With the changes I have in mind, the size of support structures should
generally go down, most for point datasets, least for areas. Massive point
datasets like LIDAR could be processed more easily on level 2 with topology,
because support structures for massive point datasets would be reduced in
size by about 70% (rough estimates: spatial index reduced down to 25%,
topology reduced down to 40%).
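
How the two per-file estimates combine into the overall figure depends on how large the spatial index is relative to topology, which is not stated above. A small sketch of the arithmetic, assuming only these two files count as support structures (the category index is ignored) and using a hypothetical helper `combined_reduction`:

```c
#include <assert.h>

/* Fraction of the combined support file size that is saved when the
 * spatial index shrinks to sidx_factor and topology to topo_factor
 * of their current sizes. Sizes can be in any unit, only their
 * ratio matters. */
double combined_reduction(double sidx_size, double topo_size,
                          double sidx_factor, double topo_factor)
{
    double before = sidx_size + topo_size;
    double after = sidx_size * sidx_factor + topo_size * topo_factor;

    assert(before > 0.0);

    return 1.0 - after / before;
}
```

With factors 0.25 and 0.40, a spatial index twice as large as topology gives 1 - (2 * 0.25 + 1 * 0.40) / 3 = 0.70, so the 70% figure matches the case of a 2:1 size ratio. With equal sizes the saving would be 67.5%.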

There are however some problems with my suggestions: 1) IMHO nobody should
decide on that alone, 2) the coding is too much for one person alone, i.e.
I can't do all that without help, 3) I'm not really a programmer, 4) I
don't know enough about vector geometry algorithms.

Below are more technical details:

== Status quo ==

the coor file holds lines (better: primitives) of types[[BR]]
point[[BR]]
line[[BR]]
boundary[[BR]]
centroid[[BR]]
face (3D boundary, not yet implemented)[[BR]]
kernel (3D centroid, not yet implemented)[[BR]]

structures derived from these types are[[BR]]
nodes[[BR]]
areas[[BR]]
isles[[BR]]
edges (3D areas, not yet implemented)[[BR]]
volumes (3D shapes, not yet implemented)[[BR]]
holes (3D volumes within volumes, like isles in areas, not yet
implemented)

topology holds information about[[BR]]
nodes[[BR]]
lines[[BR]]
areas[[BR]]
isles

where lines can be points, lines, boundaries, centroids, faces, or kernels

see
[http://trac.osgeo.org/grass/browser/grass/trunk/include/vect/dig_structs.h#L440]

points, lines, boundaries, centroids, faces, kernels are obviously
different things, but the current topology layout squeezes all of them
into the same structure with information about:
start node (assigned for all types, but not needed for points, centroids,
kernels)[[BR]]
end node (used for lines and boundaries, otherwise unused)[[BR]]
area to left (for boundary, area for centroid, unused for all other
types)[[BR]]
area to right (for boundary, unused for all other types)[[BR]]
3D bounding box (completely redundant for points, centroids,
kernels)[[BR]]
offset (into coor file)[[BR]]
type (point, line, boundary, centroid, face, or kernel)
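
As a rough illustration of the list above, the universal record could be pictured like this. It is a simplified sketch: the field and type names are made up for this example and are not copied from dig_structs.h. The helper functions just encode which fields the list above describes as used for which type:

```c
#include <assert.h>

/* Illustrative type codes, not the GRASS constants. */
enum prim_type {
    PRIM_POINT, PRIM_LINE, PRIM_BOUNDARY,
    PRIM_CENTROID, PRIM_FACE, PRIM_KERNEL
};

/* One record layout shared by all primitive types. */
struct universal_prim {
    int start_node;       /* assigned for all types */
    int end_node;         /* only meaningful for lines, boundaries */
    int left;             /* area to left (boundary) or area (centroid) */
    int right;            /* area to right (boundary only) */
    double box[6];        /* 3D bounding box */
    long offset;          /* offset into the coor file */
    enum prim_type type;
};

/* 1 if the end node carries information for this type. */
int uses_end_node(enum prim_type t)
{
    return t == PRIM_LINE || t == PRIM_BOUNDARY;
}

/* 1 if the left field carries information for this type. */
int uses_left(enum prim_type t)
{
    return t == PRIM_BOUNDARY || t == PRIM_CENTROID;
}

/* 1 if the right field carries information for this type. */
int uses_right(enum prim_type t)
{
    return t == PRIM_BOUNDARY;
}

/* 0 for single-coordinate types, where the box repeats the coordinates. */
int needs_box(enum prim_type t)
{
    return !(t == PRIM_POINT || t == PRIM_CENTROID || t == PRIM_KERNEL);
}
```

For a point, every one of these helpers returns 0, i.e. only offset and type are really needed, yet the full record is stored.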

== Proposed new layout ==

the coor file would hold the same types as before. To avoid confusion, all
coordinate strings would be referred to as primitives (like in the output
of current v.build), but that's just naming. IMHO anything but "line" is
fine: "a line can be a line or a boundary or a point or ..." is too
philosophical for my taste.

topology would have a separate data structure for each of[[BR]]
points[[BR]]
lines[[BR]]
boundaries[[BR]]
nodes (only needed for lines, boundaries, and faces)[[BR]]
centroids[[BR]]
areas[[BR]]
isles[[BR]]
faces[[BR]]
edges[[BR]]
volumes[[BR]]
holes

An additional small data structure would be needed that would be a boiled
down replacement of current P_Line with information about primitives.
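
A sketch of what separate per-type records might look like, for the 2D types only. All names here are hypothetical, and whether bounding boxes would stay in topology for lines and boundaries is an assumption of this example, not something decided above:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative type codes, not the GRASS constants. */
enum { T_POINT = 1, T_LINE, T_BOUNDARY, T_CENTROID };

struct topo_point {
    long offset;                  /* offset into the coor file */
};

struct topo_line {
    int start_node, end_node;
    double box[6];                /* assumed to stay in topology */
    long offset;
};

struct topo_boundary {
    int start_node, end_node;
    int left_area, right_area;
    double box[6];                /* assumed to stay in topology */
    long offset;
};

struct topo_centroid {
    int area;                     /* area this centroid belongs to */
    long offset;
};

/* Boiled-down replacement for the universal record: maps a primitive
 * id to its type and to its position in the per-type array. */
struct topo_prim {
    unsigned char type;
    int index;
};

/* Size in bytes of one topology record of the given type,
 * or 0 for an unknown type. */
size_t topo_record_size(int type)
{
    switch (type) {
    case T_POINT:    return sizeof(struct topo_point);
    case T_LINE:     return sizeof(struct topo_line);
    case T_BOUNDARY: return sizeof(struct topo_boundary);
    case T_CENTROID: return sizeof(struct topo_centroid);
    default:         return 0;
    }
}
```

In this layout a point record carries only its offset, which is where the largest savings for point datasets would come from, while a boundary record keeps all the fields it has today.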

Similarly, a separate spatial index would be created for each type,
instead of lumping all points, lines, boundaries, centroids, faces, and
kernels into the same spatial index. A query for one type would then only
search entries of that type, which is more efficient with regard to both
time and space.
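
To illustrate the idea, here is a minimal sketch with one index per type. A flat array of 2D boxes stands in for the real spatial index, and all names (`typed_index`, `count_in_box`, ...) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative type codes, not the GRASS constants. */
enum { IDX_POINT, IDX_LINE, IDX_BOUNDARY, IDX_CENTROID, IDX_NTYPES };

struct box2d {
    double w, s, e, n;
};

/* Stand-in for a real spatial index: a flat array of boxes. */
struct flat_index {
    size_t count;
    const struct box2d *box;
};

/* One spatial index per primitive type. */
struct typed_index {
    struct flat_index by_type[IDX_NTYPES];
};

/* 1 if the two boxes overlap or touch. */
static int overlaps(const struct box2d *a, const struct box2d *b)
{
    return a->w <= b->e && b->w <= a->e && a->s <= b->n && b->s <= a->n;
}

/* Count the primitives of one type whose box overlaps the query box.
 * Only the index of the requested type is searched. */
size_t count_in_box(const struct typed_index *idx, int type,
                    const struct box2d *query)
{
    const struct flat_index *fi;
    size_t i, n = 0;

    assert(idx != NULL && query != NULL);

    if (type < 0 || type >= IDX_NTYPES)
        return 0;

    fi = &idx->by_type[type];
    for (i = 0; i < fi->count; i++) {
        if (overlaps(&fi->box[i], query))
            n++;
    }

    return n;
}
```

A search for centroids never visits point or boundary entries here, whereas with a single shared index every hit would have to be checked for its type afterwards.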

I'm reaching limits on what I can change in the vector libs without
breaking compatibility, and I'm sometimes getting frustrated with the
waste of time and space for large vectors. IIUR grass7 is an opportunity
to introduce changes like these, so I hope to initiate a discussion and
collect more ideas on how to improve grass vector handling.

Regards,

Markus M

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/542>
GRASS GIS <http://grass.osgeo.org>

#542: grass7 vector libraries modifications
--------------------------+-------------------------------------------------
  Reporter: mmetz | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: minor | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords:
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Comment (by mlennert):

As no one has ever reacted to this, I will now: I'm not a real programmer,
so won't be able to say much about the details, but from my limited
experience, what you propose sounds very reasonable and I encourage you to
go on in that direction.

Moritz

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/542#comment:1>
GRASS GIS <http://grass.osgeo.org>

#542: grass7 vector libraries modifications
--------------------------+-------------------------------------------------
  Reporter: mmetz | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: major | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords:
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Changes (by martinl):

  * cc: martinl (added)
  * priority: minor => major

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/542#comment:2>
GRASS GIS <http://grass.osgeo.org>

#542: grass7 vector libraries modifications
--------------------------+-------------------------------------------------
  Reporter: mmetz | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: major | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords:
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Comment (by martinl):

Replying to [ticket:542 mmetz]:
> face (3D boundary, not yet implemented)[[BR]]
> edges (3D areas, not yet implemented)[[BR]]

Just a very small note: shouldn't it be

  * face 3D area

and

  * edge 3D boundary

?

Thanks, Martin

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/542#comment:3>
GRASS GIS <http://grass.osgeo.org>

#542: grass7 vector libraries modifications
--------------------------+-------------------------------------------------
  Reporter: mmetz | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: major | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords:
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Comment (by mmetz):

Replying to [comment:3 martinl]:
> Replying to [ticket:542 mmetz]:
> > face (3D boundary, not yet implemented)[[BR]]
> > edges (3D areas, not yet implemented)[[BR]]
>
> Just very little note: shouldn't be
>
> * face 3D area
>
> and
>
> * edge 3D boundary
>
> ?
>
You're right, yes. The documentation is a bit contradictory, e.g. the
GRASS7 programmer's manual says "Face and kernel are 3D equivalents of
boundary and centroid...", but edge is commonly used as 2D/3D boundary
(Vector network analysis, general graph theory). Thus for 3D we would have
edges in the coor file, faces (3D areas) are going to be constructed from
edges and volumes are going to be constructed from faces and kernels. Is
this nomenclature ok?

Markus M

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/542#comment:4>
GRASS GIS <http://grass.osgeo.org>

#542: grass7 vector libraries modifications
--------------------------+-------------------------------------------------
  Reporter: mmetz | Owner: grass-dev@lists.osgeo.org
      Type: enhancement | Status: new
  Priority: major | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords:
  Platform: All | Cpu: All
--------------------------+-------------------------------------------------
Comment (by martinl):

> You're right, yes. The documentation is a bit contradicting, e.g. in
> GRASS7 programmer's manual it says "Face and kernel are 3D equivalents of
> boundary and centroid...", but edge is commonly used as 2D/3D boundary
> (Vector network analysis, general graph theory). Thus for 3D we would have
> edges in the coor file, faces (3D areas) are going to be constructed from
> edges and volumes are going to be constructed from faces and kernels. Is
> this nomenclature ok?

Seems to be reasonable to me.

Martin

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/542#comment:5>
GRASS GIS <http://grass.osgeo.org>

#542: grass7 vector libraries modifications
--------------------------+-------------------------
  Reporter: mmetz | Owner: grass-dev@…
      Type: enhancement | Status: new
  Priority: major | Milestone: 7.0.0
Component: Vector | Version: svn-trunk
Resolution: | Keywords:
       CPU: All | Platform: All
--------------------------+-------------------------

Comment (by martinl):

I am not sure if it is still relevant to GRASS 7. Probably moving this
ticket to milestone GRASS 8 would make better sense?

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/542#comment:6>
GRASS GIS <http://grass.osgeo.org>