[GRASS5] Re: [GRASSLIST:3809] Re: Vector Manipulation

(I have switched to developers list)

On Friday 31 May 2002 04:45 pm, Christoph Simon wrote:

I didn't really analize such a program in depth, so maybe this is a
bit ingenuous, but I think it actually must try to mimic r.mapcalc as
close as possible, as this would reduce the learning curve of those
who already know that.

What you writes below ("no reason to implement useless functions just because
the do exist in r.mapcalc") seems to me more reasonable.

Without understanding how vectors are described
in Grass, I can only start from the user's perspective (this might be
a good thing). If r.mapcalc allows:

  mapC = mapA OP mapB

Yes. Do you mean that mapX represents geometry here?

or
  mapC = functionX (mapA, mapB)

And here, mapX represents attributes (categories)? Vector attributes
are saved (or should be, or will be) saved in external RDBMS. I would
left this calculations for that system.

v.mapcalc should do the same. In case of the first form, I would
expect v.mapcalc to operate on entities, i.e., points, lines,
areas. In that case, the effect of the operation should be expectable:
line(a) plus line(b) should output both in mapC. Line(a) AND line(b)
only, if that line is present in both. In case of an area, I would
expect to get the intersecting parts accordingly, etc. It would be a
matter to create a table of all functions from r.mapcalc and see the
most obvious behaviour. I'm sure there will be cases, where the
`obvious' behaviour is not obvious at all. But on one side I think
this is not going to be always the case (with plus and AND it is not),
and on the other, this would be just a matter of discussion among the
users. Isn't this on of the advantages of open software?

Which OP (operators)? Probably similar used for methods in
http://www.opengis.org/techno/specs/99-050.pdf
2.2.1.3, 2.2.1.4. Or anybody knows some better 'standard'
definition of spatial relations?

In any case, basing v.mapcalc on r.mapcalc is meant to be a starting
point, not a restriction: there is no reason to implement useless
functions just because the do exist in r.mapcalc, and there is no
reason to not implement others which r.mapcalc doesn't have, but which
are useful. One difference of course would be right away, that vector
operations on coordinates will need a tolerance (snapping
distance). At least initially, I would implement this globally for the
whole operation. Of course regions and mask would have to be respected
mucht the like r.mapcalc does.

The second form would allow for functions that operate by function
definition on the coordinates of the vertices or on the attribute
values, maybe limited to one map. Then a construction like

  macC = reproject (mapA, NEW_PROJ) AND mapB

I didn't look at the sources of r.mapcalc, but I guess the grammar
parser should already exist and be useful for this. The rest is
filling in the function :wink:

Reprojecting is done by v.proj or v.transform now. I would not mix all to one
module.

Do you think, you could contribute somehow. I think that if you could
write reliable module for just few most common spatial operations, it would
be appreciated by most of GRASS users. Note that for example JTS project
exists: http://www.vividsolutions.com/jts/jtshome.htm
and some functions are already present in grass (v.cutter).

Radim

On Tue, 4 Jun 2002 17:57:02 +0200
Radim Blazek <blazek@itc.it> wrote:

(I have switched to developers list)

I'm not subscribed to that, but it's OK for me.

What you writes below ("no reason to implement useless functions just
because the do exist in r.mapcalc") seems to me more reasonable.

Agreed. I was meaning r.mapcalc as a guide in syntax and formalism; in
naming only, if there is really an analogue functionality. But even in
this context, there are important differences: For example, a raster
has just a dot at each point, but a vector map might have an area,
line or point or a combination of several. Depending on the particular
function, a syntax extension could be introduced, like mapA:L for only
line objects in mapA. If the boolean AND operator means the
intersection, a statement like "mapA:A AND mapB:P" would select all
sites from map B which are in areas of mapA. Of course, I also can
specify a region convert mapA to a raster and create a MASK, unless
the region is very big and the vectors have a high resolution, such
that the raster would fill my disk.

> mapC = mapA OP mapB

Yes. Do you mean that mapX represents geometry here?

I meant geometry here. If I AND two areas (e.g, intersection) and the
area of mapA had category 5 and the area of mapB had category 10,
which should be the attribute of the resulting area? I think, there is
no `plug and play' solution for it. One solution might be to renumber
the categories and label them by concatenating the original labels,
separated by a caller defined character (e.g., '+'). Later, these new
areas could be reclassified based on the new labels.

> or
> mapC = functionX (mapA, mapB)

And here, mapX represents attributes (categories)? Vector attributes
are saved (or should be, or will be) saved in external RDBMS. I would
left this calculations for that system.

In this case, I meant to leave this as part of the function
definition, which consists of the arguments and type of arguments to
be used. The idea is not actually to perform attribute calculations,
but to compute vectors in dependency of attributes. This way, I could
select (output) a line if it is in an area which has an attribute e.g.
greater than a certain threshold. As the output is the unchanged
object (line), the attribute of that should also remain unchanged. I
do realize that this is still a simplification, as it would have to be
defined what happens to lines which cross the area's boundary. I could
leave it out, I could just include it, or I could truncate it to the
piece within the area. This can be solved either by using flags, or by
deciding arbitrarily what seems to be the most frequent case. (Here I
would probably choose the last option).

I don't know how much automatic optimization is possible. When
combining more than one of these operations, temporary maps will have
to be build. Maybe this is another difference to r.mapcalc.

Which OP (operators)? Probably similar used for methods in
http://www.opengis.org/techno/specs/99-050.pdf
2.2.1.3, 2.2.1.4. Or anybody knows some better 'standard'
definition of spatial relations?

I am not aware of any standard for spatial relation, but if there is
one, my vote is to follow it as long as practical. The functions
mentioned in both sections seem very useful. These are not very long
lists, so they might be a good starting point. But rather than trying
to define a complete set of any useful function, I would try to create
an interface which facilitates the addition of new functions. If this
infrastructure would exist and there was a good description of how to
access and manipulate the data, I probably already would have
contributed some (maybe useless) functions.

The attempt of finding the complete set of useful functions shows that
this appears to be an endless game. Maybe it would be possible even to
provide a mean to dynamically load user functions, much the like SQL
does. If it is not necessary to recompile a grass module, allowing a
site not even having the grass sources online, interactive development
could provide solutions for very specific cases, where a standard
module just wouldn't make sense. The most frequently used functions
could form a growing basic set.

Reprojecting is done by v.proj or v.transform now. I would not mix all
to one module.

You are totally right. I am at the very beginning in learning grass,
and there are many modules I'm not yet aware of.

Do you think, you could contribute somehow. I think that if you could
write reliable module for just few most common spatial operations, it
would be appreciated by most of GRASS users. Note that for example JTS
project exists: http://www.vividsolutions.com/jts/jtshome.htm
and some functions are already present in grass (v.cutter).

The aims of JTS seem promising, but actually I hate java. This is
probably the reason why I didn't enter that site when looking for
additional modules. My programming is pure C only, mainly in
networking and a bit of RDBMS. In any case, I'll have a closer look to
what they are doing.

To actually contribute, I would need to know many more details of
grass' internals. As I do this in my spare time only (mainly
weekends), this will take some time. Where I might contribute beside
pure coding, is the parser, if that of r.mapcalc can't be used, and
maybe some stuff with dynamically loading, at least on linux, though I
think there are some portable libraries that take care of this.

--
Christoph Simon
ciccio@kiosknet.com.br
---
^X^C
q
quit
:q
^C
end
x
exit
ZZ
^D
?
help
.

On Tuesday 04 June 2002 08:00 pm, Christoph Simon wrote:

Depending on the particular
function, a syntax extension could be introduced, like mapA:L for only
line objects in mapA. If the boolean AND operator means the
intersection, a statement like "mapA:A AND mapB:P" would select all
sites from map B which are in areas of mapA.

Sounds nice, but now, let us try to define how to do these things
on internal/library level (your "infrastructure" below), expression
parser will probably follow.

> > mapC = mapA OP mapB
>
> Yes. Do you mean that mapX represents geometry here?

I meant geometry here. If I AND two areas (e.g, intersection) and the
area of mapA had category 5 and the area of mapB had category 10,
which should be the attribute of the resulting area? I think, there is
no `plug and play' solution for it. One solution might be to renumber
the categories and label them by concatenating the original labels,
separated by a caller defined character (e.g., '+'). Later, these new
areas could be reclassified based on the new labels.

I think, that all these new features should be done in grass51 vectors (which
was started as first step for new vector functionality like this).
There are two possibilities to combine attributes in g51:
1) attache new categories, create new table which contains (selected) columns
from both input tables
2) attach 2 categories to resulting graphic elements, each linked to one of
original tables

> > or
> > mapC = functionX (mapA, mapB)
>
> And here, mapX represents attributes (categories)? Vector attributes
> are saved (or should be, or will be) saved in external RDBMS. I would
> left this calculations for that system.

In this case, I meant to leave this as part of the function
definition, which consists of the arguments and type of arguments to
be used. The idea is not actually to perform attribute calculations,
but to compute vectors in dependency of attributes. This way, I could
select (output) a line if it is in an area which has an attribute e.g.
greater than a certain threshold. As the output is the unchanged
object (line), the attribute of that should also remain unchanged. I
do realize that this is still a simplification, as it would have to be
defined what happens to lines which cross the area's boundary. I could
leave it out, I could just include it, or I could truncate it to the
piece within the area. This can be solved either by using flags, or by
deciding arbitrarily what seems to be the most frequent case. (Here I
would probably choose the last option).

There are usualy different operators for such cases:
- within
- completely within
- intersect (cut the the part within - I'm not sure about the name)

I am not aware of any standard for spatial relation, but if there is
one, my vote is to follow it as long as practical. The functions
mentioned in both sections seem very useful. These are not very long
lists, so they might be a good starting point. But rather than trying
to define a complete set of any useful function, I would try to create
an interface which facilitates the addition of new functions. If this
infrastructure would exist and there was a good description of how to
access and manipulate the data, I probably already would have
contributed some (maybe useless) functions.

I fully agree - create interface. Such contributions may not be useless,
because almost nothing is in grass available now.

The attempt of finding the complete set of useful functions shows that
this appears to be an endless game. Maybe it would be possible even to
provide a mean to dynamically load user functions, much the like SQL
does. If it is not necessary to recompile a grass module, allowing a
site not even having the grass sources online, interactive development
could provide solutions for very specific cases, where a standard
module just wouldn't make sense. The most frequently used functions
could form a growing basic set.

I would simply start with some most common, loading user functions may be
probably postponed.

The aims of JTS seem promising, but actually I hate java. This is
probably the reason why I didn't enter that site when looking for
additional modules. My programming is pure C only, mainly in
networking and a bit of RDBMS. In any case, I'll have a closer look to
what they are doing.

GRASS code must be in C.

To actually contribute, I would need to know many more details of
grass' internals. As I do this in my spare time only (mainly
weekends), this will take some time. Where I might contribute beside
pure coding, is the parser, if that of r.mapcalc can't be used, and
maybe some stuff with dynamically loading, at least on linux, though I
think there are some portable libraries that take care of this.

I think that I can prepare that infrastructure and you can receive
pure geometry in
struct line_pnts { double *x,*y,*z; int n_points; int alloc_points; };
or may be with some cals like: Vect_get_line_areas() - to get areas on
both sides of boundary.

I'll try to start with infrastructure definition
(for now I'll consider just about two overlayed vector maps):

I) Variants of results we want to get from overlay:
  a) list of IDs(internal numbers) of elements in one map, which fulfil
     operator condition (if no breaking required)
  b) list of elements (i.e. list of new line_pnts structures created by
     intersecting)
  c) list of IDs (for both maps) + sizes (length or area) - if just report
     is required, for example which lines intersect which areas and how long
     are these intersections

II) How to do that? I see 2 ways how to analyse vectors:
  a) 1) Overlay whole maps and create new map (intersect all lines
        and areas) and save as standard grass vector map where for
        new elements will be saved its origin (area 456 in MapA and
        line 123 in MapB
     2) Go through all new elements and select that which fits the operator
  b) Go through all elements of map and check if fulfil required rule

Function (probably only suitable for II-b):
Vect_analyse/overlay (
  struct Map_info *MapA,
  struct ilist *ListA, // list of elements in MapA to operate on
  int typeA, // type of elements in MapA to operate on
  struct Map_info *MapB,
  struct ilist *ListB,
  int typeB,
  int operator, // AND, WITHIN, COMPLETELY_WITHIN, ....
  int result_type, // Ia, Ib, Ic above
  void *List) // list of results

Radim

On Thu, 6 Jun 2002 15:15:19 +0200
Radim Blazek <blazek@itc.it> wrote:

I think, that all these new features should be done in grass51 vectors
(which was started as first step for new vector functionality like
this). There are two possibilities to combine attributes in g51:
1) attache new categories, create new table which contains (selected)
columns from both input tables
2) attach 2 categories to resulting graphic elements, each linked to one
of original tables

When dealing with software which I don't know well, I generally prefer
the stable branch to avoid bad surprises while learning. But each
project has a different level of unstability. Would you think it's
high risk for me to go to grass51?

I see that many of the things which are not obvious for me have
already been solved. I hope you'll tell me that grass 51 is not high
risk :wink:

Does the programming manual PDF couver the new vector features? If
not, Where can I get those?

There are usualy different operators for such cases:
- within
- completely within
- intersect (cut the the part within - I'm not sure about the name)

That'a why I said, that I'm not qualified to do this alone. I'm just
too much at the beginning.

I would simply start with some most common, loading user functions may
be probably postponed.

Right, but if this is a functionality planned for a later step, the
design should allow for it from the beginning. Even if the parser
doesn't do so, it shouldn't require a total rewrite to make a (hash)
table lookup for a function which might need dynamic loading. This
could be done by installing all `factory' functions already in such a
hash table and make the lookup for everything which by syntax needs to
be an operator. This is not too hard for flex/bison. The locating and
loading of these function certainly can wait.

GRASS code must be in C.

Thanks God!

I think that I can prepare that infrastructure and you can receive
pure geometry in
struct line_pnts { double *x,*y,*z; int n_points; int alloc_points; };
or may be with some cals like: Vect_get_line_areas() - to get areas on
both sides of boundary.

OK. Then we would have a parser, interpreting the commandline, much
the like r.mapcalc works (as I guess), and that will dispatch to a
function for each operation, which will output an intermediate or
final result. You will open the files according the parser's results,
look up the functions and operators, and call that function with the
needed data. Right?

I'll try to start with infrastructure definition
(for now I'll consider just about two overlayed vector maps):

This is certainly a good restriction for now. But if we create
temporary maps for each atomic functions, even if it's not always very
efficient, more maps wouldn't mean more programming work.

I) Variants of results we want to get from overlay:
  a) list of IDs(internal numbers) of elements in one map, which fulfil
     operator condition (if no breaking required)
  b) list of elements (i.e. list of new line_pnts structures created by
     intersecting)
  c) list of IDs (for both maps) + sizes (length or area) - if just
  report
     is required, for example which lines intersect which areas and how
     long are these intersections

Sounds good. I guess the basic vector functions already exist to
compute permiter and area, right? They just need to be called here.

II) How to do that? I see 2 ways how to analyse vectors:
  a) 1) Overlay whole maps and create new map (intersect all lines
        and areas) and save as standard grass vector map where for
        new elements will be saved its origin (area 456 in MapA and
        line 123 in MapB
     2) Go through all new elements and select that which fits the
     operator
  b) Go through all elements of map and check if fulfil required rule

I would probably try to choose version b) as it sounds more
efficient. In version a) I imagine that we get first a map which will
essentially duplicate maps A and B, eliminating then those elements
which need to be discarded. In case of version b) this shouldn't be
the case, but I imagine, that in the end, this might be a tradeoff
between memory requirements and processing power, as in case of
version b) eventually more comparisons need to be done in certain
cases.

Function (probably only suitable for II-b):
Vect_analyse/overlay (
  struct Map_info *MapA,
  struct ilist *ListA, // list of elements in MapA to operate on
  int typeA, // type of elements in MapA to operate on
  struct Map_info *MapB,
  struct ilist *ListB,
  int typeB,
  int operator, // AND, WITHIN, COMPLETELY_WITHIN, ....
  int result_type, // Ia, Ib, Ic above
  void *List) // list of results

I'll have to dive into the programmers manual to get familiarized with
the specific types. So guessing only: MapAB is probably the header
info and ListAB the actual map data. typeAB would be either area,
line, or point. Operator is specified by an int (or enum). But why is
the resulting List a void pointer? Shouldn't it be also a pair of
Map_info and ilist? Also, the result_type can always be only one or is
this a bitfield? I guess this function returns an integer to indicate
success or failure (for instance to allocate the memory for the
resulting map or for lacking disk space, having no write permissions,
etc., to write the result).

I think your are going to take a while until having this basic
skeleton working. In the meanwhile, if you tell me that I need it,
I'll replace my grass5 by grass51 and start studying the programmers
manual.

--
Christoph Simon
ciccio@kiosknet.com.br
---
^X^C
q
quit
:q
^C
end
x
exit
ZZ
^D
?
help
.

On Thursday 06 June 2002 05:28 pm, Christoph Simon wrote:

When dealing with software which I don't know well, I generally prefer
the stable branch to avoid bad surprises while learning. But each
project has a different level of unstability. Would you think it's
high risk for me to go to grass51?

Does the programming manual PDF couver the new vector features? If
not, Where can I get those?

I don't know what is high risk for you. Grass51 is at the very beginning,
changing from day to day (BTW for g51 testers: yesterday, I have changed
'topo' format -> necessary to rebuild topology for old vectors - v.build)
and new API is not yet documented. On the other hand, it already contains
richer enviroment than grass50. Basic functions are either identical
or similar and where you need more, you have to access undocumented
internal structures in grass50, you can use undocumented functions in
grass51, for example:
grass50: n1 = map->Line[line].N1; n2 = map->Line[line].N2;
grass51: Vect_get_line_nodes ( map, line, &n1, &n2);
You can also compare:
http://freegis.org/cgi-bin/viewcvs.cgi/grass/src/include/Vect.h?rev=1.4&content-type=text/vnd.viewcvs-markup
http://freegis.org/cgi-bin/viewcvs.cgi/grass51/include/Vect.h?rev=1.16&content-type=text/vnd.viewcvs-markup
What is higher risk?

New features are described in:
http://grass.itc.it/grass51/index.html
http://freegis.org/cgi-bin/viewcvs.cgi/~checkout~/grass51/doc/vector/vector.html
and testing data set is here:
http://mpa.itc.it/radim/g51/ (unpack, start grass51 and run ./tour)

... , if you tell me that I need it,
I'll replace my grass5 by grass51 and start studying the programmers
manual.

I would say that you need it, for this task will be probably less painful
to use grass51.

> I would simply start with some most common, loading user functions may
> be probably postponed.

Right, but if this is a functionality planned for a later step, the
design should allow for it from the beginning. Even if the parser
doesn't do so, it shouldn't require a total rewrite to make a (hash)
table lookup for a function which might need dynamic loading. This
could be done by installing all `factory' functions already in such a
hash table and make the lookup for everything which by syntax needs to
be an operator. This is not too hard for flex/bison. The locating and
loading of these function certainly can wait.

List of operators should not be endless, there are just few basic and
others are created by combination of these:
AND(intersect), OR(+), NOT(-), XOR, EQUAL,
WITHIN, COMPLETELY_WITHIN, CONTAINS, COMPLETELY_CONTAINS,
TERMINATES_IN, CONTAINS_END_OF, TOUCH, ... (few others)

But if you are capable to create enviroment mentioned above, why not.

OK. Then we would have a parser, interpreting the commandline, much
the like r.mapcalc works (as I guess), and that will dispatch to a
function for each operation, which will output an intermediate or
final result. You will open the files according the parser's results,
look up the functions and operators, and call that function with the
needed data. Right?

Yes.

> I'll try to start with infrastructure definition
> (for now I'll consider just about two overlayed vector maps):

This is certainly a good restriction for now. But if we create
temporary maps for each atomic functions, even if it's not always very
efficient, more maps wouldn't mean more programming work.

Yes (note: support for temporary vector maps not yet written).

> I) Variants of results we want to get from overlay:
> a) list of IDs(internal numbers) of elements in one map, which fulfil
> operator condition (if no breaking required)
> b) list of elements (i.e. list of new line_pnts structures created by
> intersecting)
> c) list of IDs (for both maps) + sizes (length or area) - if just
> report
> is required, for example which lines intersect which areas and how
> long are these intersections

Sounds good. I guess the basic vector functions already exist to
compute permiter and area, right? They just need to be called here.

There are for example: Vect_get_area_area (), Vect_line_length (),
Vect_line_geodesic_length ();

> II) How to do that? I see 2 ways how to analyse vectors:
> a) 1) Overlay whole maps and create new map (intersect all lines
> and areas) and save as standard grass vector map where for
> new elements will be saved its origin (area 456 in MapA and
> line 123 in MapB
> 2) Go through all new elements and select that which fits the
> operator
> b) Go through all elements of map and check if fulfil required rule

I would probably try to choose version b) as it sounds more
efficient. In version a) I imagine that we get first a map which will
essentially duplicate maps A and B, eliminating then those elements
which need to be discarded. In case of version b) this shouldn't be
the case, but I imagine, that in the end, this might be a tradeoff
between memory requirements and processing power, as in case of
version b) eventually more comparisons need to be done in certain
cases.

Yes. Advantage of a) is that you can overlay once (which may take also
hours or days) and then you can do many (quick) queries. Disadvantage
is that often just overlay of few elements is needed and user must take
care of new map. In addition, a) is not too suitable for interactive
queries (on monitor only). So, let us 'try' b).

> Function (probably only suitable for II-b):
> Vect_analyse/overlay (
> struct Map_info *MapA,
> struct ilist *ListA, // list of elements in MapA to operate on
> int typeA, // type of elements in MapA to operate on
> struct Map_info *MapB,
> struct ilist *ListB,
> int typeB,
> int operator, // AND, WITHIN, COMPLETELY_WITHIN, ....
> int result_type, // Ia, Ib, Ic above
> void *List) // list of results

I'll have to dive into the programmers manual to get familiarized with
the specific types. So guessing only: MapAB is probably the header
info and ListAB the actual map data.

ListAB is just list of feature IDs used by
Vect_read_line (map,points,cats,ID);
I'am not yet sure how this list should look like, I needed list, so I wrote
that, but should be probably done in more 'systematic' way G_list_*
(that is why I asked about GLib).

typeAB would be either area,
line, or point. Operator is specified by an int (or enum).

Yes.

But why is
the resulting List a void pointer?

void because I did not know which type it should be. Because we want
3 types of results (Ia, Ib, Ic), list of results well be in 3 types:
Ia) iList *
Ib) featureList * {
      struct line_pnts *Points,
      struct line_cats *Cats,
      int n_features, alloc_features
}
Ic) reportList? * {
      int *idA, *idB,
      double *size,
      int n_items
}
But featureList, reportList are hypothetical and do not exist at present.

Shouldn't it be also a pair of Map_info and ilist?

No, because, sometimes the result will not be written to new map.
For example interactive module, which enables to select features
by polygon drawn on monitor, wants first to display selection before it writes
new map.

This also reminded me, that also input in form:
   struct Map_info *MapA,
   struct ilist *ListA
is not always best, because it may be only one polygons created
on runtime. We cannot use featureList, because it would not be effective
for maps => ?

Also, the result_type can always be only one or is
this a bitfield?

One at a time, no bits.

I guess this function returns an integer to indicate
success or failure (for instance to allocate the memory for the
resulting map or for lacking disk space, having no write permissions,
etc., to write the result).

Don't know yet what should return, probably number of items in results
and <0 for error.

Radim