[GRASS5] Re: [GRASSLIST:3976] Re: intersect sites with polygons?

On Friday 28 June 2002 11:45 am, Markus Neteler wrote:

On Thu, Jun 27, 2002 at 10:57:56AM -0700, Jeff D. Hamann wrote:
> I would like to intersect a sites layer with a polygon layer to obtain
> only the polygons that have a site within them... Do I have to convert
> the sites to a raster/vector layer first?

An (un)useful hint related to this topic:

GRASS lib provides functions for that:

int
Vect_get_point_in_poly (struct line_pnts *Points, double *X, double *Y)

  get point inside polygon

  This routine finds a suitable area point in the ring described by the
  line struct Points, but without reference to any islands that may be
  present inside the area. This routine is useful where prior topological
  information on an area is not available.

  The return value is 0 on success, -1 on failure.

(also points in islands etc)

So it *should* be possible to write 'v.points.in.polygon' reading a
vector points map (possibly generated with s.to.vect) and a
polygon vector file.
Maybe we can find a volunteer to code that?

As this is one of MANY spatial analyses, it should go into
v.mapcalc (already started); an available function is:
int Vect_point_in_area (struct Map_info *, int, double, double);
(takes into account islands)
then
v.mapcalc "selpol = pol CONTAINS sites"
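
For illustration only, an islands-aware test can be sketched in plain C
without the GRASS library. The ring struct and both function names below
are hypothetical, and the even-odd ray casting used here is a common
textbook technique, not necessarily what Vect_point_in_area() does
internally. Points lying exactly on a boundary are not treated specially.

```c
#include <stddef.h>

/* Hypothetical ring type: n vertices, implicitly closed (last joins first). */
struct ring {
    const double *x;
    const double *y;
    size_t n;
};

/* Even-odd ray casting: returns 1 if (px, py) lies inside the ring, else 0. */
int point_in_ring(const struct ring *r, double px, double py)
{
    int inside = 0;
    size_t i, j;

    if (r->n < 3)
        return 0;
    for (i = 0, j = r->n - 1; i < r->n; j = i++) {
        /* Does the edge j -> i straddle the horizontal line y = py? */
        if ((r->y[i] > py) != (r->y[j] > py)) {
            double xcross = r->x[j] + (py - r->y[j]) *
                            (r->x[i] - r->x[j]) / (r->y[i] - r->y[j]);
            if (px < xcross)
                inside = !inside;
        }
    }
    return inside;
}

/* Inside the area = inside the outer ring and inside none of the islands. */
int point_in_area(const struct ring *outer, const struct ring *isles,
                  size_t n_isles, double px, double py)
{
    size_t k;

    if (!point_in_ring(outer, px, py))
        return 0;
    for (k = 0; k < n_isles; k++)
        if (point_in_ring(&isles[k], px, py))
            return 0;
    return 1;
}
```

Selecting "polygons that have a site within them" would then be a loop
over sites and areas calling such a test, keeping each area that reports
at least one hit.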

Radim

On Mon, 1 Jul 2002 13:23:15 +0200
Radim Blazek <blazek@itc.it> wrote:

> So it *should* be possible to write 'v.points.in.polygon' reading a
> vector points map (possibly generated with s.to.vect) and a
> polygon vector file.
> Maybe we can find a volunteer to code that?

As this is one of MANY spatial analyses, it should go into
v.mapcalc (already started); an available function is:
int Vect_point_in_area (struct Map_info *, int, double, double);
(takes into account islands)
then
v.mapcalc "selpol = pol CONTAINS sites"

Radim, how could we use these functions in v.mapcalc? I'm thinking of
integrating them like I did with the math functions. But dynamic
loading (this is already working), possibly using a wrapper, could
also be a good approach, as it seems that those functions
are in active development. I think we could get a bunch of quite
useful things to work quickly in v.mapcalc.

To make the above syntax work

v.mapcalc "selpol = pol CONTAINS sites"

v.mapcalc's lexical scanner needs to recognize the word CONTAINS and
the parser will have to know which function to call. If there is a
substantial set of applications this can and should be done. More
operator words like this? A syntax which allows immediate
implementation of such a function would be:

  v.mapcalc "selpol = contains (pol, sites);"

In this form it will be enough to write an external module which
provides a function called "contains", accepting two MAPs as argument,
and creating and returning a new MAP which will get the name selpol.
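
A minimal sketch of how such a name-to-function lookup might look,
assuming a simple static table. The MAP struct here is only a stand-in
for v.mapcalc's internal type, and fn_contains is a hypothetical
placeholder that does no geometry; in the dynamically loaded case the
table would presumably be filled at runtime instead.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for v.mapcalc's internal MAP struct (hypothetical). */
typedef struct {
    const char *name;
} MAP;

/* A function taking two MAPs and returning a new MAP. */
typedef MAP *(*map_func2)(MAP *a, MAP *b);

struct func_entry {
    const char *name;
    map_func2 fn;
};

/* Placeholder only: a real version would build and return a new map. */
MAP *fn_contains(MAP *a, MAP *b)
{
    (void)b;
    return a;
}

static const struct func_entry func_table[] = {
    { "contains", fn_contains },
};

/* Returns the function registered under name, or NULL if there is none. */
map_func2 lookup_func(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof func_table / sizeof func_table[0]; i++)
        if (strcmp(func_table[i].name, name) == 0)
            return func_table[i].fn;
    return NULL;
}
```

With this shape the parser only needs one generic rule for
"name ( args )"; adding a new function means adding a table entry, not
a grammar rule.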

The way v.mapcalc intends to deal with this is that until a function
point_in_poly() or contains() becomes active, no test is made whether
the given maps actually have something which can contain something
else, or which can be contained. This is to minimize the number of
different types v.mapcalc's scanner and parser have to deal with. But
thinking of this problem, I could imagine that it might be useful to
extend the syntax a bit:

  mymap.p[3]   select third polygon
  mymap.l(45)  select all lines with attribute 45
  mymap.a      use only areas (closed polygons)

Within v.mapcalc, e.g., before performing the action, only a struct of
type MAP is moved around. This struct is defined in v.mapcalc only and
different from any struct of such name in Grass, if that exists. The
lexical scanner could extract these suffixes and add them as a new
member of MAP. Of course, the above syntax could get confused by a map
which is actually called "mymap.a", but I think, maps do not have any
extension normally. OTOH,

  mymap.pl

could be acceptable to choose any polygon or line, optionally followed
by an object or attribute specification, which might also allow for
other extensions like

  [2,4-6,9]

to select the objects 2, 4, 5, 6, and 9. The most complex syntax I can
think of here would be several letters for each acceptable type of
object, an expression to select a set of objects and another to select
a set of attributes, all combined by logical OR. And if an exclamation
mark precedes a letter or digit or range, that could be excluded.

For instance, this could allow visualizing the object which another
program/function reported to have an error, and with an appropriate
replace function, only that object could be changed (non)interactively
and nongraphically.

Would there be enough applications to do this? Which are the letters
besides p, l, a we would need? Are brackets and parentheses OK or should
there be anything else? I chose them because [] should remind one of an
array, i.e., the nth object in this map, and () is like a function, as the
attribute is given as a function of what is associated to an object.

Also, if we allow to select a particular object (first suffix) or
attribute (second suffix), how useful would it be to be able to
perform a loop in v.mapcalc? Loops and conditionals are not being
considered for now, but could be done with some effort. Then all or a
certain number or type of objects can be scanned in the map and,
according to certain conditions, processed in a dependent way. I would
probably try to mimic a "for" and "if" construct of the C
language. Interesting enough to spend the time?

--
Christoph Simon
ciccio@kiosknet.com.br
---
^X^C
q
quit
:q
^C
end
x
exit
ZZ
^D
?
help
.

On Monday 01 July 2002 03:53 pm, Christoph Simon wrote:

> v.mapcalc "selpol = pol CONTAINS sites"

Radim, how could we use these functions in v.mapcalc? I'm thinking of
integrating them like I did with the math functions. But dynamic
loading (this is already working), possibly using a wrapper, could
also be a good approach, as it seems that those functions
are in active development. I think we could get a bunch of quite
useful things to work quickly in v.mapcalc.

The basic framework is Vect_overlay() in lib/vector/Vlib/overlay.c,
where all standard overlay operators should go, because they may also
be used in other modules.

To make the above syntax work

> v.mapcalc "selpol = pol CONTAINS sites"

v.mapcalc's lexical scanner needs to recognize the word CONTAINS and
the parser will have to know which function to call. If there is a
substantial set of applications this can and should be done. More
operator words like this? A syntax which allows immediate
implementation of such a function would be:

  v.mapcalc "selpol = contains (pol, sites);"

In this form it will be enough to write an external module which
provides a function called "contains", accepting two MAPs as argument,
and creating and returning a new MAP which will get the name selpol.

I started in dig_defines.h :
/* Overlay operators */
#define GV_ON_AND "AND" /* intersect */
#define GV_ON_OVERLAP "OVERLAP"

typedef enum {
    GV_O_AND,
    GV_O_OVERLAP,
} OVERLAY_OPERATOR;

Here we can add CONTAINS, as it is a standard operator. There is
also Vect_overlay_str_to_operator ( char *str ); to convert an
operator name to a code; then Vect_overlay() may be called with this code.
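
A name-to-code conversion of this kind can be sketched as a small
table search. The OP_* codes and op_from_name() below are hypothetical
names chosen for illustration; they are not the GV_O_* constants or the
actual Vect_overlay_str_to_operator() implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operator codes, with CONTAINS added as discussed. */
typedef enum {
    OP_UNKNOWN = -1,
    OP_AND = 0,
    OP_OVERLAP,
    OP_CONTAINS
} overlay_op;

/* Converts an operator name to its code; OP_UNKNOWN if not recognized.
 * The comparison is case sensitive. */
overlay_op op_from_name(const char *str)
{
    static const struct {
        const char *name;
        overlay_op code;
    } ops[] = {
        { "AND", OP_AND },
        { "OVERLAP", OP_OVERLAP },
        { "CONTAINS", OP_CONTAINS },
    };
    size_t i;

    for (i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (strcmp(str, ops[i].name) == 0)
            return ops[i].code;
    return OP_UNKNOWN;
}
```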

The way v.mapcalc intends to deal with this is that until a function
point_in_poly() or contains() becomes active, no test is made whether
the given maps actually have something which can contain something
else, or which can be contained. This is to minimize the number of
different types v.mapcalc's scanner and parser have to deal with. But
thinking of this problem, I could imagine that it might be useful to
extend the syntax a bit:

  mymap.p[3]   select third polygon
  mymap.l(45)  select all lines with attribute 45
  mymap.a      use only areas (closed polygons)

Yes, that would be useful, at least types (BTW polygons are not used,
just areas and "." (dot) may be used in map names).
Order numbers are not used, just categories like mymap.l(45)
An SQL statement would also be useful: mymap.l(flow > 50). We already
discussed this and I don't know the conclusion; this scheme
c = a.a(12) AND b.p
may be replaced by:
c = select(a, AREAS, 12) AND select(b,POINTS)
which is less effective but easier to develop and maintain.

Within v.mapcalc, e.g., before performing the action, only a struct of
type MAP is moved around. This struct is defined in v.mapcalc only and
different from any struct of such name in Grass, if that exists. The
lexical scanner could extract these suffixes and add them as a new
member of MAP. Of course, the above syntax could get confused by a map
which is actually called "mymap.a", but I think, maps do not have any
extension normally. OTOH,

  mymap.pl

could be acceptable to choose any polygon or line, optionally followed
by an object or attribute specification, which might also allow for
other extensions like

  [2,4-6,9]

to select the objects 2, 4, 5, 6, and 9.

Use Vect_str_to_cat_list();
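
What parsing such a list involves can be sketched in plain C. This is
not the GRASS routine: parse_cat_list() is a hypothetical helper that
expands "2,4-6,9" into individual numbers and rejects malformed input.
It relies on strtol(), so leading whitespace and signs are accepted as
strtol() accepts them.

```c
#include <stdlib.h>

/* Expands a list such as "2,4-6,9" into out[] (capacity max).
 * Returns the number of values stored, or -1 on a syntax error,
 * a descending range, or when out[] is too small. */
int parse_cat_list(const char *s, int *out, int max)
{
    int n = 0;

    while (*s != '\0') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi = lo;

        if (end == s)
            return -1;              /* no number where one was expected */
        s = end;
        if (*s == '-') {            /* a range "lo-hi" */
            s++;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
            s = end;
        }
        for (; lo <= hi; lo++) {
            if (n >= max)
                return -1;
            out[n++] = (int)lo;
        }
        if (*s == ',') {
            s++;
            if (*s == '\0')
                return -1;          /* trailing comma */
        } else if (*s != '\0') {
            return -1;              /* unexpected character */
        }
    }
    return n;
}
```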

The most complex syntax I can
think of here would be several letters for each acceptable type of
object, an expression to select a set of objects and another to select
a set of attributes, all combined by logical OR.

I would expect AND.

And if an exclamation
mark precedes a letter or digit or range, that could be excluded.

For instance, this could allow visualizing the object which another
program/function reported to have an error, and with an appropriate
replace function, only that object could be changed (non)interactively
and nongraphically.

Would there be enough applications to do this? Which are the letters
besides p, l, a we would need?

Should be:
p - point
l - line
b - boundary
c - centroid
a - area

Are brackets and parentheses OK or should
there be anything else? I chose them because [] should remind one of an
array, i.e., the nth object in this map, and () is like a function, as the
attribute is given as a function of what is associated to an object.

Probably, but "." may not be used. It could also be possible to put
the type inside the parentheses: map(pl;1-3)

Also, if we allow to select a particular object (first suffix) or
attribute (second suffix), how useful would it be to be able to
perform a loop in v.mapcalc? Loops and conditionals are not being
considered for now, but could be done with some effort. Then all or a
certain number or type of objects can be scanned in the map and,
according to certain conditions, processed in a dependent way. I would
probably try to mimic a "for" and "if" construct of the C
language. Interesting enough to spend the time?

Cannot say, but to implement more overlay operators seems to have
higher priority.

Radim

Radim Blazek wrote:

As this is one of MANY spatial analyses, it should go into
v.mapcalc (already started)

I've just had a look at that; it doesn't seem to provide much of a
basis for an actual tool along the lines of r.mapcalc.

The first thing to decide is the structure of the processing pipeline.
E.g. for r.mapcalc, the expression is evaluated for each row of the
result. An N-parameter function takes N buffers as input, one row of
each input raster, and stores the resulting output row in the output
buffer.
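
As a rough sketch of that row-at-a-time model (not r.mapcalc's actual
code; row_func, row_sum, eval_by_rows and the MAX_ARGS limit are all
hypothetical), an N-parameter function receives one row from each of N
inputs and fills one output row:

```c
#include <stddef.h>

#define MAX_ARGS 8 /* arbitrary limit for this sketch */

/* An N-parameter function: reads ncols cells from each of n_in input
 * rows and writes ncols cells to out. */
typedef void (*row_func)(const double *const *in, size_t n_in,
                         double *out, size_t ncols);

/* Example function: cell-wise sum of all input rows. */
void row_sum(const double *const *in, size_t n_in, double *out, size_t ncols)
{
    size_t c, k;

    for (c = 0; c < ncols; c++) {
        out[c] = 0.0;
        for (k = 0; k < n_in; k++)
            out[c] += in[k][c];
    }
}

/* Evaluates f once per row. Each rasters[k] and result are row-major
 * arrays of nrows * ncols cells. Returns 0, or -1 if n_in is too large. */
int eval_by_rows(row_func f, const double *const *rasters, size_t n_in,
                 size_t nrows, size_t ncols, double *result)
{
    const double *rows[MAX_ARGS];
    size_t r, k;

    if (n_in > MAX_ARGS)
        return -1;
    for (r = 0; r < nrows; r++) {
        for (k = 0; k < n_in; k++)
            rows[k] = rasters[k] + r * ncols;   /* current row of input k */
        f(rows, n_in, result + r * ncols, ncols);
    }
    return 0;
}
```

The point of the question that follows is that vector maps have no such
obvious unit of iteration to play the role of the row.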

What would the processing model be for v.mapcalc? What would the
"values" be? Complete vector maps?

--
Glynn Clements <glynn.clements@virgin.net>

On Tue, 2 Jul 2002 11:31:46 +0100
Glynn Clements <glynn.clements@virgin.net> wrote:

Radim Blazek wrote:

> As this is one of MANY spatial analyses, it should go into
> v.mapcalc (already started)

I've just had a look at that; it doesn't seem to provide much of a
basis for an actual tool along the lines of r.mapcalc.

The first thing to decide is the structure of the processing pipeline.
E.g. for r.mapcalc, the expression is evaluated for each row of the
result. An N-parameter function takes N buffers as input, one row of
each input raster, and stores the resulting output row in the output
buffer.

What would the processing model be for v.mapcalc? What would the
"values" be? Complete vector maps?

Vector maps are intrinsically more complex than raster maps. The
latter has just a point and a value, while the former has a set of
several possible types of objects, each with many possible
properties. The current design of v.mapcalc is essentially the same as
having a set of grass modules and using shell scripts, but implemented
in a program which knows of numbers, maps, points and point lists, and
an anonymous catch-all type `any'. There will be a minimal set of
builtin functions, but most will be dynamically loaded functions. This
way v.mapcalc doesn't need to define what can't be defined: any
attempt to deal with all possible combinations will yield an
explosion.

I still need to code a few things to make the complete idea visible;
then I plan to publish what I've done.

--
Christoph Simon
ciccio@kiosknet.com.br

On Tue, 2 Jul 2002 11:16:56 +0200
Radim Blazek <blazek@itc.it> wrote:

The basic framework is Vect_overlay() in lib/vector/Vlib/overlay.c,
where all standard overlay operators should go, because they may also
be used in other modules.

Would be a question of joining a list and unifying calls.

I started in dig_defines.h :
/* Overlay operators */
#define GV_ON_AND "AND" /* intersect */
#define GV_ON_OVERLAP "OVERLAP"

typedef enum {
    GV_O_AND,
    GV_O_OVERLAP,
} OVERLAY_OPERATOR;

Hm. These operators need to be recognized by the lexical scanner and
processed by the parser. It's usually bison that defines an integer
value for them.

Here we can add CONTAINS, as it is a standard operator. There is
also Vect_overlay_str_to_operator ( char *str ); to convert an
operator name to a code; then Vect_overlay() may be called with this code.

The way operators are handled is this:
- lexical scanner converts character sequence to integer value (token)
- parser recognizes sequence of tokens which is associated with an action
- the action performs the function call.

This means, at any point, there must be an association between the
operator and the function performing the operation: Any binary
operator will always yield a function call. This can be context bound,
i.e., the CONTAINS operator may have a different meaning (calling a
different function) according to the operand type, which might be a
map or not. What's more, I plan to allow creating arrays of
operator-to-function associations, so a user could create her own
context, loading for instance a function which will always/never deal
with attributes in a certain way (like accessing SQL). Unless there is
a fixed and clearly defined set of standard operators, I really think
we should wait with this until we have something working and a minimum
set of functions. To test these functions, we can always use the
function call syntax rather than the operator syntax. In any case, the
operator syntax is basically programmed; it's just a question of
replicating for CONTAINS what is done for '+'.
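
To make the three steps and the idea of a replaceable context concrete,
here is a small sketch. Everything in it is hypothetical: the token
codes (bison would normally assign them), the op_context struct, and
the use of plain doubles as values instead of maps.

```c
#include <stddef.h>

/* Hypothetical token codes; bison would normally assign these. */
enum token { TOK_PLUS, TOK_CONTAINS, TOK_COUNT };

/* Values are plain doubles here to keep the sketch small. */
typedef double (*binary_fn)(double a, double b);

/* A context: one array of operator-to-function associations. */
struct op_context {
    binary_fn fn[TOK_COUNT];
};

double add_numbers(double a, double b) { return a + b; }
double larger_of(double a, double b) { return a > b ? a : b; }

/* Step 1, scanner: character sequence -> token.
 * TOK_COUNT doubles as "not an operator". */
enum token scan_operator(char c)
{
    return c == '+' ? TOK_PLUS : TOK_COUNT;
}

/* Steps 2 and 3, parser action: find the function bound to the token
 * in the active context and call it. Returns 0 on success, -1 if the
 * token has no function in this context. */
int apply_operator(const struct op_context *ctx, enum token t,
                   double a, double b, double *result)
{
    if ((unsigned)t >= (unsigned)TOK_COUNT || ctx->fn[t] == NULL)
        return -1;
    *result = ctx->fn[t](a, b);
    return 0;
}
```

Two contexts that bind '+' to different functions give different
results for the same input, which is the behaviour described above; the
grammar itself (precedence, associativity) stays fixed.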

Yes, that would be useful, at least types (BTW polygons are not used,
just areas

Hm. I thought there is a difference between polygon and area, as the
area is only a closed polygon. So which are actually the types of
objects which may be found in a vector map?

and "." (dot) may be used in map names).

I did think about this, but there are very few characters which are
not allowed in the filesystem and which are not already used elsewhere
(like / which is used for division). I chose the dot because I thought
it's not very common in map names and there is a way out: First, one
could say "map.new.[]". Only the last dot is used, and an empty
bracket would indicate all. When the lexical scanner is working, there
will be a special object called string, which can be really anything
within single or double quotes. So quoting "map.new" means that "new" is an
extension and not a selection, while just map.new would try to
interpret the "new" as a selector. How frequent do you think dots are
in map names?

Order numbers are not used, just categories like mymap.l(45)

I guess you meant mymap.p[3], as this is actually an ordinal
number. mymap.l(45) was the example to select an object having
attribute 45.

If I run v.support, I seem to remember that it tells which
object caused any problems. How else could you identify an object? In
any case, I think it might be useful to leave such a syntax, at least
internally, as it might allow to process object by object in a loop.

An SQL statement would also be useful: mymap.l(flow > 50).

I think SQL statements should not enter the parser scope. This is far
too open-ended. We need to deal with them some other way.

We already
discussed this and I don't know the conclusion; this scheme
c = a.a(12) AND b.p
may be replaced by:
c = select(a, AREAS, 12) AND select(b,POINTS)
which is less effective but easier to develop and maintain.

My suggestion was to provide a function with an SQL statement (or a
piece of that) as a string. I didn't start implementing this, as I
wanted to wait for input from you.

I think the easiest way to deal with this is using a loop with
conditionals. I don't know how much SQL knowledge we can assume from a
grass user, but you can always form an SQL statement in such a way
that it returns only a boolean or at least a numerical value. This
would free us from having to deal with date strings or other more
complicated stuff.

In any case, it is obvious that SQL still needs much more thought.

> could be acceptable to choose any polygon or line, optionally followed
> by an object or attribute specification, which might also allow for
> other extensions like
>
> [2,4-6,9]
>
> to select the objects 2, 4, 5, 6, and 9.

Use Vect_str_to_cat_list();

Certainly a good idea. I'll have a look at that.

> The most complex syntax I can
> think of here would be several letters for each acceptable type of
> object, an expression to select a set of objects and another to select
> a set of attributes, all combined by logical OR.

I would expect AND.

No. There is no attribute which satisfies the condition "2 AND 4".

Should be:
p - point
l - line
b - boundary
c - centroid
a - area

Fine. This is what I can work with. But I didn't know about boundary
and centroid. What are those?

Probably, but "." may not be used. It could also be possible to put
the type inside the parentheses: map(pl;1-3)

I can't do this with parentheses, as this looks like a function call. The
semicolon doesn't help here, as the parser must work with at most one
token of look ahead. But I could use braces, though I think this looks
ugly:

  map{p[1-3]}

Cannot say, but to implement more overlay operators seems to have
higher priority.

I agree. I think it's good to keep these things in mind, but for
next weekend I plan what I announced on Sunday. When all that is done,
we can publish the whole thing, it'll be possible to integrate real
world functions, and I'll still be able to extend the syntax.

--
Christoph Simon
ciccio@kiosknet.com.br

Christoph Simon wrote:

This means, at any point, there must be an association between the
operator and the function performing the operation: Any binary
operator will always yield a function call. This can be context bound,
i.e., the CONTAINS operator may have a different meaning (calling a
different function) according to the operand type, which might be a
map or not. What's more, I plan to allow creating arrays of
operator-to-function associations, so a user could create her own
context, loading for instance a function which will always/never deal
with attributes in a certain way (like accessing SQL). Unless there is
a fixed and clearly defined set of standard operators, I really think
we should wait with this until we have something working and a minimum
set of functions. To test these functions, we can always use the
function call syntax rather than the operator syntax. In any case, the
operator syntax is basically programmed; it's just a question of
replicating for CONTAINS what is done for '+'.

I would suggest forcing "add-on" functions to use (prefix) function
call syntax. Parsing infix expressions requires knowledge of operator
precedence and associativity. Allowing the actual parsing rules to be
modified dynamically could result in confusion.

> and "." (dot) may be used in map names).

I did think about this, but there are very few characters which are
not allowed in the filesystem and which are not already used elsewhere
(like / which is used for division). I chose the dot because I thought
it's not very common in map names and there is a way out: First, one
could say "map.new.[]". Only the last dot is used, and an empty
bracket would indicate all. When the lexical scanner is working, there
will be a special object called string, which can be really anything
within single or double quotes. So quoting "map.new" means that "new" is an
extension and not a selection, while just map.new would try to
interpret the "new" as a selector. How frequent do you think dots are
in map names?

Dots are very common, probably more common than underscore or dash
(minus). I suggest choosing a character which has meaning to the shell
(e.g. "$"), as people tend to avoid using map names which require
quoting.

Also, I suggest allowing the use of quotes for map names which contain
"special" characters, as is the case for r.mapcalc (e.g. test-map is
parsed as a subtraction, but "test-map" and 'test-map' are parsed as a
map name).
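
A sketch of that tokenising rule, assuming a hand-written scanner
rather than flex: scan_name() is a hypothetical helper in which a
quoted name runs to the matching quote, while an unquoted name stops at
the first character that is not a letter, digit, underscore or dot
(dots are kept because they are common in map names).

```c
#include <ctype.h>
#include <stddef.h>

/* Reads one map-name token from s into buf (capacity bufsz).
 * Returns the number of input characters consumed; 0 means that no
 * name was found, a quote was left unterminated, or buf is too small. */
size_t scan_name(const char *s, char *buf, size_t bufsz)
{
    size_t i = 0, n = 0;

    if (bufsz == 0)
        return 0;
    if (s[0] == '"' || s[0] == '\'') {
        char q = s[0];              /* closing quote must match */

        for (i = 1; s[i] != '\0' && s[i] != q; i++) {
            if (n + 1 >= bufsz)
                return 0;
            buf[n++] = s[i];
        }
        if (s[i] != q)
            return 0;               /* unterminated quote */
        i++;                        /* consume the closing quote */
    } else {
        while (isalnum((unsigned char)s[i]) || s[i] == '_' || s[i] == '.') {
            if (n + 1 >= bufsz)
                return 0;
            buf[n++] = s[i++];
        }
    }
    buf[n] = '\0';
    return i;
}
```

Applied to test-map without quotes, this yields the name "test" and
leaves "-map" for the operator rules; with either kind of quotes it
yields "test-map" as a single name.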

Actually, to minimise the potential for confusion, I suggest
maintaining as much compatibility with r.mapcalc (e.g. tokenising
rules, operator precedence) as is practical.

--
Glynn Clements <glynn.clements@virgin.net>

On Tuesday 02 July 2002 07:06 pm, Christoph Simon wrote:

> typedef enum {
> GV_O_AND,
> GV_O_OVERLAP,
> } OVERLAY_OPERATOR;

Hm. These operators need to be recognized by the lexical scanner and
processed by the parser. It's usually bison that defines an integer
value for them.

Then we need a second set of constants for operators in v.mapcalc,
somehow mapped to GV_O_*. (?)

> Yes, that would be useful, at least types (BTW polygons are not used,
> just areas

Hm. I thought there is a difference between polygon and area, as the
area is only a closed polygon. So which are actually the types of
objects which may be found in a vector map?

Grass knows just areas, which may or may not contain islands.
Areas are formed by a list of boundaries.

interpret the "new" as a selector. How frequent do you think dots are
in map names?

For example, the list of vectors in the first mapset I looked into:
cabla.1 cabla.base cabla.base.lid cabla.base.nodes cabla.base.raw pok

> Order numbers are not used, just categories like mymap.l(45)

I guess you meant mymap.p[3], as this is actually an ordinal
number. mymap.l(45) was the example to select an object having
attribute 45.

Yes, I meant that order numbers defined in [] are not used.
("Category" is used in grass for attribute attached directly to elements.)

If I run v.support, I seem to remember that it tells which
object caused any problems. How else could you identify an object? In
any case, I think it might be useful to leave such a syntax, at least
internally, as it might allow to process object by object in a loop.

Internally, order numbers are used, but because such a number may change
during the life of an element, it should not be used by users. To identify
any element, the category number should be used in the user interface.

> We already
> discussed this and I don't know the conclusion; this scheme
> c = a.a(12) AND b.p
> may be replaced by:
> c = select(a, AREAS, 12) AND select(b,POINTS)
> which is less effective but easier to develop and maintain.

My suggestion was to provide a function with an SQL statement (or a
piece of that) as a string. I didn't start implementing this, as I
wanted to wait for input from you.

Why not use a similar function for type/category?

I think the easiest way to deal with this is using a loop with
conditionals. I don't know how much SQL knowledge we can assume from a
grass user, but you can always form an SQL statement in such a way
that it returns only a boolean or at least a numerical value. This
would free us from having to deal with date strings or other more
complicated stuff.

We should not deal with this in v.mapcalc at all. As I mentioned in another
mail, some
Vect_list_elements (
    struct Map_info *Map,
    char *where, // sql where condition
    struct ilist *list_of_elements)
should create a list of (order) numbers of the elements we want to process.

> I would expect AND.
No. There is no attribute which satisfies the condition "2 AND 4".

Of course, but I thought "a AND 2" (areas of cat 2)

Fine. This is what I can work with. But I didn't know about boundary
and centroid. What are those?

Boundaries (edges) and centroids (area labels) define areas, see
"Programmer's manual: 6 Vector Maps"

Radim

On Wednesday 03 July 2002 06:34 am, Glynn Clements wrote:

Dots are very common, probably more common than underscore or dash
(minus). I suggest choosing a character which has meaning to the shell
(e.g. "$"), as people tend to avoid using map names which require
quoting.

Can the scanner recognize more combinations separated by one char?
(I think so, because the type is a char and the cat is a number.)
map
map$lp
map$2,5
map$lp$2,5

Category ranges (25-35) should also be allowed, preferably without
needing quotes:
map$lp$3,7,25-30,120
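
Splitting such a specification is straightforward precisely because a
type part starts with a letter and a category part with a digit. The
sketch below assumes that rule; the map_spec struct, its field sizes,
and split_map_spec() are hypothetical, and the category text is only
separated out here, not expanded.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting "name$types$cats". */
struct map_spec {
    char name[64];
    char types[16];   /* e.g. "lp"; empty if not given */
    char cats[64];    /* e.g. "3,7,25-30,120"; empty if not given */
};

/* Copies len characters of s into dst; fails if they do not fit. */
int copy_part(char *dst, size_t dstsz, const char *s, size_t len)
{
    if (len >= dstsz)
        return -1;
    memcpy(dst, s, len);
    dst[len] = '\0';
    return 0;
}

/* Returns 0 on success, -1 on an empty, repeated or oversized part. */
int split_map_spec(const char *s, struct map_spec *out)
{
    const char *p = strchr(s, '$');
    size_t len = p ? (size_t)(p - s) : strlen(s);

    out->types[0] = '\0';
    out->cats[0] = '\0';
    if (len == 0 || copy_part(out->name, sizeof out->name, s, len) != 0)
        return -1;
    while (p != NULL) {
        const char *part = p + 1;

        p = strchr(part, '$');
        len = p ? (size_t)(p - part) : strlen(part);
        if (len == 0)
            return -1;
        if (isdigit((unsigned char)part[0])) {          /* category list */
            if (out->cats[0] != '\0' ||
                copy_part(out->cats, sizeof out->cats, part, len) != 0)
                return -1;
        } else if (isalpha((unsigned char)part[0])) {   /* type letters */
            if (out->types[0] != '\0' ||
                copy_part(out->types, sizeof out->types, part, len) != 0)
                return -1;
        } else {
            return -1;
        }
    }
    return 0;
}
```

Note that this only works if "$" never occurs inside a map name, which
is the reason for picking a character people avoid in names.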

Radim

On Wed, 3 Jul 2002 05:34:40 +0100
Glynn Clements <glynn.clements@virgin.net> wrote:

I would suggest forcing "add-on" functions to use (prefix) function
call syntax. Parsing infix expressions requires knowledge of operator
precedence and associativity. Allowing the actual parsing rules to be
modified dynamically could result in confusion.

As I use bison which has good support for it, precedence and
associativity aren't really a problem, but knowing the operators in
advance is. So I plan to follow your suggestion at least at the
beginning until a fixed set of operators is known. The fact is that
there is no prefix/infix notation which can't be expressed by a
function call syntax, and each prefix/infix syntax will internally
always lead to a function call.

Dots are very common, probably more common than underscore or dash
(minus). I suggest choosing a character which has meaning to the shell
(e.g. "$"), as people tend to avoid using map names which require
quoting.

I didn't like the dash etc., as it needs context in the parser to
be distinguished from an operator, but I do like the $.

Also, I suggest allowing the use of quotes for map names which contain
"special" characters, as is the case for r.mapcalc (e.g. test-map is
parsed as a subtraction, but "test-map" and 'test-map' are parsed as a
map name).

This is already started, though not finished. I've a flex script which
will do this for properly paired single and double quotes with the
same meaning.

Actually, to minimise the potential for confusion, I suggest
maintaining as much compatibility with r.mapcalc (e.g. tokenising
rules, operator precedence) as is practical.

This was also my top guideline. Operator precedence might be completely
different, as I am sure that rmap + rmap performs something totally
different than vmap + vmap, but I'm not sure what vmap + vmap will
actually mean. Most probably this will be a dynamic concept, having
the possibility to change at runtime.

There are two things about r.mapcalc I didn't consider `practical' and
hence didn't follow it for v.mapcalc: (1) IIRC r.mapcalc does not
allow an unquoted name to include a number, because "10a" is actually
"10 a", while v.mapcalc will consider "10a" a name. But as quotes will
be supported, "10a" will have the same meaning in v.mapcalc; (2) the
end-of-statement in r.mapcalc is the newline character, while
v.mapcalc will use the semicolon.

Neither of the two were clear and easy decisions, but since I made
them, I'm happy with them. As I plan to use readline for interactive
usage, single line statements are nice, because they are easy to
repeat and edit. In this case a newline would be just fine. But since
I was always considering more complex idioms like conditionals and
loops, the usage of scripts and include files, the newline character as
an end of statement can be really limiting. Last but not least, the
fact that psql (postgres) does use this character, shows that this
idea isn't completely absurd.

--
Christoph Simon
ciccio@kiosknet.com.br

On Wed, 3 Jul 2002 11:46:15 +0200
Radim Blazek <blazek@itc.it> wrote:

On Wednesday 03 July 2002 06:34 am, Glynn Clements wrote:
> Dots are very common, probably more common than underscore or dash
> (minus). I suggest choosing a character which has meaning to the shell
> (e.g. "$"), as people tend to avoid using map names which require
> quoting.

Can the scanner recognize more combinations separated by one char?
(I think so, because the type is a char and the cat is a number.)
map
map$lp
map$2,5
map$lp$2,5

These suffixes are not implemented yet, and the scanner I'm using in
the program is not more than a simulation of a flex scanner which I've
prepared so far as a skeleton only.

The logic will be this: If a token is found which is a `string', it
will be looked up in a symbol table still in the scanner, trying to
decide whether it is a variable name (of a certain type), a map, a
function name, etc. This is not always possible, as "a = ..." can not
tell the type of "a" if that hasn't been used before, and the user
might not have typed more at this time. If that string happens to be
the name of an existing vmap, which will always be known already in
the scanner, the extension will also be interpreted in the scanner,
added to a map-descriptor, and passed to the bison parser just as a
MAP token. The bison parser will not be directly aware of this list. I
know this is a somewhat strange approach, but it's the only way to
deal with lots of types.

Category ranges (25-35) should also be allowed, preferably without
needing quotes:
map$lp$3,7,25-30,120

As v.mapcalc will be more strict with spaces as token separators than
r.mapcalc, this is perfectly possible without quoting. But I tend not
to omit () and []. Even if object indices aren't normally used in
grass, they can come in handy within v.mapcalc. For instance, we could
have a function which will create a list of indices selecting objects
by a more complex set of conditions, like several SQL statements
creating a union of indirect attribute dependencies. These objects
could be written to a new map, or used directly to perform another
operation. With very large lists, this could get a bit slow, as I'm
using linked lists; even a binary search on a linked list with
millions of items can take a fair amount of time. In such cases
writing a new intermediate map might prove more efficient.
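
One possible alternative to the linked list, shown only as a sketch and
not as what v.mapcalc does: if the indices are kept in a sorted array,
the standard qsort() and bsearch() give a lookup in O(log n)
comparisons, whereas a binary search on a linked list still has to walk
the nodes to reach each midpoint. The helper names are hypothetical.

```c
#include <stdlib.h>

/* Three-way comparison of two ints, for qsort() and bsearch(). */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Sorts the index array once after it has been built. */
void sort_indices(int *idx, size_t n)
{
    qsort(idx, n, sizeof *idx, cmp_int);
}

/* Returns 1 if key is in the sorted array idx, else 0. */
int has_index(const int *idx, size_t n, int key)
{
    return bsearch(&key, idx, n, sizeof *idx, cmp_int) != NULL;
}
```

The trade-off is that inserting into a sorted array is more expensive
than inserting into a list, so this fits best when the selection is
built once and then queried many times.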

Hm. I think another choice might be the colon. What about:

  mymap:l(25-30)

Does this look better?

--
Christoph Simon
ciccio@kiosknet.com.br

On Wed, 3 Jul 2002 05:34:40 +0100
Glynn Clements <glynn.clements@virgin.net> wrote:

I would suggest forcing "add-on" functions to use (prefix) function
call syntax. Parsing infix expressions requires knowledge of operator
precedence and associativity. Allowing the actual parsing rules to be
modified dynamically could result in confusion.

Just one more comment: '+' should never be defined as
something which does not have the right associativity, nor as something
with a higher precedence than, say, '*'. Then no confusion will happen; it's
the natural thing everybody expects. What I'm thinking of is to allow
just sets of functions associated with the operators, somewhat in the
style of operator overloading in C++. As this would be done at
runtime, non-algebraic operators like CONTAINS need
a clear definition to be consistent on this point.

--
Christoph Simon
ciccio@kiosknet.com.br

On Wed, 3 Jul 2002 11:36:04 +0200
Radim Blazek <blazek@itc.it> wrote:

> Hm. These operators need to be recognized by the lexical scanner and
> processed by the parser. It's usually bison who defines an integer
> value for them.

Then we need a second set of constants for operators in v.mapcalc,
somehow mapped to GV_O_*. (?)

Don't worry about them. In the end, it will always be a function
call. The transformation of what the user types to the functions
actually called is exactly what the parser is for. Your function will
not have to know about it.

GRASS knows just areas, which may or may not contain islands.
Areas are formed by a list of boundaries.

And boundaries are polygons which can be open or closed?

> interpret the "new" as a selector. How frequent do you think are dots
> in mapnames?

For example, the list of vectors in the first mapset I looked into:
cabla.1 cabla.base cabla.base.lid cabla.base.nodes cabla.base.raw
pok

OK. For now, the useful choices seem to be $ or :, though I have a
slight preference for the colon.

Internally, order numbers are used, but because such a number may change
during the life of an element, it should not be used by users. To identify
any element, the category number should be used in the user interface.

Hm. When a user mentions a map in v.mapcalc, this map is not supposed
to be changed, but a new map should be created, right? If not, this
represents a problem, for instance, if I create a list of objects to
work only on that list from there on. If it is changed before or after
a v.mapcalc session, this wouldn't be a problem. Under which
circumstances can this order number change?

> My suggestion was to provide a function with an SQL statement (or a
> piece of that) as a string. I didn't start implementing this, as I
> wanted to wait for input from you.

Why not use similar function for type/category?

I also reached this point, but I will need to know more details. Is it
enough to use a character string for any complete SQL statement, or do
I need to split it up into FROM and WHERE clauses?

We should not deal with this in v.mapcalc at all. As I mentioned in
another mail, something like
Vect_list_elements (
    struct Map_info *Map,
    char *where, // sql where condition
    struct ilist *list_of_elements)
should create a list of the (order) numbers of the elements we want to
process.

My problem here is that I suspect that the SQL people are still at
the very beginning and that they are likely to change their minds on
this point. What if one map is associated with more than one
table/column in the database? What if certain attributes need to use
SQL-computed values which need to go into the FROM clause? The global
setting of table and column is a very severe restriction.

> > I would expect AND.
> No. There is no attribute which satisfies the condition "2 AND 4".

Of course, but I thought "a AND 2" (areas of cat 2)

Right. We need OR if it's the same type and AND if they are different
types.

Boundaries (edges) and centroids (area labels) define areas, see
"Programmer's manual: 6 Vector Maps"

Yep. Lots of homework I haven't done yet.

--
Christoph Simon
ciccio@kiosknet.com.br

Christoph Simon wrote:

> I would suggest forcing "add-on" functions to use (prefix) function
> call syntax. Parsing infix expressions requires knowledge of operator
> precedence and associativity. Allowing the actual parsing rules to be
> modified dynamically could result in confusion.

As I use bison which has good support for it, precedence and
associativity aren't really a problem, but knowing the operators in
advance is.

The issue isn't in implementation; that would be simple enough. The
issue is to avoid confusing the user. E.g.

  X * Y <op> Z

is either:

  (X * Y) <op> Z
or:
  X * (Y <op> Z)

depending upon the precedence of <op>. If you have lots of
non-standard[1] infix operators of varying precedence, users will
start by making mistakes, and end up bracketing everything just to be
safe.

[1] Anything that C doesn't have is definitely non-standard. And, as
GRASS' users often aren't programmers, anything other than +-*/ is
dubious.
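
The ambiguity can be shown with a small precedence-climbing parser.
This is a sketch only: OP stands in for a hypothetical add-on infix
operator, the precedence numbers are arbitrary, and parse is not taken
from r.mapcalc or v.mapcalc.

```python
def parse(tokens, precedence):
    """Precedence climbing for left-associative binary operators.

    Returns nested tuples (op, left, right); operands stay as strings.
    """
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression(0)
            pos += 1  # skip the closing ")"
            return node
        return tok

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while (
            pos < len(tokens)
            and tokens[pos] in precedence
            and precedence[tokens[pos]] >= min_prec
        ):
            op = tokens[pos]
            pos += 1
            # The right operand may only contain tighter-binding operators.
            right = expression(precedence[op] + 1)
            left = (op, left, right)
        return left

    return expression(0)


tokens = "X * Y OP Z".split()
# OP binds more loosely than '*': parsed as (X * Y) OP Z
loose = parse(tokens, {"*": 20, "OP": 10})
# OP binds more tightly than '*': parsed as X * (Y OP Z)
tight = parse(tokens, {"*": 20, "OP": 30})
```

The input is identical in both calls; only the precedence table
differs, and the user cannot see that table while typing.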

> Dots are very common, probably more common than underscore or dash
> (minus). I suggest choosing a character which has meaning to the shell
> (e.g. "$"), as people tend to avoid using map names which require
> quoting.

I didn't like the dash etc., as it is context bound in the parser to
be distinguished from an operator, but I do like the $.

Actually, your suggestion of the colon might be better. It's unlikely
to be common in map names; furthermore, we may eventually wish to
prohibit colons in map names, for compatibility with Windows.

Currently, we allow[1] all 7-bit characters other than control codes,
delete, space, slash, and single and double quotes, with the
restriction that map names cannot begin with a dot.

[1] In the sense of G_legal_name(). I know for a fact that there are
names which are legal according to G_legal_name() but which will cause
some programs to fail (primarily, use of shell metacharacters will
break code which calls system() or popen() without sufficient
quoting).
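
For illustration, the rule as stated above can be written down as a
predicate. This is a sketch of the description in this message, not a
port of G_legal_name(); is_legal_map_name is a hypothetical name, and
rejecting the empty string is my own assumption.

```python
def is_legal_map_name(name):
    """Check a name against the rule described in the message above."""
    if not name:               # assumption: an empty name is rejected
        return False
    if name.startswith("."):   # names cannot begin with a dot
        return False
    for ch in name:
        code = ord(ch)
        if code < 0x20 or code >= 0x7F:  # control codes, delete, 8-bit
            return False
        if ch in " /'\"":                # space, slash, both quote characters
            return False
    return True
```

Note that a colon passes this check, so reserving it as a selector
separator would mean tightening the rule.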

> Actually, to minimise the potential for confusion, I suggest
> maintaining as much compatibility with r.mapcalc (e.g. tokenising
> rules, operator precedence) as is practical.

This was also my top guideline. Operator precedence might be completely
different, as I am sure that rmap + rmap performs something totally
different than vmap + vmap, but I'm not sure what vmap + vmap will
actually mean. Most probably this will be a dynamic concept, having
the possibility to change at runtime.

As you implied in your other message, "+", "*" etc should at least
have the standard precedence and associativity regardless of their
semantics.

There are two things about r.mapcalc I didn't consider `practical' and
hence didn't follow for v.mapcalc: (1) IIRC r.mapcalc does not
allow an unquoted name to include a number, because "10a" is actually
"10 a", while v.mapcalc will consider "10a" a name. But as quotes will
be supported, "10a" will have the same meaning in v.mapcalc; (2) the
end-of-statement in r.mapcalc is the newline character, while
v.mapcalc will use the semicolon.

1. It's arguable that r.mapcalc should be changed here; the token
sequence ["10", "a"] doesn't have a parse. OTOH, numeric literals are
also valid (and not uncommon) map names, but clearly they have to
tokenise as numeric literals.

2. The new r.mapcalc (src/raster/r.mapcalc3) allows either newline or
semicolon. Ignore src/raster/r.mapcalc; that is no longer used.

--
Glynn Clements <glynn.clements@virgin.net>

On Wed, 3 Jul 2002 19:44:47 +0100
Glynn Clements <glynn.clements@virgin.net> wrote:

The issue isn't in implementation; that would be simple enough. The
issue is to avoid confusing the user. E.g.

  X * Y <op> Z

is either:

  (X * Y) <op> Z
or:
  X * (Y <op> Z)

depending upon the precedence of <op>. If you have lots of
non-standard[1] infix operators of varying precedence, users will
start by making mistakes, and end up bracketing everything just to be
safe.

[1] Anything that C doesn't have is definitely non-standard. And, as
GRASS' users often aren't programmers, anything other than +-*/ is
dubious.

I agree; the function call syntax is that much cleaner, and this is
what I prefer, though I'm trying to adapt to common usage. My
conversations with Radim seemed to show that he prefers the
operator style. The reason why I am trying to make this as flexible as
possible is exactly because I don't know the right answer. Maybe
later, when real-world examples are available, it will be possible to
define something. In any case, precedence is a dubious business, as
it's an implicit, not immediately visible rule. +-*/ is so well known
that nobody should have problems with it, but whether a modulo operator
has a higher or lower precedence than division is much less well
known, and that question is not restricted to the C language.

Actually, your suggestion of the colon might be better. It's unlikely
to be common in map names; furthermore, we may eventually wish to
prohibit colons in map names, for compatibility with Windows.

Also in Unix, colons as part of a directory name isn't really
legal. Don't know if Mac might have troubles with that.

> > Actually, to minimise the potential for confusion, I suggest
> > maintaining as much compatibility with r.mapcalc (e.g. tokenising
> > rules, operator precedence) as is practical.
>
> This was also my top guideline. Operator precedence might be completely
> different, as I am sure that rmap + rmap performs something totally
> different than vmap + vmap, but I'm not sure what vmap + vmap will
> actually mean. Most probably this will be a dynamic concept, having
> the possibility to change at runtime.

As you implied in your other message, "+", "*" etc should at least
have the standard precedence and associativity regardless of their
semantics.

Right. Though it's possible to do it, it's most probably not a good
idea to touch those.

1. It's arguable that r.mapcalc should be changed here; the token
sequence ["10", "a"] doesn't have a parse. OTOH, numeric literals are
also valid (and not uncommon) map names, but clearly they have to
tokenise as numeric literals.

The scanner doesn't know if the sequence of found tokens will make
sense to the parser, so the difference is whether the scanner sends one
token of type name or string with value "10a" to the parser, or two
tokens, one number "10" and one name or string with value "a". If "+"
can be part of an unquoted name, some token separator will be
necessary. My rationale was that it's easier to insert a few more
spaces than to quote such names. OTOH, I don't want to suggest that
r.mapcalc must change the same way. It has been around for many years
and people have got used to it. Changing that might have some bad side
effects.
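
The two scanner behaviours can be compared side by side. The patterns
below are assumptions for the sake of the example, not the real
v.mapcalc or r.mapcalc rules; NAME_FIRST, NUMBER_FIRST and scan are
hypothetical names.

```python
import re

# Variant 1: a name may start with digits, so "10a" is one NAME token.
NAME_FIRST = re.compile(
    r"\s*(?:(?P<NAME>[0-9]*[A-Za-z_][A-Za-z0-9_.]*)"
    r"|(?P<NUMBER>[0-9]+(?:\.[0-9]+)?)"
    r"|(?P<OP>[-+*/()=]))"
)

# Variant 2: numbers are tried first and names must start with a letter,
# so "10a" falls apart into NUMBER "10" followed by NAME "a".
NUMBER_FIRST = re.compile(
    r"\s*(?:(?P<NUMBER>[0-9]+(?:\.[0-9]+)?)"
    r"|(?P<NAME>[A-Za-z_][A-Za-z0-9_.]*)"
    r"|(?P<OP>[-+*/()=]))"
)


def scan(text, pattern):
    """Return (token type, value) pairs; the parser decides what is valid."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens
```

In both variants the scanner happily emits the pair ("10", "a"); whether
that sequence has a parse is not its concern.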

2. The new r.mapcalc (src/raster/r.mapcalc3) allows either newline or
semicolon. Ignore src/raster/r.mapcalc; that is no longer used.

I'm not really happy with the idea that the newline is necessarily an
end-of-statement mark. For now it's not. It will probably be best to
delay this until real-world usage can be tested.

--
Christoph Simon
ciccio@kiosknet.com.br

Christoph Simon wrote:

> 1. It's arguable that r.mapcalc should be changed here; the token
> sequence ["10", "a"] doesn't have a parse. OTOH, numeric literals are
> also valid (and not uncommon) map names, but clearly they have to
> tokenise as numeric literals.

The scanner doesn't know if the sequence of found tokens will make
sense to the parser, so the difference is whether the scanner sends one
token of type name or string with value "10a" to the parser, or two
tokens, one number "10" and one name or string with value "a". If "+"
can be part of an unquoted name, some token separator will be
necessary. My rationale was that it's easier to insert a few more
spaces than to quote such names. OTOH, I don't want to suggest that
r.mapcalc must change the same way. It has been around for many years
and people have got used to it. Changing that might have some bad side
effects.

Well, the trend in language design seems to be moving away from
requiring whitespace. More generally, the trend in language design
seems to be ever-closer compatibility with C; it's open to debate as
to whether this is good or bad, but it's getting pretty clear that
taking an opposite path in any particular instance is going against
the flow.

> 2. The new r.mapcalc (src/raster/r.mapcalc3) allows either newline or
> semicolon. Ignore src/raster/r.mapcalc; that is no longer used.

I'm not really happy with the idea that the newline is necessarily an
end-of-statement mark. For now it's not. It will probably be best to
delay this until real-world usage can be tested.

The main factor is the complexity of a typical statement. For
r.mapcalc, the most common usage is a single expression with a few
operators, included on the command line. Beyond that, most expressions
fit on one line, so it's easier to require a backslash for multi-line
statements than to require a semicolon per statement.
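
The two conventions can be sketched as one small splitter. This is a
simplification under stated assumptions: it ignores quoted strings and
comments, and split_statements is a hypothetical helper, not code from
either program.

```python
def split_statements(script, newline_ends_statement=True):
    """Split a script into statements under one of the two conventions.

    newline_ends_statement=True:  newline or ';' ends a statement and a
                                  trailing backslash continues the line.
    newline_ends_statement=False: only ';' ends a statement; newlines
                                  are ordinary whitespace.
    """
    if newline_ends_statement:
        script = script.replace("\\\n", " ")  # backslash-newline joins lines
        script = script.replace("\n", ";")    # a bare newline terminates
    else:
        script = script.replace("\n", " ")
    # Normalise inner whitespace and drop empty statements.
    return [" ".join(s.split()) for s in script.split(";") if s.strip()]
```

With short expressions the first mode needs no extra punctuation at
all; with long multi-line statements the second mode avoids a backslash
on every line.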

OTOH, it sounds as if v.mapcalc is aimed more at substantial scripts
than simple expressions; in which case, multi-line statements may be
far more common. If the overall syntax differs significantly from
r.mapcalc, it may help to pick a different name, to avoid implying a
similarity which doesn't really exist.

--
Glynn Clements <glynn.clements@virgin.net>

On Wed, 3 Jul 2002 20:45:02 +0100
Glynn Clements <glynn.clements@virgin.net> wrote:

Well, the trend in language design seems to be moving away from
requiring whitespace. More generally, the trend in language design
seems to be ever-closer compatibility with C; it's open to debate as
to whether this is good or bad, but it's getting pretty clear that
taking an opposite path in any particular instance is going against
the flow.

Sorry, didn't get your point here. "10a" (unquoted) isn't a legal
C-language token. OTOH, I don't restrict anything which isn't
restricted in C, quite the other way round. Using quotes, any name can
contain any characters. v.mapcalc is currently even more liberal than
GRASS, as it also allows 8-bit characters. In any case, using
quotes, anything is possible.

The main factor is the complexity of a typical statement. For
r.mapcalc, the most common usage is a single expression with a few
operators, included on the command line. Beyond that, most expressions
fit on one line, so it's easier to require a backslash for multi-line
statements than to require a semicolon per statement.

OTOH, it sounds as if v.mapcalc is aimed more at substantial scripts
than simple expressions; in which case, multi-line statements may be
far more common. If the overall syntax differs significantly from
r.mapcalc, it may help to pick a different name, to avoid implying a
similarity which doesn't really exist.

Everybody seems to agree that GRASS isn't very complete when talking
about vector functionality. This isn't a problem, until a user needs
to solve a particular task. And then, the straightforward answer is
to write a GRASS v.* module which will do the job. This seems to grow
bigger and bigger, probably more than in the case of raster maps.

I myself, a GRASS beginner, reached this point more than once, so I
wondered why there isn't a program which would allow me to adapt to
any particular situation, much in the style of r.mapcalc. So the very
first idea was to use v.mapcalc the same way r.mapcalc is used, with
short, single action statements. This was and remains the first aim of
v.mapcalc. I'm no r.mapcalc expert, but I did see quite long
`one-liners' in mapcalc tutorials.

While trying to describe a v.mapcalc program in concrete terms, it
seemed that there is a certain need to justify its existence, as many
of the hypothetical examples could be solved using already existing
modules, or modules which had already been proposed. I don't think
it's a problem to have more than one way to solve a particular task,
but vector complexity itself seems to push me to write something which
looks more like an interpreter than a calculator: allowing it to deal
with non-map entities like point lists (or maybe even SQL entities) on
one hand, and differentiated processing of object lists coming from
one or more maps and yielding a new map, using programming-language-like
idioms such as loops and conditionals. Of course, when scripting is
available, a comment character will clearly be necessary, yet another
step r.mapcalc didn't have to deal with.

This second aim isn't really so different from the first, so I still
expect those scripts to be little more than a few lines, but as
flexible as needed to deal with the most special vector tasks. This,
at least, is the aim. I do not pretend to be able to predict how it's
actually going to be used.

--
Christoph Simon
ciccio@kiosknet.com.br

Christoph Simon wrote:

> Well, the trend in language design seems to be moving away from
> requiring whitespace. More generally, the trend in language design
> seems to be ever-closer compatibility with C; it's open to debate as
> to whether this is good or bad, but it's getting pretty clear that
> taking an opposite path in any particular instance is going against
> the flow.

Sorry, didn't get your point here. "10a" (unquoted) isn't a legal
C-language token. OTOH, I don't restrict anything which isn't
restricted in C, quite the other way round. Using quotes, any name can
contain any characters. v.mapcalc is currently even more liberal than
GRASS, as it also allows 8-bit characters. In any case, using
quotes, anything is possible.

My point was primarily that, so far as possible, whitespace should be
optional, e.g. "foo - bar" and "foo-bar" should be equivalent, so
requiring non-alphanumeric map names to be quoted is preferable to
requiring whitespace around operators.

Most of the common programming languages restrict "identifiers" to a
set of characters which is disjoint from the set of characters used
for infix operators, so this ambiguity doesn't arise. Consequently,
requiring whitespace around operators is likely to contradict users'
experience and habits.
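
A minimal sketch of that arrangement, with an assumed character set:
unquoted names are limited to letters, digits and underscore, operators
come from a disjoint set, and anything else must be quoted. TOKEN and
tokens are hypothetical names.

```python
import re

# Quoted names may hold any character except the quote itself; unquoted
# names share no characters with the operator set.
TOKEN = re.compile(
    r'\s*(?:"(?P<QUOTED>[^"]*)"'
    r"|(?P<NAME>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<OP>[-+*/()=]))"
)


def tokens(text):
    """Tokenise without relying on whitespace between tokens."""
    out, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        kind = m.lastgroup
        # A quoted name reaches the parser as an ordinary NAME.
        out.append(("NAME" if kind == "QUOTED" else kind, m.group(kind)))
        pos = m.end()
    return out
```

Here "foo-bar" and "foo - bar" give the same three tokens, while a map
that really is called foo-bar has to be written with quotes.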

--
Glynn Clements <glynn.clements@virgin.net>

On Wednesday 03 July 2002 02:09 pm, Christoph Simon wrote:

And boundaries are polygons which can be open or closed?

Boundaries are lines (polylines). One isolated area may be formed
by one closed boundary, though that is not the typical case.

OK. For now, the useful choices seem to be $ or :, though I have a
slight preference for the colon.

Colon is nicer.

> Internally, order numbers are used, but because such a number may change
> during the life of an element, it should not be used by users. To identify
> any element, the category number should be used in the user interface.

Hm. When a user mentions a map in v.mapcalc, this map is not supposed
to be changed, but a new map should be created, right? If not, this
represents a problem, for instance, if I create a list of objects to
work only on that list from there on. If it is changed before or after
a v.mapcalc session, this wouldn't be a problem. Under which
circumstances can this order number change?

Yes, if the list is created for an old map, it will not change until
the map is updated and the topology is rebuilt. If the list is created
by v.mapcalc, it is OK. What I want to avoid is having the user enter
order numbers:
lista = select_by_box( mapa, (0,0,5,5) ) - is OK
listb = list( 1,5,8 ) - could cause some problems. For example, the
user somehow found the list 1,5,8, then rebuilt the topology, and these
numbers changed to 1,2,3.
OTOH, if the user knows about this, it is a powerful tool.
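
A toy model of the problem, under my own assumptions: order numbers are
simply positions 1..n handed out whenever the topology is built, and
build_topology is a hypothetical stand-in, not the GRASS routine.

```python
def build_topology(features):
    """Assign order numbers 1..n in the current storage order."""
    return {order: feat for order, feat in enumerate(features, start=1)}


# Categories need not be unique: two elements share cat 101 here.
features = [{"cat": 101}, {"cat": 205}, {"cat": 101}, {"cat": 330}]

before = build_topology(features)
saved_orders = [1, 4]                                   # typed in by the user
saved_cats = {before[i]["cat"] for i in saved_orders}   # the stable identifiers

# The map is edited (first element deleted) and the topology is rebuilt.
del features[0]
after = build_topology(features)

# The old order numbers now point at a different element, or at nothing.
by_order = [after[i]["cat"] for i in saved_orders if i in after]
# Category numbers still select the elements that were meant.
by_cat = [feat["cat"] for feat in after.values() if feat["cat"] in saved_cats]
```

A list produced inside one v.mapcalc run never meets this situation,
because nothing rebuilds the topology between creating the list and
using it.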

> Why not use similar function for type/category?

I also reached this point, but I will need to know more details. Is it
enough to use a character string for any complete SQL statement, or do
I need to split it up into FROM and WHERE clauses?

We need WHERE, because FROM is defined in DB file for each map/field.

> We should not deal with this in v.mapcalc at all. As I mentioned in
> another mail, something like
> Vect_list_elements (
> struct Map_info *Map,
> char *where, // sql where condition
> struct ilist *list_of_elements)
> should create a list of the (order) numbers of the elements we want to
> process.

My problem here is that I suspect that the SQL people are still at
the very beginning and that they are likely to change their minds on
this point.

Who are "SQL people"?

What if one map is associated with more than one table/column
in the database?

Defined by field + DB file. (I think it is really time to
test grass51.)

What if certain attributes need to use SQL-computed
values which need to go into the FROM clause? The global setting of
table and column is a very severe restriction.

OK, maybe it is, but if you allow an optional TABLE/KEY, you must support
these options in each module. In most cases, one map is associated with
one table and the user does not even want to know how it is done.
I think that in v.mapcalc it is possible to create a list of categories
with an SQL select statement and then use this list to select features:
mapb = extract_by_cat ( mapa, sql_select ( "select id from x where f > 1") );
but in most cases:
mapb = extract( mapa, "f > 1");
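
The two forms can be mocked up with SQLite to show how they relate.
Everything here is an assumption for illustration: features are plain
dicts, the table x with columns id and f is invented to match the
example above, and sql_select, extract_by_cat and extract are sketches
of the proposed functions, not existing GRASS calls.

```python
import sqlite3


def sql_select(conn, statement):
    """Run a complete SELECT; the first column is read as category numbers."""
    return {row[0] for row in conn.execute(statement)}


def extract_by_cat(features, cats):
    """Keep only the features whose category is in the given set."""
    return [feat for feat in features if feat["cat"] in cats]


def extract(conn, features, table, key, where):
    """Common case: table and key come from the map's DB definition and
    the user supplies only the WHERE condition.
    (Sketch only: the statement is pasted together without any checking.)
    """
    statement = f"SELECT {key} FROM {table} WHERE {where}"
    return extract_by_cat(features, sql_select(conn, statement))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER, f REAL)")
conn.executemany("INSERT INTO x VALUES (?, ?)", [(1, 0.5), (2, 1.5), (3, 2.5)])
mapa = [{"cat": 1}, {"cat": 2}, {"cat": 3}]

# Long form: any SELECT may be used, including other tables.
mapb_long = extract_by_cat(mapa, sql_select(conn, "select id from x where f > 1"))
# Short form: only the condition is given.
mapb_short = extract(conn, mapa, "x", "id", "f > 1")
```

In this sketch the short form is just a wrapper around the long one, so
supporting both does not force every module to know about TABLE/KEY.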

Radim

On Thu, 4 Jul 2002 10:07:07 +0200
Radim Blazek <blazek@itc.it> wrote:

Yes, if the list is created for an old map, it will not change until
the map is updated and the topology is rebuilt. If the list is created
by v.mapcalc, it is OK. What I want to avoid is having the user enter
order numbers:
lista = select_by_box( mapa, (0,0,5,5) ) - is OK
listb = list( 1,5,8 ) - could cause some problems. For example, the
user somehow found the list 1,5,8, then rebuilt the topology, and these
numbers changed to 1,2,3.
OTOH, if the user knows about this, it is a powerful tool.

You said that order numbers are not visible to the user, so the
user doesn't know them and there should be no danger of abuse.

> > Why not use similar function for type/category?
>
> I also reached this point, but I will need to know more details. Is it
> enough to use a character string for any complete SQL statement, or do
> I need to split it up into FROM and WHERE clauses?

We need WHERE, because FROM is defined in DB file for each map/field.

This is what I understood, but I think it's a very limiting approach,
losing lots of SQL's power. My guess is, that this will change in the
future.

Who are "SQL people"?

Don't know. Those who wrote the v.*.pg functions.

> What if one map is associated with more than one table/column
> in the database?

Defined by field + DB file. (I think it is really time to
test grass51.)

Yes, it is time :-) But do you suggest that you can specify more than
one field and more than one DB file (== table)? What about aggregate
functions?

OK, maybe it is, but if you allow an optional TABLE/KEY, you must support
these options in each module. In most cases, one map is associated with
one table and the user does not even want to know how it is done.

If this is the most frequent case, it should stay as it is. But I
think sort of a backdoor needs to be opened.

I think that in v.mapcalc it is possible to create a list of categories
with an SQL select statement and then use this list to select features:
mapb = extract_by_cat ( mapa, sql_select ( "select id from x where f > 1") );
but in most cases:
mapb = extract( mapa, "f > 1");

This might be the backdoor.

--
Christoph Simon
ciccio@kiosknet.com.br