[GRASS5] PATCH: sql expression parser

Hi all.

Earlier, I queried about sql expressions in where clauses. Glynn proposed a
fix for that using additional rules to cope with operator precedence. I attach
a patch that uses the precedence directives available to bison. This allows
for eg:

select cat from blah where (col1/(col2+col3)-col4)=1

However, I'm unsure about %left and ilk directives in older yaccs; could
anybody confirm this works in non-GNU yaccs? If so, please test this patch, it
is a bugfix actually.

Daniel.

-- Daniel Calvelo Aros

(attachments)

aritwhereexp.patch (1.58 KB)

Hello Daniel

On Fri, 10 Jun 2005, Daniel Calvelo Aros wrote:

However, I'm unsure about %left and ilk directives in older yaccs; could
anybody confirm this works in non-GNU yaccs? If so, please test this patch, it
is a bugfix actually.

It didn't work anyway with Solaris lex and yacc, so probably not a problem. Below is the error (nothing to do with your changes)---a lot of
it seems to be mis-placed comments but I'm not sure what proper lex syntax for this is.

Paul

/usr/ccs/bin/yacc -d -v yac.y
"yac.y", line 68: warning: redeclaration of precedence of TABLE.
"yac.y", line 69: warning: redeclaration of precedence of TABLE.

conflicts: 10 shift/reduce
/usr/ccs/bin/lex lex.l
"lex.l":line 75: Error: executable statements should occur right after %%
"lex.l":line 83: Error: executable statements should occur right after %%
"lex.l":line 101: Error: syntax error
"lex.l":line 100: Error: Illegal rule
"lex.l":line 115: Error: executable statements should occur right after %%
"lex.l":line 116: Error: executable statements should occur right after %%
"lex.l":line 117: Error: executable statements should occur right after %%
"lex.l":line 124: Error: executable statements should occur right after %%
"lex.l":line 125: Error: executable statements should occur right after %%
"lex.l":line 126: Error: executable statements should occur right after %%
"lex.l":line 134: Error: executable statements should occur right after %%
"lex.l":line 135: Error: executable statements should occur right after %%
"lex.l":line 137: Error: executable statements should occur right after %%
"lex.l":line 148: Error: executable statements should occur right after %%
"lex.l":line 149: Error: executable statements should occur right after %%
"lex.l":line 151: Error: executable statements should occur right after %%
"lex.l":line 158: Error: Non-terminated string or character constant
342/1200 nodes(%e), 0/2500 positions(%p), 1/500 (%n), 0 transitions,
0/10000 packed char classes(%k), 0/2000 packed transitions(%a), 0/12000 output slots(%o)
make: *** [lex.yy.c] Error 1

From: Paul Kelly <paul-grass@stjohnspoint.co.uk>
Sent: Fri, 10 Jun 2005 15:50:42 +0100 (BST)

Hello Daniel

On Fri, 10 Jun 2005, Daniel Calvelo Aros wrote:

> However, I'm unsure about %left and ilk directives in older yaccs; could
> anybody confirm this works in non-GNU yaccs? If so, please test this patch, it
> is a bugfix actually.

It didn't work anyway with Solaris lex and yacc, so probably not a
problem. Below is the error (nothing to do with your changes)---a
lot of it seems to be mis-placed comments but I'm not sure what
proper lex syntax for this is.

[snip]

Ouch. Are you resorting to using bison under solaris anyway? Might including
the generated files (lex.yy.c and y.tab.c) in the "source" distribution be a
solution?

Daniel.

On Fri, 10 Jun 2005, Daniel Calvelo Aros wrote:

From: Paul Kelly <paul-grass@stjohnspoint.co.uk>

It didn't work anyway with Solaris lex and yacc, so probably not a
problem. Below is the error (nothing to do with your changes)---a
lot of it seems to be mis-placed comments but I'm not sure what
proper lex syntax for this is.

[snip]

Ouch. Are you resorting to using bison under solaris anyway?

No---to get it to compile I deleted all the comments from lex.l; also I had to delete a further few misplaced spaces and tabs before it would run properly.

Might including
the generated files (lex.yy.c and y.tab.c) in the "source" distribution be a
solution?

That sounds like quite a good idea... I've noticed in some free software projects they don't even include the generated configure script in CVS and the user has to generate it themselves with autoconf. In GRASS the developers always generate the configure script; maybe "pre-generating" the lex and yacc files and storing in CVS would be a worthwhile idea.

Paul

Daniel Calvelo Aros wrote:

Earlier, I queried about sql expressions in where clauses. Glynn proposed a
fix for that using additional rules to cope with operator precedence. I attach
a patch that uses the precedence directives available to bison. This allows
for eg:

select cat from blah where (col1/(col2+col3)-col4)=1

However, I'm unsure about %left and ilk directives in older yaccs; could
anybody confirm this works in non-GNU yaccs? If so, please test this patch, it
is a bugfix actually.

Can you try the patch I have sent in response to your previous
message? It explicitly refactors the grammar to use multiple "levels"
of expression rather than relying upon yacc to handle of precedence
and fixity.

--
Glynn Clements <glynn@gclements.plus.com>

Paul Kelly wrote:

> However, I'm unsure about %left and ilk directives in older yaccs; could
> anybody confirm this works in non-GNU yaccs? If so, please test this patch, it
> is a bugfix actually.

It didn't work anyway with Solaris lex and yacc, so probably not a
problem. Below is the error (nothing to do with your changes)---a lot of
it seems to be mis-placed comments but I'm not sure what proper lex syntax
for this is.

Paul

/usr/ccs/bin/yacc -d -v yac.y
"yac.y", line 68: warning: redeclaration of precedence of TABLE.
"yac.y", line 69: warning: redeclaration of precedence of TABLE.

Just delete the second and third occurrences of TABLE.

conflicts: 10 shift/reduce
/usr/ccs/bin/lex lex.l
"lex.l":line 75: Error: executable statements should occur right after %%
"lex.l":line 83: Error: executable statements should occur right after %%

Delete the comments.

--
Glynn Clements <glynn@gclements.plus.com>

Paul Kelly wrote:

> Might including
> the generated files (lex.yy.c and y.tab.c) in the "source" distribution be a
> solution?

That sounds like quite a good idea... I've noticed in some free software
projects they don't even include the generated configure script in CVS and
the user has to generate it themselves with autoconf. In GRASS the
developers always generate the configure script; maybe "pre-generating"
the lex and yacc files and storing in CVS would be a worthwhile idea.

That's a bad idea, IMHO.

Either developers will forget to commit them when lex.l or yac.y
change, or they will end up being committed when lex.l or yac.y
haven't changed (due to different lex/yacc programs generating subtly
different code). Also, CVS doesn't deal with timestamps correctly, so
there can be version skew issues.

It would be better to just ensure that all lex/yacc files can be
processed by any version of these tools. It shouldn't be that hard; we
aren't doing anything particularly complex with lex/yacc.

It isn't an issue for the configure script because this isn't
re-generated automatically as part of the normal build process.

--
Glynn Clements <glynn@gclements.plus.com>

There is a /huge/ difference between configure scripts and generated lex
and yacc files. The former are /likely/ to be different across a
variety of build environments (though it's true that a single configure
script can also likely succeed in being useful on most systems). In the
latter case it is almost always the case that generated files
are /identical/ in form and definitely in function across all systems.

M

On Sat, 2005-06-11 at 01:46 +0100, Paul Kelly wrote:

On Fri, 10 Jun 2005, Daniel Calvelo Aros wrote:

> From: Paul Kelly <paul-grass@stjohnspoint.co.uk>
>> It didn't work anyway with Solaris lex and yacc, so probably not a
>> problem. Below is the error (nothing to do with your changes)---a
>> lot of it seems to be mis-placed comments but I'm not sure what
>> proper lex syntax for this is.
> [snip]
>
> Ouch. Are you resorting to using bison under solaris anyway?

No---to get it to compile I deleted all the comments from lex.l; also I
had to delete a further few misplaced spaces and tabs before it would run
properly.

> Might including
> the generated files (lex.yy.c and y.tab.c) in the "source" distribution be a
> solution?

That sounds like quite a good idea... I've noticed in some free software
projects they don't even include the generated configure script in CVS and
the user has to generate it themselves with autoconf. In GRASS the
developers always generate the configure script; maybe "pre-generating"
the lex and yacc files and storing in CVS would be a worthwhile idea.

Paul

_______________________________________________
grass5 mailing list
grass5@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass5

From: Glynn Clements <glynn@gclements.plus.com>
Sent: Sat, 11 Jun 2005 05:18:25 +0100

Can you try the patch I have sent in response to your previous
message? It explicitly refactors the grammar to use multiple "levels"
of expression rather than relying upon yacc to handle of precedence
and fixity.

It compiles and works under MacOSX too. To test, I use the
lib/db/sqlp/test/sqlptest command.

I like your fixing the grammar better. However, it won't deal with AND/OR
precedence. I guess the same kind of grammar fixing for y_comparison is
required, isn't it?

Thanks.

Daniel.

[me]

Might including
the generated files (lex.yy.c and y.tab.c) in the "source" distribution be a
solution?

[Paul]

That sounds like quite a good idea... I've noticed in some free software
projects they don't even include the generated configure script in CVS and
the user has to generate it themselves with autoconf. In GRASS the
developers always generate the configure script; maybe "pre-generating"
the lex and yacc files and storing in CVS would be a worthwhile idea.

[Glynn]

That's a bad idea, IMHO.

Either developers will forget to commit them when lex.l or yac.y
change, or they will end up being committed when lex.l or yac.y
haven't changed (due to different lex/yacc programs generating subtly
different code). Also, CVS doesn't deal with timestamps correctly, so
there can be version skew issues.

Well, on the one hand, the files hadn't been touched since circa 2001.

It would be better to just ensure that all lex/yacc files can be
processed by any version of these tools. It shouldn't be that hard;
we aren't doing anything particularly complex with lex/yacc.

On the other hand, having a "portable" input feels more correct. What needs to
be done in order to fix that? Linux, MacOSX (because of GNU, I believe) work
Ok. We spotted problems with solaris, and so far they are minor. Whitespace
is no issue. Let's just make solaris happy. Comments are more important, IMHO.
Paul, could you try moving the comments within the "C" part of the code?

Daniel.

On Sat, 11 Jun 2005, Daniel Calvelo Aros wrote:

Paul, could you try moving the comments within the "C" part of the code?

Surrounding all the comments that weren't within a section of C-code like this:
%{
    /*******************
     ** Comments Here **
     *******************/
%}
worked and got lex to generate lex.yy.c OK. Although I'd imagine there might be a better way of doing it and as Glynn suggested do the comments really need to be there?

When we go to compile lex.yy.c however there are further errors:

/usr/ccs/bin/lex lex.l
494/1200 nodes(%e), 1064/2500 positions(%p), 145/500 (%n), 8347 transitions,
448/10000 packed char classes(%k), 1318/2000 packed transitions(%a), 1655/12000 output slots(%o)
gcc -I/home/pkelly/grass/grass6/include -I/home/pkelly/grass/grass6/dist.sparc-sun-solaris2.8/include/grass -O2 -Wall -Wconversion -Wno-implicit-int -fPIC -DPACKAGE=\""grasslibs"\" -DPACKAGE=\""grasslibs"\" -I/home/pkelly/grass/grass6/include -I/home/pkelly/grass/grass6/dist.sparc-sun-solaris2.8/include/grass \
         -o OBJ.sparc-sun-solaris2.8/lex.yy.o -c lex.yy.c
lex.l:40: warning: return type defaults to `int'
lex.l: In function `yylex':
lex.l:41: warning: unused variable `yyprevious'
lex.l:54: warning: label `yyfussy' defined but not used
lex.l: In function `my_yyinput':
lex.l:238: warning: passing arg 3 of `memcpy' as unsigned due to prototype
lex.l: In function `yyerror':
lex.l:256: warning: implicit declaration of function `yy_flush_buffer'
lex.l:256: error: `YY_CURRENT_BUFFER' undeclared (first use in this function)
lex.l:256: error: (Each undeclared identifier is reported only once
lex.l:256: error: for each function it appears in.)
lex.l: At top level:
lex.l:716: warning: missing braces around initializer
lex.l:716: warning: (near initialization for `yycrank[0]')
lex.l:1132: warning: missing braces around initializer
lex.l:1132: warning: (near initialization for `yysvec[0]')
lex.l:1331: warning: ignoring #pragma ident
make: *** [OBJ.sparc-sun-solaris2.8/lex.yy.o] Error 1

To me this shows that the original lex, written and released before ANSI
C was standardized, generates code that is now, 30 years later, so
anachronistic that it's no longer a reasonable reference point. flex
and byacc (or bison) forever! (Forever because, as open source, they
are still maintained.)

M

On Sun, 2005-06-12 at 18:27, Paul Kelly wrote:

On Sat, 11 Jun 2005, Daniel Calvelo Aros wrote:

> Paul, could you try moving the comments within the "C" part of the code?

Surrounding all the comments that weren't within a section of C-code like
this:
%{
    /*******************
     ** Comments Here **
     *******************/
%}
worked and got lex to generate lex.yy.c OK. Although I'd imagine there
might be a better way of doing it and as Glynn suggested do the comments
really need to be there?

When we go to compile lex.yy.c however there are further errors:

/usr/ccs/bin/lex lex.l
494/1200 nodes(%e), 1064/2500 positions(%p), 145/500 (%n), 8347
transitions,
448/10000 packed char classes(%k), 1318/2000 packed transitions(%a),
1655/12000 output slots(%o)
gcc -I/home/pkelly/grass/grass6/include
-I/home/pkelly/grass/grass6/dist.sparc-sun-solaris2.8/include/grass -O2
-Wall -Wconversion -Wno-implicit-int -fPIC -DPACKAGE=\""grasslibs"\"
-DPACKAGE=\""grasslibs"\" -I/home/pkelly/grass/grass6/include
-I/home/pkelly/grass/grass6/dist.sparc-sun-solaris2.8/include/grass \
         -o OBJ.sparc-sun-solaris2.8/lex.yy.o -c lex.yy.c
lex.l:40: warning: return type defaults to `int'
lex.l: In function `yylex':
lex.l:41: warning: unused variable `yyprevious'
lex.l:54: warning: label `yyfussy' defined but not used
lex.l: In function `my_yyinput':
lex.l:238: warning: passing arg 3 of `memcpy' as unsigned due to prototype
lex.l: In function `yyerror':
lex.l:256: warning: implicit declaration of function `yy_flush_buffer'
lex.l:256: error: `YY_CURRENT_BUFFER' undeclared (first use in this
function)
lex.l:256: error: (Each undeclared identifier is reported only once
lex.l:256: error: for each function it appears in.)
lex.l: At top level:
lex.l:716: warning: missing braces around initializer
lex.l:716: warning: (near initialization for `yycrank[0]')
lex.l:1132: warning: missing braces around initializer
lex.l:1132: warning: (near initialization for `yysvec[0]')
lex.l:1331: warning: ignoring #pragma ident
make: *** [OBJ.sparc-sun-solaris2.8/lex.yy.o] Error 1

_______________________________________________
grass5 mailing list
grass5@grass.itc.it
http://grass.itc.it/mailman/listinfo/grass5

Daniel Calvelo Aros wrote:

[Glynn]
> That's a bad idea, IMHO.
>
> Either developers will forget to commit them when lex.l or yac.y
> change, or they will end up being committed when lex.l or yac.y
> haven't changed (due to different lex/yacc programs generating subtly
> different code). Also, CVS doesn't deal with timestamps correctly, so
> there can be version skew issues.

Well, on the one hand, the files hadn't been touched since circa 2001.

"cvs log" says otherwise.

They were created on 2001/05/10, and have been each been modified
around 20 times between then and when I last modified them on
2004/09/09 (to add IS NULL/NOT NULL support).

Furthermore, even if lex.l/yac.y were never changed, the generated C
files (lex.yy.c, y.tab.c, y.tab.h) will typically differ between
developers, so "cvs commit lib/db/sqlp" would typically end up
committing the local copy.

> It would be better to just ensure that all lex/yacc files can be
> processed by any version of these tools. It shouldn't be that hard;
> we aren't doing anything particularly complex with lex/yacc.

On the other hand, having a "portable" input feels more correct. What needs to
be done in order to fix that? Linux, MacOSX (because of GNU, I believe) work
Ok. We spotted problems with solaris, and so far they are minor. Whitespace
is no issue. Let's just make solaris happy. Comments are more important, IMHO.
Paul, could you try moving the comments within the "C" part of the code?

Neither of those files is complex enough to need comments.

Try putting comments inside %{ and %}, but if that doesn't work, just
get rid of them.

--
Glynn Clements <glynn@gclements.plus.com>

Michael Tiemann wrote:

There is a /huge/ difference between configure scripts and generated lex
and yacc files. The former are /likely/ to be different across a
variety of build environments (though it's true that a single configure
script can also likely succeed in being useful on most systems). In the
latter case it is almost always the case that generated files
are /identical/ in form and definitely in function across all systems.

Actually, I'd say it's the other way around.

First, the configure script is supposed to be identical on all
systems. The only people who are supposed to generate configure from
configure.in are developers who modify configure.in. Everyone else
should use the configure script from CVS or the source tarball; it is
supposed to work on all systems.

Second, there is only one source for autoconf: GNU (although there are
multiple versions, it's not uncommon for projects to stipulate a
specific version). Also, the m4 version which is used shouldn't be
relevant (and non-GNU m4 is unlikely to work anyhow).

OTOH, lex/yacc implementations other than GNU flex/bison are widely
used, and they will typically generate different code (even if it's
semantically equivalent, that won't stop "cvs commit" from committing
the "new" version).

Furthermore, the generated code should depend upon any settings of
LFLAGS and YFLAGS in the environment when configure is run (in
reality, we aren't propogating these settings into the Makefiles;
that's a bug). Using lex flags such as -C will substantially alter the
actual code which is generated.

--
Glynn Clements <glynn@gclements.plus.com>

Daniel Calvelo Aros wrote:

> Can you try the patch I have sent in response to your previous
> message? It explicitly refactors the grammar to use multiple "levels"
> of expression rather than relying upon yacc to handle of precedence
> and fixity.

It compiles and works under MacOSX too. To test, I use the
lib/db/sqlp/test/sqlptest command.

I like your fixing the grammar better. However, it won't deal with AND/OR
precedence. I guess the same kind of grammar fixing for y_comparison is
required, isn't it?

Yes.

I don't know if it handles "NOT x OR y" correctly either (it depends
upon how yacc interprets multiple matching rules; I prefer to write
the grammar so that such situations never arise).

Also, there was a bug in the definition of y_product; it used
y_expression where it should have used y_product.

The attached patch (which supersedes the previous version) should
handle boolean operations correctly, and fix the bug.

--
Glynn Clements <glynn@gclements.plus.com>

(attachments)

sqlp.diff (3.23 KB)

On Mon, 13 Jun 2005, Glynn Clements wrote:

The attached patch (which supersedes the previous version) should
handle boolean operations correctly, and fix the bug.

With that latest version of the patch plus the comments surrounded by %{ / %}, there is now only one error with Solaris lex: the undefined macro YY_CURRENT_BUFFER. I'd guess this could be worked around fairly easily?
Full output below.
Paul

/usr/ccs/bin/lex lex.l
482/1200 nodes(%e), 1047/2500 positions(%p), 144/500 (%n), 8347 transitions,
448/10000 packed char classes(%k), 1318/2000 packed transitions(%a), 1655/12000 output slots(%o)
gcc -I/home/pkelly/grass/grass6/include -I/home/pkelly/grass/grass6/dist.sparc-sun-solaris2.8/include/grass -O2 -Wall -Wconversion -Wno-implicit-int -fPIC -DPACKAGE=\""grasslibs"\" -DPACKAGE=\""grasslibs"\" -I/home/pkelly/grass/grass6/include -I/home/pkelly/grass/grass6/dist.sparc-sun-solaris2.8/include/grass \
         -o OBJ.sparc-sun-solaris2.8/lex.yy.o -c lex.yy.c
lex.l:40: warning: return type defaults to `int'
lex.l: In function `yylex':
lex.l:41: warning: unused variable `yyprevious'
lex.l:54: warning: label `yyfussy' defined but not used
lex.l: In function `my_yyinput':
lex.l:226: warning: passing arg 3 of `memcpy' as unsigned due to prototype
lex.l: In function `yyerror':
lex.l:244: warning: implicit declaration of function `yy_flush_buffer'
lex.l:244: error: `YY_CURRENT_BUFFER' undeclared (first use in this function)
lex.l:244: error: (Each undeclared identifier is reported only once
lex.l:244: error: for each function it appears in.)
lex.l: At top level:
lex.l:698: warning: missing braces around initializer
lex.l:698: warning: (near initialization for `yycrank[0]')
lex.l:1114: warning: missing braces around initializer
lex.l:1114: warning: (near initialization for `yysvec[0]')
lex.l:1312: warning: ignoring #pragma ident
make: *** [OBJ.sparc-sun-solaris2.8/lex.yy.o] Error 1

Paul Kelly wrote:

When we go to compile lex.yy.c however there are further errors:

lex.l:256: error: `YY_CURRENT_BUFFER' undeclared (first use in this function)
lex.l:256: error: (Each undeclared identifier is reported only once
lex.l:256: error: for each function it appears in.)

I suspect that needs to be conditionalised, e.g.

#ifdef YY_CURRENT_BUFFER
  yy_flush_buffer(YY_CURRENT_BUFFER);
#endif

The rest are just warnings.

--
Glynn Clements <glynn@gclements.plus.com>

On Tue, 14 Jun 2005, Glynn Clements wrote:

I suspect that needs to be conditionalised, e.g.

#ifdef YY_CURRENT_BUFFER
  yy_flush_buffer(YY_CURRENT_BUFFER);
#endif

That makes it compile (obviously)---I've committed this and the surrounding of the comments by %{ / %} to CVS and it now compiles on Solaris. Ready to accept the patch for correct arithmetic precedence if it is tested enough.

Paul

From: Paul Kelly <paul-grass@stjohnspoint.co.uk>
Sent: Tue, 14 Jun 2005 09:22:08 +0100 (BST)
[snip]

Ready to accept the patch for correct arithmetic precedence
if it is tested enough.

Seems to work:

Input row: -->>select a from b where not c+d/(e+f/2)<2 and b>1 or d<2;
<<--
Input statement: -->>select a from b where not c+d/(e+f/2)<2 and b>1 or d<2<<--
********** SQL PARSER RESULT **********
INPUT: select a from b where not c+d/(e+f/2)<2 and b>1 or d<2
COMMAND: SELECT
TABLE: b
COLUMN 1: a
WHERE:
op: OR
  op: AND
    op: NOT
      op: <
        op: +
          col: c
          op: /
            col: d
            op: +
              col: e
              op: /
                col: f
                val: 2
        val: 2
    op: >
      col: b
      val: 1
  op: <
    col: d
    val: 2
ORDER BY: (null)
***************************************

NOT also seems to be handled properly as highest precedence. I think the only
tweaking left is unary minus.

Daniel.

Daniel Calvelo Aros wrote:

> Ready to accept the patch for correct arithmetic precedence
> if it is tested enough.

Seems to work:

NOT also seems to be handled properly as highest precedence. I think the only
tweaking left is unary minus.

The SQL interpreter in the DBF driver doesn't have a negation
operator.

I thought about parsing -E as a subtraction (0 - E), although I'm not
sure whether there's something I'm overlooking.

--
Glynn Clements <glynn@gclements.plus.com>