[GeoNetwork-devel] Quoted strings as search keys

The manual says:

"You can use quotes around text to find exact
combinations of words."

However, services.main.Search.exec()
passes the search keys along to
MainUtil.splitWord(), which in turn passes
them on to the "StandardAnalyzer" of Lucene,
which (among other things) throws away
the double quotes. :(
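
To make the effect concrete, here is a rough sketch in plain Java. It does not use GeoNetwork's MainUtil or Lucene's StandardAnalyzer; the class and method names are hypothetical, and analyzerLikeSplit only imitates the one behaviour at issue (quotes treated as separators and discarded). quoteAwareSplit shows what the manual's wording would lead a user to expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in; not GeoNetwork or Lucene code. */
final class SearchKeySplitter {

    /** Imitates the reported effect: every non-alphanumeric character,
     *  including the double quote, is a separator and is thrown away. */
    static List<String> analyzerLikeSplit(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Keeps the text between a pair of double quotes together as one token. */
    static List<String> quoteAwareSplit(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') {
                flush(current, tokens);   // a quote always ends the current token
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                flush(current, tokens);   // whitespace separates only outside quotes
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        String t = current.toString().trim().toLowerCase(Locale.ROOT);
        if (!t.isEmpty()) {
            tokens.add(t);
        }
        current.setLength(0);
    }
}
```

For the input "land cover" Africa (with the quotes), analyzerLikeSplit returns three separate words, while quoteAwareSplit returns the phrase "land cover" and the word "africa".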

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082

Software Improvements gn-devel wrote:

The manual says:

"You can use quotes around text to find exact
combinations of words."

However, services.main.Search.exec()
passes the search keys along to
MainUtil.splitWord(), which in turn passes
them on to the "StandardAnalyzer" of Lucene,
which (among other things) throws away
the double quotes. :(

Yes, it looks like the idea was to do stop-word filtering, because other filtering such as lowercasing is already done by the Lucene search XSLTs and/or the Java code. A simple fix would be to remove the splitWord call (cf. keyword search, which doesn't use it and works fine). As far as I can see, 2.0.3 didn't use it at all. Stop-word filtering is probably only going to be useful when the text search string doesn't contain quoted strings?
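
As a sketch of that last point only: the snippet below applies stop-word filtering when the search string has no double quotes and otherwise leaves it alone. It is plain Java, not GeoNetwork code; the class name, the method name and the tiny stop-word set are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

/** Hypothetical helper; not part of GeoNetwork or Lucene. */
final class ConditionalStopWords {

    // Illustrative subset only; not the list any particular analyzer uses.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    /** Drops stop words unless the search key contains a double quote. */
    static String prepare(String searchKey) {
        if (searchKey.indexOf('"') >= 0) {
            return searchKey; // quoted input is passed through untouched
        }
        StringJoiner kept = new StringJoiner(" ");
        for (String word : searchKey.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return kept.toString();
    }
}
```

With this, prepare("the rivers of Africa") gives "rivers Africa", whereas the same text wrapped in double quotes comes back unchanged.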

This is one for the maintenance release?

Cheers,
Simon

What the manual implicitly suggests is that a Lucene QueryParser is used, which would support phrase queries via quotes, and the words "or" and "not" as boolean operators, out of the Lucene box. This is not the case, however: the Lucene query is built without involving a QueryParser.

I have added support for PHRASE, OR and WITHOUT queries along the same lines as the existing AND (by manually constructing the Lucene query). There are new HTML input fields in the advanced search section for each of these query types; they are hidden for now, so as not to mess up the graphic layout of the page. If you set their display property to block or inline, you’ll see them and can use them.
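
Roughly, combining the four fields looks like the sketch below. This is not the committed code and does not use the Lucene API; it only renders the combined clauses as a string in the style of Lucene's query syntax (+ for required, - for prohibited, double quotes for a phrase, parentheses for a group). The class, method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the real change builds Lucene query objects instead. */
final class AdvancedQuerySketch {

    /** One argument per input field: AND words, exact PHRASE, OR words, WITHOUT words. */
    static String build(String all, String phrase, String any, String without) {
        List<String> clauses = new ArrayList<>();
        for (String w : words(all)) {
            clauses.add("+" + w);                              // every AND word is required
        }
        if (!phrase.trim().isEmpty()) {
            clauses.add("+\"" + phrase.trim() + "\"");         // the phrase is required as a unit
        }
        List<String> anyWords = words(any);
        if (!anyWords.isEmpty()) {
            clauses.add("+(" + String.join(" ", anyWords) + ")"); // at least one OR word
        }
        for (String w : words(without)) {
            clauses.add("-" + w);                              // WITHOUT words are prohibited
        }
        return String.join(" ", clauses);
    }

    private static List<String> words(String field) {
        List<String> out = new ArrayList<>();
        for (String w : field.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }
}
```

For example, build("water quality", "land cover", "africa asia", "draft") yields +water +quality +"land cover" +(africa asia) -draft.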

Removing the splitWord call would not be sufficient: if we want these “advanced” types of query inside a single HTML input field, then we must use a Lucene QueryParser on the query string.
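
To show why something parser-like is needed for a single field, here is a deliberately tiny sketch that pulls quoted phrases, plain terms and NOT-prefixed items out of one string. It is a toy written for this mail, not Lucene's QueryParser, and it handles far less than a real query parser would; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy parser; a stand-in for what a real query parser would do. */
final class SingleFieldParserSketch {

    // A token is either a double-quoted phrase or a run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    /** Returns clauses tagged as PHRASE, TERM or NOT, in input order. */
    static List<String> parse(String input) {
        List<String> clauses = new ArrayList<>();
        boolean negateNext = false;
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            boolean isPhrase = m.group(1) != null;
            String text = isPhrase ? m.group(1) : m.group(2);
            if (text.isEmpty()) {
                continue;                 // ignore an empty pair of quotes
            }
            if (!isPhrase && text.equals("NOT")) {
                negateNext = true;        // the next term or phrase is excluded
                continue;
            }
            String kind = negateNext ? "NOT" : (isPhrase ? "PHRASE" : "TERM");
            clauses.add(kind + ":" + text);
            negateNext = false;
        }
        return clauses;
    }
}
```

Given "land cover" africa NOT draft, parse returns PHRASE:land cover, TERM:africa and NOT:draft. Merely dropping the splitWord call gives none of this structure, which is the point above.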

heikki doeleman

On Mon, Apr 7, 2008 at 1:42 PM, Simon Pigot <sppigot@anonymised.com> wrote:

Software Improvements gn-devel wrote:

The manual says:

“You can use quotes around text to find exact
combinations of words.”

However, services.main.Search.exec()
passes the search keys along to
MainUtil.splitWord(), which in turn passes
them on to the “StandardAnalyzer” of Lucene,
which (among other things) throws away
the double quotes. :(

Yes - looks like the idea was to do stop word filtering because other
filtering like to-lowercase etc is already done by the lucene search
xslts and/or the Java. Simple fix would be to remove the splitWord call
(cf. keyword search which doesn’t use it and works fine) - 2.0.3 didn’t
use it at all as far as I can see - stop word filtering is probably only
going to be useful when the text search string doesn’t contain quoted
strings?

This is one for the maintenance release?

Cheers,
Simon



GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

--
Inventions cannot, in nature, be a subject of property. -- Thomas Jefferson

heikki wrote:

I have added support for PHRASE, OR and WITHOUT queries along the same lines as the existing AND (by manually constructing the Lucene query). There are new html input fields on the advanced search section for each of these types of query, however they are hidden for now, in order not to mess up the graphic layout of the page. If you set their display property to block or inline you'll see them and you can use them.

Found it . . . but as you say, your additions are
for local searching (i.e., with Lucene),
and I am doing remote searches (i.e., with Z39.50).

But I can see from your changes what is still
missing.

Removing the splitWord call would not be sufficient -- if we want these "advanced" types of query inside a single html input field then we must use a Lucene QueryParser on the query string.

Maybe . . . but just remember that there are some of
us who want to use Z39.50 searching. So, to
whoever "fixes" this for local searches: please also
fix it for remote searches.

--
Richard Walker
Software Improvements Pty Ltd
Phone: +61 2 6273 2055
Fax: +61 2 6273 2082