[GeoNetwork-devel] Differences between xml.search service and Lucene index query?

Hi,

I sent the following HTTP request:
http://127.0.0.1:8080/geonetwork/srv/en/xml.search?title=myValue

Then I ran the following query against the Lucene index using Luke:
“title:myValue”

I thought the xml.search service used the Lucene index to perform requests.
However, GeoNetwork (xml.search service) returns more records than Luke (query on the Lucene index).
Do you know why?

Thx

hi Rudy,

GeoNetwork searches (either from the GUI, or through interfaces such as CSW GetRecords or xml.search) certainly do use the Lucene index. However, to compare one-to-one with searches done in Luke, you should log the actual Lucene query created by GeoNetwork. Depending on which version you’re using, GeoNetwork modifies the query, for example by adding a degree of fuzziness, ignoring ‘stop words’, or folding accented characters.
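
To illustrate why added fuzziness alone can make a search return more records than an exact term query, here is a minimal sketch in Python. It is not GeoNetwork's or Lucene's code: the term list, the 0.8 threshold, the similarity formula and all function names are assumptions made up for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def exact_matches(query, terms):
    """What a plain term query does: only identical terms match."""
    return [term for term in terms if term == query]


def fuzzy_matches(query, terms, min_similarity=0.8):
    """What a fuzzy query does: near-identical terms match as well."""
    return [term for term in terms if similarity(query, term) >= min_similarity]


# Hypothetical title terms as they might be stored in an index.
INDEXED_TERMS = ["myvalue", "myvalues", "myvalve", "other"]

if __name__ == "__main__":
    print("exact:", exact_matches("myvalue", INDEXED_TERMS))
    print("fuzzy:", fuzzy_matches("myvalue", INDEXED_TERMS))
```

With these made-up terms the exact query matches one term while the fuzzy query matches three, which is the kind of gap you would see between a plain “title:myValue” in Luke and a query that GeoNetwork has made fuzzy.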

Unfortunately, not all the ways of searching in GeoNetwork build the Lucene query in the same way yet. At least for GUI searches (I’m not sure off the top of my head whether this also holds for xml.search), the actual Lucene queries are printed to the console if you set the logging level to DEBUG (search for “Lucene query:”).
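
Once DEBUG logging is on, the console output can be filtered for that marker. A minimal sketch in Python, assuming only that each relevant line contains the text “Lucene query:” followed by the query; the sample log line, its layout and the function name are hypothetical.

```python
MARKER = "Lucene query:"  # marker text quoted in the message above


def extract_lucene_queries(log_lines):
    """Return the text following the marker on every line that contains it."""
    queries = []
    for line in log_lines:
        _, found, rest = line.partition(MARKER)
        if found:  # partition() returns an empty separator when the marker is absent
            queries.append(rest.strip())
    return queries


if __name__ == "__main__":
    # Hypothetical log excerpt; the real line layout depends on the logging setup.
    sample = [
        "2011-05-12 16:20:01 DEBUG [search] - Lucene query: +title:myvalue",
        "2011-05-12 16:20:01 INFO  [search] - some other message",
    ]
    for query in extract_lucene_queries(sample):
        print(query)
```

The extracted query string is what should be pasted into Luke, rather than the raw “title:myValue”.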

Even then, a real one-to-one comparison with Luke remains tricky: you’d need to make sure that Luke applies the same Analyzer to the query terms as GeoNetwork does, and that it uses exactly the same version of Lucene. In recent 2.6.x versions GeoNetwork uses GeoNetworkAnalyzer, which is not available by default in Luke. If you want, I can provide you with a custom-built version of Luke that includes this Analyzer (though it is not built from the latest version of Luke).
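
The effect of an analyzer mismatch can be shown with a toy index in Python. This is only a sketch of the general idea and says nothing about what GeoNetworkAnalyzer actually does: the stop-word list, the whitespace tokenisation, the sample titles and all function names are assumptions.

```python
import unicodedata

STOP_WORDS = {"a", "an", "and", "of", "the"}  # assumed list, for illustration only


def fold_accents(text):
    """Strip combining marks so that 'Région' becomes 'Region'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Toy analyzer: fold accents, lowercase, split on whitespace, drop stop words."""
    tokens = fold_accents(text).lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


def build_index(titles):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, title in enumerate(titles):
        for term in analyze(title):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyze_query):
    """Return ids of documents matching all query terms.

    analyze_query=False mimics a tool that looks up the raw query text
    without applying the analyzer that was used at index time.
    """
    terms = analyze(query) if analyze_query else query.split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    index = build_index(["MyValue of the Région", "myvalue", "Other record"])
    print(search(index, "myValue", analyze_query=True))   # analyzed query
    print(search(index, "myValue", analyze_query=False))  # raw query
```

In this toy setup the analyzed query finds both documents whose title contains the term, while the raw query finds none, because the index only holds lowercased terms. A different analyzer in Luke can skew the counts in the same way.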

kind regards
Heikki Doeleman

On Thu, May 12, 2011 at 4:17 PM, Rudy Commenge <rudywi.devel@anonymised.com> wrote:

Hi,

I sent the following HTTP request:
http://127.0.0.1:8080/geonetwork/srv/en/xml.search?title=myValue

Then I ran the following query against the Lucene index using Luke:
“title:myValue”

I thought the xml.search service used the Lucene index to perform requests.
However, GeoNetwork (xml.search service) returns more records than Luke (query on the Lucene index).
Do you know why?

Thx



GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

I’m running GeoNetwork 2.6.2.
I’m interested in your custom-built version of Luke that includes GeoNetworkAnalyzer.

Best regards
