[GeoNetwork-devel] locale question

Hi,

I have a question about Lucene indexing and document locales. I'm developing a search interface based on the tabsearch UI, which uses the 'q' fast index service. If I run a spatial-only search without the 'any' parameter, the documents returned from the index are missing some fields, e.g. abstract, keywords, responsibleParty. The same search with the 'any' parameter returns complete documents, i.e. they include all the Lucene 'dump fields' defined in the Lucene config.

Presumably this has to do with the language detection based on the presence of the 'any' parameter? There appear to be two entries in the 'eng' index for each document (same _id and uuid), one with a _locale field and one without. The entry with the _locale field has all dump fields stored, the other one does not. I'm not sure why this is the case but I think it should be possible to do a spatial-only search and get back documents with all the dump fields.
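The duplicate entries described above can be checked with a small diagnostic. This is a minimal sketch, assuming the index entries have already been exported as plain dictionaries; the sample data and the field names other than _id, uuid and _locale are hypothetical, and nothing here calls GeoNetwork or Lucene.

```python
from collections import defaultdict


def missing_stored_fields(entries):
    """Group entries by uuid and list the fields that are stored on the
    entry carrying _locale but absent from the entry without it."""
    by_uuid = defaultdict(list)
    for entry in entries:
        by_uuid[entry["uuid"]].append(entry)

    report = {}
    for uuid, group in by_uuid.items():
        with_locale = [e for e in group if "_locale" in e]
        without_locale = [e for e in group if "_locale" not in e]
        if not with_locale or not without_locale:
            continue  # only one kind of entry exists, nothing to compare
        full = set(with_locale[0]) - {"_locale"}
        partial = set(without_locale[0])
        report[uuid] = sorted(full - partial)
    return report


# Hypothetical export of two entries sharing the same _id and uuid.
entries = [
    {"_id": "10", "uuid": "abc", "_locale": "eng",
     "title": "Roads", "abstract": "Road network", "keyword": "transport"},
    {"_id": "10", "uuid": "abc", "title": "Roads"},
]
print(missing_stored_fields(entries))
```

An empty list for a uuid would mean both entries store the same fields.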

I notice that if I provide a '_locale' parameter in the search request I get full results, so this may be a workaround. Is this a bug or expected behaviour? I'm running a clean install of the latest 2.8 RC2.
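For reference, the two requests being compared might be built as follows. This is only a sketch: the base URL is a placeholder, and the 'fast', 'geometry' and 'relation' parameter names and values are assumptions that should be checked against the install in question. Only 'any' and '_locale' come from the message itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the 'q' service URL of your own install.
BASE_URL = "http://localhost:8080/geonetwork/srv/eng/q"


def build_search_url(bbox_wkt, locale=None, any_text=None):
    """Build a 'q' request URL. The spatial parameter names are assumptions."""
    params = {
        "fast": "index",         # assumed flag selecting the fast index output
        "geometry": bbox_wkt,    # assumed name of the spatial filter parameter
        "relation": "overlaps",  # assumed name and value of the spatial relation
    }
    if any_text is not None:
        params["any"] = any_text
    if locale is not None:
        params["_locale"] = locale
    return BASE_URL + "?" + urlencode(params)


bbox = "POLYGON((-8 54,-8 61,2 61,2 54,-8 54))"
spatial_only = build_search_url(bbox)               # reported to lose dump fields
with_locale = build_search_url(bbox, locale="eng")  # reported workaround
print(spatial_only)
print(with_locale)
```

Comparing the responses of the two URLs side by side should show whether the missing fields depend only on '_locale'.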

many thanks for any help,

Brian.

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

I am surprised that the addition of _locale helps significantly. Could you give me several queries so that I can construct tests from them?

I have done some fairly significant work on one of the branches off master that addresses issues of this kind. I am waiting for a review before merging the changes into master.

The 2.8 and master versions currently split each language into separate Lucene indices and search all of them. When a hit is found, that Lucene document is loaded. It might be the English document, in which case it will only have the English translations. If the title is only in French, that data is not available and won’t be displayed.
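The behaviour described above can be illustrated with a toy model. This is not GeoNetwork or Lucene code: the dictionaries stand in for per-language indices, and the record and its field values are made up for illustration.

```python
# Toy model: one index per language, each holding only that language's text.
indices = {
    "eng": {"abc": {"abstract": "Road network of Switzerland"}},
    "fre": {"abc": {"title": "Réseau routier",
                    "abstract": "Réseau routier suisse"}},
}


def search_all(field, term):
    """Search every language index and return (language, uuid, document)
    for each document whose field contains the term."""
    hits = []
    for lang, index in indices.items():
        for uuid, doc in index.items():
            if term.lower() in doc.get(field, "").lower():
                hits.append((lang, uuid, doc))
    return hits


hits = search_all("abstract", "road")
lang, uuid, doc = hits[0]
# The hit comes from the English document, which has no title at all,
# even though a French title exists for the same record.
print(lang, uuid, doc.get("title"))
```

The search succeeds, but the document that is loaded has no title to display.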

On my branch, you can configure the dump fields as multilingual. In this case all translations are put in each document in such a way that the search weighting will not be affected, but the data is there for retrieval. It should probably be a configuration option, but with it the field can be retrieved even when its value exists only in a different language.
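One way to picture the multilingual dump fields is a document that keeps its own language for searching but also stores every translation for retrieval. The sketch below is a toy model under that assumption; the "field_lang" key naming and both helper functions are hypothetical and do not reflect how the branch actually implements it.

```python
def build_documents(translations, multilingual_fields):
    """translations maps lang -> {field: text}. Each resulting document is
    searchable only on its own language ("indexed") but also stores every
    translation of the multilingual fields ("stored")."""
    docs = {}
    for lang, fields in translations.items():
        stored = dict(fields)
        for field in multilingual_fields:
            for other_lang, other_fields in translations.items():
                if field in other_fields:
                    # Hypothetical naming scheme: <field>_<lang>
                    stored[f"{field}_{other_lang}"] = other_fields[field]
        docs[lang] = {"indexed": dict(fields), "stored": stored}
    return docs


def display_value(doc, field, preferred_langs):
    """Return the document's own value, else the first stored translation."""
    stored = doc["stored"]
    if field in stored:
        return stored[field]
    for lang in preferred_langs:
        key = f"{field}_{lang}"
        if key in stored:
            return stored[key]
    return None


translations = {
    "eng": {"abstract": "Road network of Switzerland"},
    "fre": {"title": "Réseau routier", "abstract": "Réseau routier suisse"},
}
docs = build_documents(translations, ["title"])
print(display_value(docs["eng"], "title", ["eng", "fre"]))
```

Because the extra translations are only stored and never added to the searchable fields, they cannot change which documents match or how they are ranked in this model.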

The main reason I have added the translations is because I was seeing similar problems. For example:

  1. Sort by title, search by abstract: this would fail because the abstract hits might be French but only an English translation might be available.
  2. Search by abstract: the result would not have a title because the UI is not in the same language as the found document.

etc…


On Thu, Dec 13, 2012 at 2:44 PM, Brian O’Hare <brian.ohare@anonymised.com> wrote:



GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

See ticket #1181 which is closed with status fixed.

Cheers,
Simon