[GeoNetwork-devel] Search for strings containing spaces with Lucene

Hi

First, congratulations on the first beta! It looks great, and I can't wait to get going with it!

However, I'm still on 2.0.3, and I'm having trouble with Lucene searches for strings that contain spaces or special characters (e.g. é, è, à).

My category name is /Milieu physique/Limites physiographiques/Continent, and when I query on it, Lucene splits it into several terms, so the query returns no results.

Example of the XML query:
<BooleanClause required="true" prohibited="false">
  <BooleanQuery>
    <BooleanClause required="true" prohibited="false">
      <TermQuery fld="_cat" txt="/Milieu" />
    </BooleanClause>
    <BooleanClause required="true" prohibited="false">
      <TermQuery fld="_cat" txt="physique/Limites" />
    </BooleanClause>
    <BooleanClause required="true" prohibited="false">
      <TermQuery fld="_cat" txt="physiographiques/Continent" />
    </BooleanClause>
  </BooleanQuery>
</BooleanClause>
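The three TermQuery values above are exactly what a plain whitespace split of the category name yields. The sketch below reproduces that split in plain Java. It assumes, without confirmation from the GeoNetwork sources, that the search text is broken on whitespace before the query is built; the class and method names are hypothetical. It says nothing about how é or à are handled, which depends on the analyzer actually in use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, NOT GeoNetwork code: mimics a query builder
// that splits the category text on whitespace and emits one TermQuery per piece.
public class CategorySplitDemo {

    // Split on runs of whitespace, as a simple whitespace analyzer would.
    static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String category = "/Milieu physique/Limites physiographiques/Continent";
        // One line per term; these match the txt values of the three TermQuery elements.
        for (String term : whitespaceTerms(category)) {
            System.out.println(term);
        }
    }
}
```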

Is there a way to configure this in luceneSearcher.java, or do I need to wrap my category in quotes or something else?

I tried single and double quotes, but it didn't help: Lucene just builds a different query (a PhraseQuery), as you can see in the following example:

<BooleanClause required="true" prohibited="false">
  <PhraseQuery>
    <Term fld="_cat" txt="/Milieu" />
    <Term fld="_cat" txt="physique/Limites" />
    <Term fld="_cat" txt="physiographiques/Continent" />
  </PhraseQuery>
</BooleanClause>
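Why neither query finds anything can be modelled with a toy index. This sketch assumes (not confirmed in this thread) that the _cat field holds the whole category name as a single untokenized term. Under that assumption, the set of required terms fails because none of the pieces exists in the index, and the phrase fails because it looks for three consecutive positions in a field that has only one. If the field were instead tokenized with the same analyzer at index time, the phrase would be expected to match, so this model only covers the untokenized case. All names are hypothetical; this is not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model, NOT Lucene code. Assumption: the document's _cat field holds the
// whole category name as ONE untokenized term, at position 0.
public class UntokenizedFieldDemo {

    static final String NAME = "/Milieu physique/Limites physiographiques/Continent";

    // Terms of the indexed field, in position order.
    static final List<String> INDEXED = Collections.singletonList(NAME);

    // BooleanQuery of required TermQuery clauses: every term must be present.
    static boolean matchesAllTerms(List<String> terms) {
        return INDEXED.containsAll(terms);
    }

    // PhraseQuery: the terms must appear at consecutive positions.
    static boolean matchesPhrase(List<String> phrase) {
        return Collections.indexOfSubList(INDEXED, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("/Milieu", "physique/Limites", "physiographiques/Continent");
        System.out.println("required terms: " + matchesAllTerms(split)); // false
        System.out.println("phrase:         " + matchesPhrase(split));   // false
        System.out.println("whole name:     " + matchesAllTerms(Collections.singletonList(NAME))); // true
    }
}
```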

Any ideas?
Any help would be greatly appreciated!
Thanks,

Regards,

--

Mathieu COUDERT

Hi Mathieu,

This looks like a bug to me. Categories are indexed by name, whereas I
think they should be indexed by their id; that is why you pick a category
from a dropdown instead of typing it into a free-text field. As a side
effect of this bug, category names get tokenized, which gives wrong results.

You can easily fix the problem. In the index method of DataManager.java,
you have to change these lines:

    String categoryName = category.getChildText("name");
    moreFields.add(makeField("_cat", categoryName, true, true, false));

into these:

    String categoryId = category.getChildText("id");
    moreFields.add(makeField("_cat", categoryId, true, true, false));

Then you have to edit main-page.xsl and change the 'categories'
template to pass the category id instead of the name.
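A hypothetical sketch of why indexing by id sidesteps the problem, under these assumptions: _cat is stored untokenized, the query builder still splits the requested value on whitespace, and category ids contain no whitespace (e.g. numeric ids). The id value "1" and all names are made up for illustration; this is not GeoNetwork code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed fix, NOT GeoNetwork code.
// Assumptions: _cat is untokenized, the query builder splits the requested
// value on whitespace, and category ids contain no whitespace.
public class CategoryIdDemo {

    static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // One required TermQuery per piece against a single stored term:
    // the document matches only if every piece equals the stored value.
    static boolean matches(String indexedValue, String requested) {
        for (String term : whitespaceTerms(requested)) {
            if (!term.equals(indexedValue)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String name = "/Milieu physique/Limites physiographiques/Continent";
        String id = "1"; // hypothetical id for that category

        // Before the fix: the name is indexed and the form sends the name.
        System.out.println("by name: " + matches(name, name)); // false
        // After the fix: the id is indexed and the form sends the id.
        System.out.println("by id:   " + matches(id, id));     // true
    }
}
```

Because the dropdown already restricts users to known categories, matching on an opaque id loses nothing, and the display name remains free to contain spaces or accented characters.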

I still need to discuss this with the other developers, but this should be the fix.

Cheers,
Andrea
