[Geonetwork-devel] RE: [GeoNetwork-*] Search problem when having "-" character in query

Hi list, regarding the bug with the "-" character in queries, one solution could be to use Lucene's FuzzyQuery, which is less strict than TermQuery.

I ran some tests after adding two parameters to the interface:
- fuzzy (on/off)
- similarity: float, default 0.8

Querying the demo data gives:
- "Hydrological" + fuzzy off returns 1 result: "Hydrological basins in Africa (SAMPLE DATA!)"
- "Hydrological" + fuzzy on returns 1 result: "Hydrological basins in Africa (SAMPLE DATA!)"
- "Hidrological" + fuzzy off returns 0 results
- "Hidrological" or "Hidrologicàl" + fuzzy on returns 1 result: "Hydrological basins in Africa (SAMPLE DATA!)"
- "Hidrological" + fuzzy on + similarity = 0.2 returns 2 results: "Hydrological basins in Africa (SAMPLE DATA!)" and "Forests and Drylands Programme: Forests Homepage (SAMPLE DATA)". I don't know why the second one matches, but this is "fuzzy".

FuzzyQuery could also help with special characters ("éàèôï...") and could be easier than detecting special characters in Java and replacing them with ? in the TermQuery to find something.
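
To make the similarity values in the tests above more concrete, here is a minimal sketch of how such a score could be computed. It assumes the score is one minus the edit (Levenshtein) distance divided by the length of the shorter term, which is my understanding of the Lucene versions of that period and not something verified against the GeoNetwork code. The class and method names are hypothetical.

```java
public class FuzzySimilaritySketch {

    /** Levenshtein (edit) distance, two-row dynamic programming. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed formula: 1 - distance / length of the shorter string. */
    public static double similarity(String query, String indexed) {
        int shorter = Math.min(query.length(), indexed.length());
        if (shorter == 0) {
            return query.length() == indexed.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) levenshtein(query, indexed) / shorter;
    }

    public static void main(String[] args) {
        String indexed = "hydrological";
        // \u00e0 is the character "à", escaped to avoid source-encoding issues
        String[] queries = {"hydrological", "hidrological", "hidrologic\u00e0l"};
        for (String query : queries) {
            double sim = similarity(query, indexed);
            System.out.println(query + " -> " + sim + (sim >= 0.8 ? " (passes 0.8)" : " (fails 0.8)"));
        }
    }
}
```

Under this assumed formula, one or two wrong letters in a twelve-letter word still score above 0.8, which agrees with the test results listed above. At 0.2, a term may differ from the query in most of its characters and still qualify, so unrelated records can appear.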

Any comments on this?

Francois.

PS: Changes made for testing:
_________________________________________________________________
Add two form elements to the main page (Main-page.xsl):
  Fuzzy : <input type="checkbox" class="content" name="fuzzy"/><br/>
  Similarity : <input class="content" name="similarity" size="2" value=".8"/><br/>
    
_________________________________________________________________
Add a FuzzyQuery type to Lucene.xsl and use it when fuzzy is on:
<xsl:variable name="fuzzy" select="string(/request/fuzzy)"/>
<xsl:variable name="similarity" select="/request/similarity"/>

    <!-- simple string -->
    <xsl:otherwise>
      <xsl:choose>
        <xsl:when test="$fuzzy='on'">
          <FuzzyQuery fld="{$field}" txt="{$expr/@text}" sim="{$similarity}"/>
        </xsl:when>
        <xsl:otherwise>
          <TermQuery fld="{$field}" txt="{$expr/@text}"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>

_________________________________________________________________
Then build the FuzzyQuery in Java (LuceneSearcher.java, line 199):
    else if (name.equals("FuzzyQuery"))
    {
      // field to search and minimum similarity, both read from the XML query element
      String fld = xmlQuery.getAttributeValue("fld");
      float sim = Float.parseFloat(xmlQuery.getAttributeValue("sim"));
      // lower-case the text so it can match the (presumably lower-cased) indexed terms
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new FuzzyQuery(new Term(fld, txt), sim);
    }
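
The snippet above fails with an exception if the "sim" attribute is missing or is not a valid number (for example "0,8" typed with a comma). A minimal sketch of a more forgiving parser follows. It falls back to the 0.8 default from the test form and rejects values outside [0, 1), a range I believe the FuzzyQuery constructor enforces, although that is an assumption to check against the Lucene version in use. The class, method and constant names are hypothetical.

```java
public class SimilarityParamSketch {

    /** Default taken from the test form above (value=".8"). */
    static final float DEFAULT_SIMILARITY = 0.8f;

    /**
     * Parses the raw request value and falls back to the default when the
     * value is missing, malformed, or outside the assumed valid range [0, 1).
     */
    public static float parseSimilarity(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_SIMILARITY;
        }
        try {
            float value = Float.parseFloat(raw.trim());
            // NaN fails both comparisons and therefore also falls back
            if (value >= 0.0f && value < 1.0f) {
                return value;
            }
            return DEFAULT_SIMILARITY;
        } catch (NumberFormatException e) {
            return DEFAULT_SIMILARITY;
        }
    }

    public static void main(String[] args) {
        String[] samples = {".8", "0.2", "", "0,8", "1.5"};
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + parseSimilarity(sample));
        }
    }
}
```

In the handler, the parsed value would simply replace the direct Float.parseFloat call.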

-----Original Message-----
From: geonetwork-users-admin@lists.sourceforge.net [mailto:geonetwork-users-admin@lists.sourceforge.net] On behalf of Jeroen Ticheler
Sent: Friday, 17 February 2006 15:03
To: Giaccio Roberto; François Prunayre
Cc: geonetwork-users@lists.sourceforge.net
Subject: Re: [GeoNetwork-users] Search problem when having "-" character in query

I filed a bug report for this.
Jeroen

On 1 Feb 2006, at 12:41, Roberto Giaccio wrote:

Ciao Francois,
I think that the string containing "-" is split into words by Lucene
when the metadata is indexed, but not when it is used as a search
term.
I have to check and see how to solve this.
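
The suspected mismatch can be illustrated without Lucene. The sketch below uses a hypothetical tokenizer that lower-cases the text and splits on anything that is not a letter or digit; the analyzer GeoNetwork really uses may behave differently. It shows that an exact lookup of the unsplit string fails, while the same lookup succeeds once the query goes through the same tokenizer as the indexed text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HyphenTokenSketch {

    /** Hypothetical analyzer: lower-cases, then splits on any non-letter, non-digit character. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** True when every query term exists in the index as an exact term. */
    public static boolean matchesAll(Set<String> indexTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Index side: the metadata text is tokenized, so the hyphens disappear.
        Set<String> index = new HashSet<String>(tokenize("Eure-et-Loir"));

        // Query side, not tokenized: a single term that was never indexed.
        List<String> rawQuery = Arrays.asList("eure-et-loir");
        System.out.println("raw query matches: " + matchesAll(index, rawQuery));

        // Query side, tokenized the same way as the index.
        List<String> analyzedQuery = tokenize("Eure-et-Loir");
        System.out.println("analyzed query matches: " + matchesAll(index, analyzedQuery));
    }
}
```

If this is indeed the cause, running the search text through the indexing analyzer would address the "-" problem directly, independently of fuzzy matching.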

                Roberto

On 31 Jan 2006, at 15:16, François Prunayre wrote:

Hi list, I noticed a problem when the query contains a "-" character.

Searching for Eure loir gets 70 results:
http://sandre.eaufrance.fr/geonetwork/srv/fr/main.search?extended=off&remote=off&attrset=geo&any=Eure+loir&hitsPerPage=10

Searching for Eure-et-Loir gets 0 results:
http://sandre.eaufrance.fr/geonetwork/srv/fr/main.search?extended=off&remote=off&attrset=geo&any=Eure-et-Loir&hitsPerPage=10

Searching for "Eure-et-Loir" gets 0 results:
http://sandre.eaufrance.fr/geonetwork/srv/fr/main.search?extended=off&remote=off&attrset=geo&any=%22Eure-et-Loir%22&hitsPerPage=10

Any ideas on what's wrong?

Thanks for your help. Francois

--
This message has been checked by MailScanner for viruses and spam,
and nothing suspicious was found.

Any data and information contained in this electronic mail is
personal, confidential and private. Any total or partial publication,
use or distribution must be authorized.

_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at http://sourceforge.net/
projects/geonetwork

Hi François,

I think that doing a search using fuzzy queries is the "right thing to do".
I would leave fuzzy=on and similarity=0.8 just to keep things simple.
Other comments are well appreciated.

Cheers,
Andrea

Just wondering: does this influence the geographic search algorithm as currently implemented? I'm copying Roberto because he implemented that part :-)
Jeroen

On May 16, 2006, at 4:43 PM, Andrea Carboni wrote:

Hi François,

I think that doing a search using fuzzy queries is the "right thing to do".
I would leave fuzzy=on and similarity=0.8 just to keep things simple.
Other comments are well appreciated.

Cheers,
Andrea

_______________________________________________
Geonetwork-devel mailing list
Geonetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Clearly, fuzzy search should be limited to the title, abstract, free text search and
keywords (maybe something else).
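
As a rough illustration of that restriction, the query type could be chosen per field. The field names below are hypothetical placeholders for the title, abstract, free text and keyword fields, not the names used in the actual index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FuzzyFieldSketch {

    // Hypothetical index field names; the real ones may differ.
    static final Set<String> FUZZY_FIELDS =
            new HashSet<String>(Arrays.asList("title", "abstract", "any", "keyword"));

    /** Returns the query element name to emit for a field. */
    public static String queryTypeFor(String field, boolean fuzzyRequested) {
        return fuzzyRequested && FUZZY_FIELDS.contains(field) ? "FuzzyQuery" : "TermQuery";
    }

    public static void main(String[] args) {
        System.out.println("title -> " + queryTypeFor("title", true));
        // A coordinate-like field stays exact even when fuzzy is requested.
        System.out.println("northBL -> " + queryTypeFor("northBL", true));
    }
}
```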

Andrea


I was thinking about this configuration. In the future, the free text search could offer an advanced option that lets the user change this "similarity" setting with a slider. This would let a user make a search more precise or more flexible. Could be an interesting (advanced!) option!?
Jeroen

On May 16, 2006, at 3:37 PM, François Prunayre wrote:

Hi list, regarding the bug with the "-" character in queries, one solution could be to use Lucene's FuzzyQuery, which is less strict than TermQuery.


I would have to check this, but it could be that the problem with the "-" sign is due to the fact that the metadata fields and the search field are tokenized differently.
On the fuzzy search, I wonder if there are performance issues to be considered. In general, any partial-match search that does not work on a prefix tends to carry a performance penalty, both in terms of space (in this case, the number of generated clauses?) and of query resolution time.
It would be worth doing some tests before adding fuzzy search.
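
One cheap test of this kind is to count how many index terms a fuzzy term would expand to at different thresholds, assuming each matching term becomes one clause of the rewritten query. The sketch below uses a made-up vocabulary and the assumed similarity formula of one minus edit distance divided by the length of the shorter term; a real test would read the terms from the actual index.

```java
import java.util.Arrays;
import java.util.List;

public class FuzzyExpansionSketch {

    /** Levenshtein (edit) distance, two-row dynamic programming. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Counts the vocabulary terms whose assumed similarity to the query reaches the threshold. */
    public static int countExpansions(String query, List<String> vocabulary, double minSimilarity) {
        int count = 0;
        for (String term : vocabulary) {
            int shorter = Math.min(query.length(), term.length());
            if (shorter == 0) {
                continue; // empty strings are ignored in this sketch
            }
            double sim = 1.0 - (double) levenshtein(query, term) / shorter;
            if (sim >= minSimilarity) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up vocabulary standing in for the terms of one index field.
        List<String> vocabulary = Arrays.asList(
                "hydrological", "hydrology", "geological", "forests", "basins", "africa");
        for (double threshold : new double[] {0.8, 0.5, 0.2}) {
            System.out.println(threshold + " -> "
                    + countExpansions("hidrological", vocabulary, threshold) + " clause(s)");
        }
    }
}
```

The lower the threshold, the more terms qualify, so both the size of the rewritten query and the time needed to scan the term list grow. This is the kind of figure worth measuring on a realistic catalogue.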

        Roberto

On 16 May 2006, at 17:33, Andrea Carboni wrote:

Clearly, fuzzy search should be limited to the title, abstract, free text search and
keywords (maybe something else).

Andrea
