[GeoNetwork-users] Question about the thesaurus use.

Francois-Xavier_Prun · January 11, 2008, 8:29am

Hi Fabien,

Your option seems to be a good alternative. We also discussed that with
Jeroen, last month I think, and the other option is to add the
broader/narrower/related terms to the Lucene index when indexing metadata
instead of adding them at search time.
That should be better because metadata is indexed only on update, so
search performance could be better that way. The problem could be the
size of the Lucene index ...
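
A minimal sketch of the index-time option, in plain Java with no Lucene
dependency. The class IndexTimeExpansion, the method expandForIndex and the
map-based thesaurus are hypothetical stand-ins, not GeoNetwork code. It only
illustrates the trade-off: each record's keywords are expanded once, when the
record is indexed, so the index grows but the search needs no thesaurus lookup.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of GeoNetwork.
class IndexTimeExpansion {

    // thesaurus: lowercase keyword -> broader/narrower/related labels (assumed shape)
    static Set<String> expandForIndex(List<String> recordKeywords,
                                      Map<String, List<String>> thesaurus) {
        Set<String> indexed = new LinkedHashSet<>();
        for (String keyword : recordKeywords) {
            String key = keyword.toLowerCase();
            indexed.add(key);
            // add every linked term so a search on it also finds this record
            for (String linked : thesaurus.getOrDefault(key, List.of())) {
                indexed.add(linked.toLowerCase());
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> thesaurus = Map.of("rainfall", List.of("Climate"));
        // prints [rainfall, climate]
        System.out.println(expandForIndex(List.of("Rainfall"), thesaurus));
    }
}
```

With this layout a plain search on "climate" would find a record tagged only
with "Rainfall", at the cost of one extra indexed term per linked keyword.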

Maybe a third option is to tell Lucene where the thesaurus is and have it
take that thesaurus into account for search ... but I'm not sure there's
such functionality in Lucene ...

Another issue is multilingual thesauri. Fabien, you're working in the
INSPIRE context; are you facing issues on that point too? Which
thesaurus are you using? GEMET?
Another problem is whether to store keywords or keyword identifiers ...
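
To make the keyword-versus-identifier question concrete, here is a small
sketch in plain Java. Everything in it is made up for illustration (the class
ConceptLabels, the URI urn:example:concept:climate, the labels and the
fallback to English). It assumes the metadata stores a concept identifier and
the label is resolved per language at display or search time.

```java
import java.util.Map;

// Hypothetical lookup table, not a GeoNetwork or GEMET API.
class ConceptLabels {

    // concept identifier -> (language code -> label); all values are invented
    static final Map<String, Map<String, String>> LABELS = Map.of(
        "urn:example:concept:climate", Map.of("en", "climate", "fr", "climat"));

    // Returns the label in the requested language, falling back to English.
    static String label(String conceptUri, String lang) {
        Map<String, String> byLang = LABELS.get(conceptUri);
        if (byLang == null) {
            return null; // unknown identifier
        }
        return byLang.getOrDefault(lang, byLang.get("en"));
    }

    public static void main(String[] args) {
        // prints climat
        System.out.println(label("urn:example:concept:climate", "fr"));
    }
}
```

Storing the plain keyword instead keeps the metadata readable on its own, but
ties the record to one language.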

Maybe it's time to open a discussion on the R&D trac about keyword and
thesaurus improvements and gather all input on that topic.

Francois.

FBachraty wrote:

Hello,

To make the keyword search work with thesauri, I have updated the
code in the LuceneSearcher.java file.

I am just starting with Java, so my code can surely be optimised, but what
I did works.

It is now possible to find broader, narrower and related terms automatically
in the Advanced Search.

What I do is simple:

For each keyword typed between double quotes (themekey / Lucene TermQuery),
I look in all thesauri available on the GN node and find the corresponding
keyword identifiers.
Then, with each keyword identifier, I find the broader, related and
narrower terms and add them to the Lucene query.
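
The two steps above can be sketched in plain Java as follows. This is only an
illustration under stated assumptions: ToyThesaurus, QueryTimeExpansion, the
map layout and the urn:example:climate identifier are hypothetical, whereas
the posted code calls the GeoNetwork thesaurus services and builds Lucene Term
objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for one thesaurus on the node (hypothetical).
class ToyThesaurus {
    final Map<String, String> uriByLabel;        // lowercase label -> keyword identifier
    final Map<String, List<String>> linkedByUri; // identifier -> broader/narrower/related labels

    ToyThesaurus(Map<String, String> uriByLabel, Map<String, List<String>> linkedByUri) {
        this.uriByLabel = uriByLabel;
        this.linkedByUri = linkedByUri;
    }
}

class QueryTimeExpansion {

    // Step 1: find the identifier in each thesaurus. Step 2: collect its linked terms.
    static List<String> expand(String keyword, List<ToyThesaurus> thesauri) {
        List<String> terms = new ArrayList<>();
        String key = keyword.toLowerCase();
        terms.add(key);
        for (ToyThesaurus thesaurus : thesauri) {
            String uri = thesaurus.uriByLabel.get(key);
            if (uri == null) {
                continue; // keyword unknown in this thesaurus
            }
            for (String label : thesaurus.linkedByUri.getOrDefault(uri, List.of())) {
                String term = label.toLowerCase();
                if (!terms.contains(term)) {
                    terms.add(term);
                }
            }
        }
        return terms;
    }

    // Renders the terms as one group of optional clauses in Lucene query syntax.
    static String toLuceneClause(String field, List<String> terms) {
        List<String> parts = new ArrayList<>();
        for (String term : terms) {
            parts.add(field + ":" + term);
        }
        return "(" + String.join(" ", parts) + ")";
    }

    public static void main(String[] args) {
        ToyThesaurus t = new ToyThesaurus(
            Map.of("climate", "urn:example:climate"),
            Map.of("urn:example:climate", List.of("Rainfall", "Frost")));
        // prints (keyword:climate keyword:rainfall keyword:frost)
        System.out.println(toLuceneClause("keyword", expand("Climate", List.of(t))));
    }
}
```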

I do not handle the (themekey / FuzzyQuery) case for the moment, but I think
the same approach can be used.

For example, I have a thesaurus containing Climate and terms linked to it,
such as Rainfall.
If I put "Climate" in the keyword search, the request is automatically
and internally transformed into:
"Climate" or "Rainfall" or "Frost" or "...".

The "climate" search creates the Lucene query below:

+((keyword:climate keyword:frost keyword:radiation keyword:rainfall
keyword:temperature)) +eastBL:[181 TO 540] +westBL:[180 TO 539]
+northBL:[271 TO 450] +southBL:[270 TO 449] +((_op0:3) (_op0:2) (_op0:0)
(_op0:4) (_op0:1) (_owner:1) (_dummy:0)) +(+_isTemplate:n)

Before the change, a request with "climate" and "rainfall" created the Lucene
query:

+((+keyword:climate +keyword:rainfall)) +eastBL:[181 TO 540] +westBL:[180 TO
539] +northBL:[271 TO 450] +southBL:[270 TO 449] +(_op0:1) +_isTemplate:n
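
The difference between the two keyword groups above can be illustrated with a
small model in plain Java. ClauseSemantics and its two methods are
hypothetical names, and the model ignores scoring. It only mirrors the Lucene
rule that "+a +b" requires every term, while a required group of optional
terms, "+(a b)", is satisfied by any one of them.

```java
import java.util.List;
import java.util.Set;

// Toy model of boolean clause matching (hypothetical, no Lucene dependency).
class ClauseSemantics {

    // "+a +b": every term is required.
    static boolean matchesAllRequired(Set<String> docKeywords, List<String> terms) {
        return docKeywords.containsAll(terms);
    }

    // "+(a b)": the group is required, but any single term satisfies it.
    static boolean matchesAnyOptional(Set<String> docKeywords, List<String> terms) {
        for (String term : terms) {
            if (docKeywords.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> doc = Set.of("rainfall");
        List<String> terms = List.of("climate", "rainfall");
        System.out.println(matchesAllRequired(doc, terms)); // prints false
        System.out.println(matchesAnyOptional(doc, terms)); // prints true
    }
}
```

In the posted makeQuery, the expanded terms take the required/prohibited flags
of the enclosing clause, so it may be worth checking that a required keyword
clause does not end up in the first form.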

I am not a Lucene specialist; I made the code modification by looking at the
Lucene documentation, so I am posting the two requests so that specialists can
comment on the modifications and on anything that is missing.

For the moment, after a few tries, I have had no problems and what I did
seems to work correctly.

Here are the code modifications. It may be better to send the complete file,
so I will try to attach it to this post.

http://www.nabble.com/file/p14733549/LuceneSearcher.java LuceneSearcher.java

I used the 2.1 source file as a base.
I look forward to your comments.

See you Fabien Bachraty

The diffs are:
--
import java.util.List;
import java.util.ArrayList;
--
//Fabien Bachraty : overload for the thesaurus narrower/broader/related management
  public static Query makeQuery(Element xmlQuery) throws Exception
  {
    // no ServiceContext is available on this path, so null is passed
    return makeQuery(xmlQuery, null, false, false);
  }
--
//Fabien Bachraty : return the list of narrower, broader and related keywords
  public static List getKeywordNBR(String Keyword, ServiceContext srvContext) throws Exception
  {
    //list of related, narrower and broader keywords (as Lucene Term objects)
    List listRes = new ArrayList();

    //get the list of existing thesauri using the List service
    org.fao.geonet.services.thesaurus.List Thesauruslist = new org.fao.geonet.services.thesaurus.List();
    Element ParamsThesaurusList = new Element("request");
    ParamsThesaurusList.addContent(new Element(Params.TYPE).setText("all-thesauri"));
    Element elThesaurusList = Thesauruslist.exec(ParamsThesaurusList, srvContext);
    //I am not good with XML; it is certainly possible to optimise the code here
    elThesaurusList = elThesaurusList.getChild("thesaurusList");
    for (Iterator iterDirectory = elThesaurusList.getChildren("directory").iterator(); iterDirectory.hasNext(); )
    {
      Element xmlDirectory = (Element)iterDirectory.next();
      for (Iterator iterThesaurus = xmlDirectory.getChildren("thesaurus").iterator(); iterThesaurus.hasNext(); )
      {
        Element xmlThesaurus = (Element)iterThesaurus.next();
        String lvalue = xmlThesaurus.getAttributeValue("value");

        //for each thesaurus, try to get the identifier for the themekey
        org.fao.geonet.services.thesaurus.GetKeywords getKeywords = new org.fao.geonet.services.thesaurus.GetKeywords();
        Element ParamsKeywords = new Element("request");

        ParamsKeywords.addContent(new Element("pKeyword").setText(Keyword));
        ParamsKeywords.addContent(new Element("pThesauri").setText(lvalue));
        ParamsKeywords.addContent(new Element("pTypeSearch").setText("2"));
        ParamsKeywords.addContent(new Element("nbResults").setText("100"));
        ParamsKeywords.addContent(new Element("pNewSearch").setText("true"));
        ParamsKeywords.addContent(new Element("pMode").setText("consult"));

        Element elKeywords = getKeywords.exec(ParamsKeywords, srvContext);
        //parse the result to find the identifier
        Element elDescKey = elKeywords.getChild("descKeys");
        for (Iterator iterKeyword = elDescKey.getChildren("keyword").iterator(); iterKeyword.hasNext(); )
        {
          Element xmlKeyword = (Element)iterKeyword.next();
          Element xmlUri = xmlKeyword.getChild("uri");
          String luri = xmlUri.getValue();
          //for each keyword identifier, use the EditElement service
          //to get the broader, narrower and related terms
          org.fao.geonet.services.thesaurus.EditElement BRNElement = new org.fao.geonet.services.thesaurus.EditElement();
          Element ParamsBRN = new Element("request");
          ParamsBRN.addContent(new Element("uri").setText(luri));
          ParamsBRN.addContent(new Element("ref").setText(lvalue));
          ParamsBRN.addContent(new Element("mode").setText("consult"));
          Element elThesaurusBRN = BRNElement.exec(ParamsBRN, srvContext);
          //get the nodes for the broader, related and narrower terms
          Element elBroader = elThesaurusBRN.getChild("broader");
          Element elRelated = elThesaurusBRN.getChild("related");
          Element elNarrower = elThesaurusBRN.getChild("narrower");
          //for each node get the xml keywords part
          Element xmlBroader = elBroader.getChild("descKeys");
          Element xmlRelated = elRelated.getChild("descKeys");
          Element xmlNarrower = elNarrower.getChild("descKeys");
          //for each xml content get the keywords
          getKeywordFromElement(xmlBroader, listRes);
          getKeywordFromElement(xmlRelated, listRes);
          getKeywordFromElement(xmlNarrower, listRes);
        }
      }
    }
    return ( listRes );
  }
--
  //Fabien Bachraty : simple function to parse a narrower/broader/related xml element
  public static void getKeywordFromElement(Element xmlElement, List TheList) throws Exception
  {
    //getChild returns null when the element is absent, so there is nothing to collect
    if (xmlElement == null) return;
    for (Iterator iterKeyword = xmlElement.getChildren("keyword").iterator(); iterKeyword.hasNext(); )
    {
      Element xmlKeyword = (Element)iterKeyword.next();
      Element elKeywordValue = xmlKeyword.getChild("value");
      String KeywordValue = elKeywordValue.getValue();
      //lowercase to match the analysed content of the keyword field
      TheList.add(new Term("keyword", KeywordValue.toLowerCase() ));
    }
  }
--
  // Fabien Bachraty : changed to build a new Lucene query with the thesaurus
  // broader, narrower and related terms
  // converts to lowercase if needed as the StandardAnalyzer
  public static Query makeQuery(Element xmlQuery, ServiceContext srvContext, boolean Looprequired, boolean Loopprohibited) throws Exception
  {
    String name = xmlQuery.getName();
    if (name.equals("TermQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      //Start FBachraty Thesaurus Narrower Broader Related Modification
      //create the request to return
      BooleanQuery tmpQuery = new BooleanQuery();
      List listRes = new ArrayList();
      listRes.add( new Term(fld, txt) );
      //expand keyword terms only, and skip the thesaurus lookup when no
      //ServiceContext is available (the one-argument overload passes null)
      if (fld.equals("keyword") && srvContext != null)
      {
        listRes.addAll( getKeywordNBR( txt , srvContext ) );
      }
      Iterator i = listRes.iterator();
      while (i.hasNext())
      {
        tmpQuery.add(new TermQuery((Term) i.next() ), Looprequired, Loopprohibited);
      }
      return ( tmpQuery ) ;
      //End FBachraty Thesaurus Narrower Broader Related Modification
    }
    else if (name.equals("FuzzyQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      Float sim = Float.valueOf(xmlQuery.getAttributeValue("sim"));
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new FuzzyQuery(new Term(fld, txt), sim.floatValue());
    }
    else if (name.equals("PrefixQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new PrefixQuery(new Term(fld, txt));
    }
    else if (name.equals("WildcardQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new WildcardQuery(new Term(fld, txt));
    }
    else if (name.equals("PhraseQuery"))
    {
      PhraseQuery query = new PhraseQuery();
      for (Iterator iter = xmlQuery.getChildren().iterator(); iter.hasNext(); )
      {
        Element xmlTerm = (Element)iter.next();
        String fld = xmlTerm.getAttributeValue("fld");
        String txt = xmlTerm.getAttributeValue("txt").toLowerCase();
        query.add(new Term(fld, txt));
      }
      return query;
    }
    else if (name.equals("RangeQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String lowerTxt = xmlQuery.getAttributeValue("lowerTxt");
      String upperTxt = xmlQuery.getAttributeValue("upperTxt");
      String sInclusive = xmlQuery.getAttributeValue("inclusive");
      boolean inclusive = "true".equals(sInclusive);

      Term lowerTerm = (lowerTxt == null ? null : new Term(fld, lowerTxt.toLowerCase()));
      Term upperTerm = (upperTxt == null ? null : new Term(fld, upperTxt.toLowerCase()));

      return new RangeQuery(lowerTerm, upperTerm, inclusive);
    }
    else if (name.equals("BooleanQuery"))
    {
      BooleanQuery query = new BooleanQuery();
      for (Iterator iter = xmlQuery.getChildren().iterator(); iter.hasNext(); )
      {
        Element xmlBooleanClause = (Element)iter.next();
        String sRequired = xmlBooleanClause.getAttributeValue("required");
        String sProhibited = xmlBooleanClause.getAttributeValue("prohibited");
        boolean required = sRequired != null && sRequired.equals("true");
        boolean prohibited = sProhibited != null && sProhibited.equals("true");
        Element xmlSubQuery = (Element)xmlBooleanClause.getChildren().get(0);
        query.add(makeQuery(xmlSubQuery, srvContext, required, prohibited), required, prohibited);
      }
      query.setMaxClauseCount(16384); // FIXME: quick fix; using Filters should be better
      return query;
    }
    else
      throw new Exception("unknown lucene query type: " + name);
  }

--