[GeoNetwork-users] Question about the thesaurus use.

FBachraty · November 28, 2007, 12:46pm

Hi all,

I have read the administrator documentation and searched the forum for
additional information, but some points are still unclear to me.

If I understand correctly, thesauri are a great opportunity for
interoperability and search. They should make searching more efficient by
handling narrower and broader terms as well as keyword translations, and they
also avoid most typing errors.

So I created a personal thesaurus to try it out, for example:

<skos:Concept rdf:about="http://evoltree.org/EVOLTREE#103000">
    <skos:prefLabel xml:lang="en">Species</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Espèces</skos:prefLabel>
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
    <skos:narrower rdf:resource="http://evoltree.org/EVOLTREE#103001" />
  </skos:Concept>
  <skos:Concept rdf:about="http://evoltree.org/EVOLTREE#103001">
    <skos:prefLabel xml:lang="en">Abies Alba</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Abies Alba</skos:prefLabel>
    <skos:broader rdf:resource="http://evoltree.org/EVOLTREE#103000" />
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
  </skos:Concept>

1) If I load the RDF file as an external thesaurus, consultation works and I
can see how the terms are linked to each other.

But if I load the RDF file as an internal thesaurus, consultation shows only a
flat list of words, and the editor does not let me link a word to narrower or
broader concepts.

Did I miss something?
Will a future release include a real thesaurus editor that manages the links
between concepts?

2) In any case, I put my RDF file in the external directory so that I can see
the links between the words. Then I created a metadata record with keywords
from my own thesaurus.

2.1) Even if I choose the type “discipline”, which contains only my own
thesaurus, the interface suggests keywords from all thesauri as I type, for
example from the “place” thesaurus region.rdf. So I do not understand the
purpose of this “thesaurus type” field.

2.2) I created a metadata record with the keyword “Abies alba”, whose broader
term is “Species”, and saved it. When I then went to the advanced search I was
very happy to see that the keyword is indexed and listed in the search keyword
list, but that list quickly becomes unreadable once many metadata records are
saved.

Personally, I think the presentation used during creation, a simple list
filtered in real time by what is typed, is easier to use.
Maybe a special list that displays the broader and narrower terms of the
selected entry would be best for both search and creation, because sometimes a
word is used in different ways and we do not know which one we need.
For example, “Species” can be the broader term of “Abies alba”, but it can also
be a narrower term of “Information level”, and the concept is not the same.

  <skos:Concept rdf:about="http://evoltree.org/EVOLTREE#104000">
    <skos:prefLabel xml:lang="en">Information level</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Niveau d'information</skos:prefLabel>
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
    <skos:narrower rdf:resource="http://evoltree.org/EVOLTREE#104001" />
    <skos:narrower rdf:resource="http://evoltree.org/EVOLTREE#104002" />
    <skos:narrower rdf:resource="http://evoltree.org/EVOLTREE#104003" />
    <skos:narrower rdf:resource="http://evoltree.org/EVOLTREE#104004" />
    <skos:narrower rdf:resource="http://evoltree.org/EVOLTREE#104005" />
  </skos:Concept>
  <skos:Concept rdf:about="http://evoltree.org/EVOLTREE#104001">
    <skos:prefLabel xml:lang="en">Community</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Communauté</skos:prefLabel>
    <skos:broader rdf:resource="http://evoltree.org/EVOLTREE#104000" />
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
  </skos:Concept>
  <skos:Concept rdf:about="http://evoltree.org/EVOLTREE#104002">
    <skos:prefLabel xml:lang="en">Species</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Espèce</skos:prefLabel>
    <skos:broader rdf:resource="http://evoltree.org/EVOLTREE#104000" />
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
  </skos:Concept>
  <skos:Concept rdf:about="http://evoltree.org/EVOLTREE#104003">
    <skos:prefLabel xml:lang="en">Population</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Population</skos:prefLabel>
    <skos:broader rdf:resource="http://evoltree.org/EVOLTREE#104000" />
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
  </skos:Concept>
  <skos:Concept rdf:about="http://evoltree.org/EVOLTREE#104004">
    <skos:prefLabel xml:lang="en">Individual</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Individu</skos:prefLabel>
    <skos:broader rdf:resource="http://evoltree.org/EVOLTREE#104000" />
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
  </skos:Concept>
  <skos:Concept rdf:about="http://evoltree.org/EVOLTREE#104005">
    <skos:prefLabel xml:lang="en">Tissues</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Tissus</skos:prefLabel>
    <skos:broader rdf:resource="http://evoltree.org/EVOLTREE#104000" />
    <skos:inScheme rdf:resource="http://evoltree.org/EVOLTREE" />
  </skos:Concept>

Of course my 2.2 is not a question; it is more a suggestion to improve our use
of the powerful thesaurus functionality.

2.3) Continuing with my example, I search with the keyword “Species”. I
expected to find my metadata record with the narrower keyword “Abies alba”,
but I do not, even if I set the search accuracy to imprecise.

2.4) Lastly, when I choose the French interface, I expected the keyword list in
the advanced search to show my indexed keywords translated, but it does not.
And if I search with a keyword from my thesaurus in French, the metadata
records are not found; I have to enter the keyword in the language used when
the metadata was created.

I understand that keywords are stored as plain text and not as thesaurus
ids, but either I am missing something important or it would be very good if
thesaurus management could handle narrower-term search and translation.

I have read what François-Xavier PRUNAYRE wrote on Jan 05, 2007; 05:19pm, but
that was nearly a year ago.
http://www.nabble.com/forum/ViewPost.jtp?post=8975341&framed=y&skin=18419
So what is planned now?

3) In the same post I read that it is possible to integrate the GEMET
thesaurus.
This is the one I must integrate to comply with the INSPIRE directive.

I found RDF files here:
http://www.eionet.europa.eu/gemet/rdf?langcode=en

But they do not work. Where can I find a compatible GEMET thesaurus?

Thank you very much for your help and all your explanations.
Fabien Bachraty

Francois-Xavier_Prun · November 28, 2007, 1:17pm

See comments below.

FBachraty wrote:

1 )If i put the rdf file on the external thesaurus the consultation work and
i see the word linked beetween their meaning.

But if i put the rdf file on the internal thesaurus the consultation appear
as a simple flat list of words.
and the edition do not permit to link words with other narrowed or broader
concepts
Did i miss something ?

No. Importing an external thesaurus supports related/broader/narrower
relationships, but the internal thesaurus editor only allows creating a flat
list of words without relationships.
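
To make the broader/narrower links in such an RDF file concrete, here is a minimal sketch that reads them with the XML parser shipped with the JDK. It is not GeoNetwork code: the class and method names are hypothetical, and the `rdf:RDF` wrapper with its namespace declarations is an assumption, since the snippet quoted in this thread shows only the `skos:Concept` elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of GeoNetwork.
class SkosNarrowerSketch {
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Maps each concept URI to the URIs named by its skos:narrower children.
    static Map<String, List<String>> narrowerByConcept(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match skos:/rdf: names
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList concepts = doc.getElementsByTagNameNS(SKOS, "Concept");
            for (int i = 0; i < concepts.getLength(); i++) {
                Element concept = (Element) concepts.item(i);
                List<String> narrower = new ArrayList<>();
                NodeList links = concept.getElementsByTagNameNS(SKOS, "narrower");
                for (int j = 0; j < links.getLength(); j++) {
                    narrower.add(((Element) links.item(j)).getAttributeNS(RDF, "resource"));
                }
                result.put(concept.getAttributeNS(RDF, "about"), narrower);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the SKOS document", e);
        }
    }

    public static void main(String[] args) {
        // Assumed wrapper element; the two concepts come from the example in this thread.
        String xml =
            "<rdf:RDF xmlns:rdf='" + RDF + "' xmlns:skos='" + SKOS + "'>"
          + "<skos:Concept rdf:about='http://evoltree.org/EVOLTREE#103000'>"
          + "<skos:prefLabel xml:lang='en'>Species</skos:prefLabel>"
          + "<skos:narrower rdf:resource='http://evoltree.org/EVOLTREE#103001'/>"
          + "</skos:Concept>"
          + "<skos:Concept rdf:about='http://evoltree.org/EVOLTREE#103001'>"
          + "<skos:prefLabel xml:lang='en'>Abies Alba</skos:prefLabel>"
          + "<skos:broader rdf:resource='http://evoltree.org/EVOLTREE#103000'/>"
          + "</skos:Concept>"
          + "</rdf:RDF>";
        System.out.println(narrowerByConcept(xml));
    }
}
```

A flat list of words, as produced by the internal editor, would simply give an empty list for every concept here.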

Will the future release will integrate a real thesauri editor that will
manage the link between concepts ?

Not planned for now.

2 ) Anyway i put my rdf on the external directory to see the link between
the words.
Then i create a metadata with keywords of my own thesauri.

2.1) Even if i choose the type “discipline” including only my own thesauri,
when i type in the keywords the interface display all keywords coming from
all thesauri like “place thesauri” region.rdf as example. Here i do not
understand the interest of this “thesauri type” field.

The keyword type is not used for now: the editor proposes keywords from all
thesauri without taking the thesaurus type into account. This is the default
behaviour for ISO and DC keywords.

2.2) So i create a metadata with the keyword “Abies alba” coming from the
broader word “Species”
and i save it. After when i went to the advanced search i am very happy to
see that the keyword is indexed and referenced on the search keyword list,
but the list fastly become unreadable when so many metadata are saved.

That's an issue.

Personally i think the presentation used during the creation with a simple
list filtered in real time with what is typed is more easy to use.
Maybe a special list that will display the broader and narrowed elements of
the selected one on the list will be the best for search and creation
Because sometime a word is used in different way and we do not know the one
we need.

Yes, that is what a thesaurus is for, but quite a lot of work is required
in Lucene to use the thesaurus during search.

As example “Species” can be the broader word of “Abies alba” but it can also
be the narrowed word of “information level” and the concept is not the same.

Sure that my 2.2 is not a question it is more an opinion to improve our use
of the powerful thesaurus functionality.

2.3) So if i continue with my example and i try to do a search with the
keyword “Species”. i wish i will find my metadata with the narrowed keyword
“Abies alba” but i don't even if i take the search accuracy to imprecise.

The Lucene capability needs to be improved. Not planned for now, but
contributions are welcome.

2.4) Lastly i choose the French interface i wish i can see on the advanced
search the keyword list with my available index translated, but i don't. And
if i make the search with keyword of my thesauri in French the metadata are
not find i have to put the keyword in the language used during the metadata
creation.

Yes. Do you know of any tools that use ids instead of concept strings?
Basically, this is linked to the use of XLink, already discussed on the
list ... and this is not trivial!
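
As a rough illustration of what "ids instead of concept strings" could mean, the sketch below tags records with concept URIs and resolves a typed label, in any language, to those URIs before searching. This is only an assumption about one possible design; every class and method name is hypothetical and nothing here exists in GeoNetwork.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory index, not GeoNetwork code.
class ConceptIdIndexSketch {
    // lowercase label (any language) -> URIs of the concepts carrying that label
    private final Map<String, Set<String>> urisByLabel = new HashMap<>();
    // concept URI -> ids of the metadata records tagged with that concept
    private final Map<String, Set<String>> recordsByUri = new HashMap<>();

    void addLabel(String uri, String label) {
        urisByLabel.computeIfAbsent(label.toLowerCase(Locale.ROOT), k -> new HashSet<>()).add(uri);
    }

    void tagRecord(String recordId, String uri) {
        recordsByUri.computeIfAbsent(uri, k -> new HashSet<>()).add(recordId);
    }

    // Resolves the label to concept URIs first, then collects the tagged records.
    Set<String> search(String label) {
        Set<String> hits = new HashSet<>();
        for (String uri : urisByLabel.getOrDefault(label.toLowerCase(Locale.ROOT), Set.of())) {
            hits.addAll(recordsByUri.getOrDefault(uri, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        ConceptIdIndexSketch index = new ConceptIdIndexSketch();
        String species = "http://evoltree.org/EVOLTREE#103000";
        index.addLabel(species, "Species");
        index.addLabel(species, "Esp\u00e8ces"); // French label of the same concept
        index.tagRecord("md-1", species);        // "md-1" is a made-up record id
        System.out.println(index.search("Esp\u00e8ces"));
    }
}
```

Because the record stores the URI rather than the English string, the French label finds it as well. Note that a label shared by two concepts (such as “Species” in the example thesaurus) resolves to both URIs, so the ambiguity would still have to be handled in the interface.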

I understand that keywords are recorded in full text and not with thesauri
id, but either i miss something important or it will be very good that the
management of thesauri can manage the narrowed search and the translation.

I have read what said François-Xavier PRUNAYRE Jan 05, 2007; 05:19pm but it
is nearly one year ago.
http://www.nabble.com/forum/ViewPost.jtp?post=8975341&framed=y&skin=18419
So now what is planned ?

No funding support for that for the time being from my side ...

3) On the same post i have read that it is possible to integrate GEMET
thesauri.
This is the one i must integrate to fit the INSPIRE directive.

I have find RDFs file here :
http://www.eionet.europa.eu/gemet/rdf?langcode=en

But that don't work, where can i find a compatible GEMET thesauri ?

GEMET is available in RDF/SKOS (in 3 files, I think) but needs to be adapted
to the SKOS structure used by GeoNetwork, with everything in one file:
concept definitions and relationships (e.g. agrovoc). I have not done the
work, but it should not be too difficult.

Ciao. Francois

Thank you very much for you help and all your explanations.
Fabien Bachraty

FBachraty · November 28, 2007, 1:57pm

So I hope a Lucene specialist is reading the mailing list.

Thank you very much for your answers François-Xavier

FBachraty · January 10, 2008, 1:46pm

Hello,

To make keyword search work with the thesaurus, I have updated the code in the
LuceneSearcher.java file.

I am just starting with Java, so my code can certainly be optimised, but it
works.

It is now possible to find broader, narrower and related terms automatically
in the advanced search.

What I do is simple:

For each keyword typed between quotes (themekey / Lucene TermQuery), I look in
all thesauri available on the GN node and find the identifiers of the matching
keywords.
Then, with each keyword identifier, I find the broader, related and narrower
terms and add them to the Lucene query.

I do not handle the themekey / FuzzyQuery case for the moment, but I think
the same approach can be used.

For example, I have a thesaurus with Climate and linked terms such as
Rainfall. If I enter ''Climate'' in the keyword search, the request is
automatically and internally transformed into:
''Climate'' or ''Rainfall'' or ''Frost'' or ''...''.

The “climate” search now creates the Lucene query below:

+((keyword:climate keyword:frost keyword:radiation keyword:rainfall keyword:temperature)) +eastBL:[181 TO 540] +westBL:[180 TO 539] +northBL:[271 TO 450] +southBL:[270 TO 449] +((_op0:3) (_op0:2) (_op0:0) (_op0:4) (_op0:1) (_owner:1) (_dummy:0)) +(+_isTemplate:n)

Previously, a request with “climate” and “rainfall” created the Lucene query:

+((+keyword:climate +keyword:rainfall)) +eastBL:[181 TO 540] +westBL:[180 TO 539] +northBL:[271 TO 450] +southBL:[270 TO 449] +(_op0:1) +_isTemplate:n
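
The expansion step can be sketched on its own, without GeoNetwork or Lucene. The toy map below stands in for the thesaurus lookup; the class, the map and the method names are hypothetical and only illustrate the idea of turning one typed keyword into a group of optional `keyword:` clauses.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from GeoNetwork or Lucene.
class KeywordExpansionSketch {

    // Toy thesaurus: keyword -> its broader, narrower and related labels (lowercase).
    static final Map<String, List<String>> LINKED_TERMS = Map.of(
        "climate", List.of("frost", "radiation", "rainfall", "temperature"),
        "rainfall", List.of("climate"));

    // Returns the typed keyword followed by every linked term, without duplicates.
    static List<String> expand(String keyword) {
        String key = keyword.toLowerCase(Locale.ROOT);
        Set<String> terms = new LinkedHashSet<>();
        terms.add(key);
        terms.addAll(LINKED_TERMS.getOrDefault(key, List.of()));
        return new ArrayList<>(terms);
    }

    // Renders the expansion as one required group of optional clauses on the
    // "keyword" field, in the textual style of the queries quoted in this post.
    static String toQueryString(String keyword) {
        StringBuilder sb = new StringBuilder("+(");
        List<String> terms = expand(keyword);
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append("keyword:").append(terms.get(i));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString("Climate"));
    }
}
```

The clauses inside the group carry no `+`, so a record matching any one of the terms is returned, whereas the earlier query required both `climate` and `rainfall`. Multi-word keywords such as “abies alba” would need quoting or a different query type, which this sketch does not cover.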

I am not a Lucene specialist; I made the code changes by following the
Lucene documentation, so I am posting both queries so that specialists can
comment on the changes and on anything missing.

So far, in my tests, I have had no problems and it seems to work correctly.

Here are the code changes. It may be better to send the complete file, so I
have tried to attach it to this post:

http://www.nabble.com/file/p14733549/LuceneSearcher.java LuceneSearcher.java

I used the 2.1 source file as a base.
I am waiting for your comments.

See you, Fabien Bachraty

The diffs are:
--
import java.util.List;
import java.util.ArrayList;
--
// Fabien Bachraty: overload for the thesaurus narrower/broader/related management
  public static Query makeQuery(Element xmlQuery) throws Exception
  {
    return makeQuery(xmlQuery, null, false, false);
  }
--
// Fabien Bachraty: returns the list of narrower, broader and related keywords
  public static List getKeywordNBR(String Keyword, ServiceContext srvContext) throws Exception
  {
     // list of related, narrower and broader keywords
     List listRes = new ArrayList();

     // get the list of existing thesauri using the List service
     org.fao.geonet.services.thesaurus.List Thesauruslist = new org.fao.geonet.services.thesaurus.List();
     Element ParamsThesaurusList = new Element("request");
     ParamsThesaurusList.addContent(new Element(Params.TYPE).setText("all-thesauri"));
     Element elThesaurusList = Thesauruslist.exec(ParamsThesaurusList, srvContext);
     // I am not good with XML; it is certainly possible to optimise the code here
     elThesaurusList = elThesaurusList.getChild("thesaurusList");
     for (Iterator iterDirectory = elThesaurusList.getChildren("directory").iterator(); iterDirectory.hasNext(); )
     {
        Element xmlDirectory = (Element)iterDirectory.next();
        for (Iterator iterThesaurus = xmlDirectory.getChildren("thesaurus").iterator(); iterThesaurus.hasNext(); )
        {
          Element xmlThesaurus = (Element)iterThesaurus.next();
          String lvalue = xmlThesaurus.getAttributeValue("value");

          // for each thesaurus, try to get the identifier for the themekey
          org.fao.geonet.services.thesaurus.GetKeywords getKeywords = new org.fao.geonet.services.thesaurus.GetKeywords();
          Element ParamsKeywords = new Element("request");

          ParamsKeywords.addContent(new Element("pKeyword").setText(Keyword));
          ParamsKeywords.addContent(new Element("pThesauri").setText(lvalue));
          ParamsKeywords.addContent(new Element("pTypeSearch").setText("2"));
          ParamsKeywords.addContent(new Element("nbResults").setText("100"));
          ParamsKeywords.addContent(new Element("pNewSearch").setText("true"));
          ParamsKeywords.addContent(new Element("pMode").setText("consult"));

          Element elKeywords = getKeywords.exec(ParamsKeywords, srvContext);
          // parse the result to find the identifier
          Element elDescKey = elKeywords.getChild("descKeys");
          for (Iterator iterKeyword = elDescKey.getChildren("keyword").iterator(); iterKeyword.hasNext(); )
          {
            Element xmlKeyword = (Element)iterKeyword.next();
            Element xmlUri = xmlKeyword.getChild("uri");
            String luri = xmlUri.getValue();
            // for each keyword identifier, use the EditElement service
            // to get the broader, narrower and related terms
            org.fao.geonet.services.thesaurus.EditElement BRNElement = new org.fao.geonet.services.thesaurus.EditElement();
            Element ParamsBRN = new Element("request");
            ParamsBRN.addContent(new Element("uri").setText(luri));
            ParamsBRN.addContent(new Element("ref").setText(lvalue));
            ParamsBRN.addContent(new Element("mode").setText("consult"));
            Element elThesaurusBRN = BRNElement.exec(ParamsBRN, srvContext);
            // get the nodes for the broader, related and narrower terms
            Element elBroader = elThesaurusBRN.getChild("broader");
            Element elRelated = elThesaurusBRN.getChild("related");
            Element elNarrower = elThesaurusBRN.getChild("narrower");
            // for each node, get the XML keywords part
            Element xmlBroader = elBroader.getChild("descKeys");
            Element xmlRelated = elRelated.getChild("descKeys");
            Element xmlNarrower = elNarrower.getChild("descKeys");
            // for each XML content, get the keywords
            getKeywordFromElement(xmlBroader, listRes);
            getKeywordFromElement(xmlRelated, listRes);
            getKeywordFromElement(xmlNarrower, listRes);
          }
        }
     }
     return listRes;
  }
--
  // Fabien Bachraty: simple function to parse a narrower/broader/related XML element
  public static void getKeywordFromElement(Element xmlElement, List TheList) throws Exception
  {
      for (Iterator iterKeyword = xmlElement.getChildren("keyword").iterator(); iterKeyword.hasNext(); )
      {
        Element xmlKeyword = (Element)iterKeyword.next();
        Element elKeywordValue = xmlKeyword.getChild("value");
        String KeywordValue = elKeywordValue.getValue();
        // terms are lowercased to match the indexed "keyword" field
        TheList.add(new Term("keyword", KeywordValue.toLowerCase()));
      }
  }
--
  // Fabien Bachraty: changed to build a new Lucene query with the thesaurus
  // broader, narrower and related terms
  // converts to lowercase if needed as the StandardAnalyzer
  public static Query makeQuery(Element xmlQuery, ServiceContext srvContext, boolean Looprequired, boolean Loopprohibited) throws Exception
  {
    String name = xmlQuery.getName();
    if (name.equals("TermQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      //Start FBachraty Thesaurus Narrower Broader Related Modification
      //create the request to return
      BooleanQuery tmpQuery = new BooleanQuery();
      List listRes = new ArrayList();
      listRes.add( new Term(fld, txt) );
      if (fld.equals("keyword") )
      {
        listRes.addAll( getKeywordNBR( txt , srvContext ) );
      }
      Iterator i = listRes.iterator();
      while (i.hasNext())
      {
        tmpQuery.add(new TermQuery((Term) i.next() ), Looprequired, Loopprohibited);
      }
      return ( tmpQuery ) ;
      //End FBachraty Thesaurus Narrower Broader Related Modification
    }
    else if (name.equals("FuzzyQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      Float sim = Float.valueOf(xmlQuery.getAttributeValue("sim"));
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new FuzzyQuery(new Term(fld, txt), sim.floatValue());
    }
    else if (name.equals("PrefixQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new PrefixQuery(new Term(fld, txt));
    }
    else if (name.equals("WildcardQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new WildcardQuery(new Term(fld, txt));
    }
    else if (name.equals("PhraseQuery"))
    {
      PhraseQuery query = new PhraseQuery();
      for (Iterator iter = xmlQuery.getChildren().iterator(); iter.hasNext(); )
      {
        Element xmlTerm = (Element)iter.next();
        String fld = xmlTerm.getAttributeValue("fld");
        String txt = xmlTerm.getAttributeValue("txt").toLowerCase();
        query.add(new Term(fld, txt));
      }
      return query;
    }
    else if (name.equals("RangeQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String lowerTxt = xmlQuery.getAttributeValue("lowerTxt");
      String upperTxt = xmlQuery.getAttributeValue("upperTxt");
      String sInclusive = xmlQuery.getAttributeValue("inclusive");
      boolean inclusive = "true".equals(sInclusive);

      Term lowerTerm = (lowerTxt == null ? null : new Term(fld, lowerTxt.toLowerCase()));
      Term upperTerm = (upperTxt == null ? null : new Term(fld, upperTxt.toLowerCase()));

      return new RangeQuery(lowerTerm, upperTerm, inclusive);
    }
    else if (name.equals("BooleanQuery"))
    {
      BooleanQuery query = new BooleanQuery();
      for (Iterator iter = xmlQuery.getChildren().iterator(); iter.hasNext(); )
      {
        Element xmlBooleanClause = (Element)iter.next();
        String sRequired = xmlBooleanClause.getAttributeValue("required");
        String sProhibited = xmlBooleanClause.getAttributeValue("prohibited");
        boolean required = sRequired != null && sRequired.equals("true");
        boolean prohibited = sProhibited != null && sProhibited.equals("true");
        Element xmlSubQuery = (Element)xmlBooleanClause.getChildren().get(0);
        query.add(makeQuery(xmlSubQuery,srvContext,required,prohibited),
required, prohibited);
      }
      // FIXME: quick fix; using Filters should be better
      query.setMaxClauseCount(16384);
      return query;
    }
    else
      throw new Exception("unknown lucene query type: " + name);
  }
--
View this message in context: http://www.nabble.com/Question-about-the-thesaurus-use.-tp13991690s18419p14733549.html
Sent from the geonetwork-users mailing list archive at Nabble.com.

Francois-Xavier_Prun · January 11, 2008, 8:29am

Hi Fabien,

Your option seems to be a good alternative. We also discussed this with
Jeroen, last month I guess, and the other option is to add the
broader/narrower/related terms to the Lucene index when indexing metadata
instead of adding them at search time.
That should be better because metadata is indexed only on update, so
search performance could be better that way. The problem could be the
size of the Lucene index ...
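
To make the index-time option concrete, here is a minimal sketch in plain Java. It is not GeoNetwork code: the class `IndexTimeExpansion`, its methods and the in-memory map are hypothetical stand-ins for a real thesaurus lookup, and the list it returns is simply what would be written to the "keyword" field of a record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not GeoNetwork code: expands keywords once, when a record is indexed.
final class IndexTimeExpansion {

    // lowercase keyword -> lowercase labels of its broader/narrower/related concepts
    private final Map<String, Set<String>> linked = new HashMap<>();

    // Registers a thesaurus link in both directions (e.g. climate <-> rainfall).
    void link(String a, String b) {
        String x = a.toLowerCase();
        String y = b.toLowerCase();
        linked.computeIfAbsent(x, k -> new LinkedHashSet<>()).add(y);
        linked.computeIfAbsent(y, k -> new LinkedHashSet<>()).add(x);
    }

    // Values to store in the "keyword" field for one metadata record:
    // each of the record's own keywords, followed by its linked terms.
    List<String> termsToIndex(List<String> recordKeywords) {
        Set<String> out = new LinkedHashSet<>();
        for (String keyword : recordKeywords) {
            String key = keyword.toLowerCase();
            out.add(key);
            Set<String> extra = linked.get(key);
            if (extra != null) {
                out.addAll(extra);
            }
        }
        return new ArrayList<>(out);
    }
}
```

With a link between "Climate" and "Rainfall", a record tagged only "Rainfall" would also be indexed under "climate", so a plain search on "climate" finds it without any query rewriting. The cost is the one mentioned above: every record carries more terms, so the index grows.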

Maybe a third option is to tell Lucene where the thesaurus is and have it
take that thesaurus into account for search ... but I'm not sure there is
such functionality in Lucene ...

Another issue is multilingual thesauri. Fabien, you're working in the
INSPIRE context; are you facing issues on that point too? Which
thesaurus are you using? GEMET?
Another problem is whether to store keywords or keyword identifiers ...

Maybe it's time to open a discussion on the R&D trac on keyword and
thesaurus improvement and organize all inputs on that topic.

Francois.

FBachraty wrote:

Hello,

To make the keyword search work with the thesaurus, I have updated the
code in the LuceneSearcher.java file.

I am just starting with Java, so my code can be optimised, but what I did
works.

It is now possible to find Broader, Narrower and Related terms automatically
in the Advanced Search.

What I do is simple:

For each keyword typed between '' '' (themekey / Lucene TermQuery), I look
in all the thesauri available on the GN node and find the corresponding
keyword identifiers.
Then, with each keyword identifier, I find the Broader, Related and
Narrower words and add them to the Lucene query.

I do not handle the (themekey / FuzzyQuery) case for the moment, but I think
the same approach can be used.

As an example, I have a thesaurus with Climate and narrower terms like
Rainfall.
If I put ''Climate'' in the keyword search, the request is automatically
and internally transformed into:
''Climate'' or ''Rainfall'' or ''Frost'' or ''...''.

The “climate” search creates the Lucene query below:

+((keyword:climate keyword:frost keyword:radiation keyword:rainfall
keyword:temperature)) +eastBL:[181 TO 540] +westBL:[180 TO 539]
+northBL:[271 TO 450] +southBL:[270 TO 449] +((_op0:3) (_op0:2) (_op0:0)
(_op0:4) (_op0:1) (_owner:1) (_dummy:0)) +(+_isTemplate:n)

Before the change, a request with “climate” and “rainfall” created the Lucene query:

+((+keyword:climate +keyword:rainfall)) +eastBL:[181 TO 540] +westBL:[180 TO
539] +northBL:[271 TO 450] +southBL:[270 TO 449] +(_op0:1) +_isTemplate:n

I am not a Lucene specialist; I made the code modifications by looking at
the Lucene documentation, so I include the two queries so that specialists
can comment on the modifications and on anything that is missing.

For the moment, after some tests, I have had no problems and what I did
seems to work correctly.

Here are the code modifications. Maybe it is better to send the complete
file, so I have tried to attach it to the post:

http://www.nabble.com/file/p14733549/LuceneSearcher.java LuceneSearcher.java

I used the 2.1 source file as a base.
I am waiting for your comments.

See you, Fabien Bachraty

The diffs are:
--
import java.util.List;
import java.util.ArrayList;
--
//Fabien Bachraty : overload for the thesaurus narrower/broader/related management
  public static Query makeQuery(Element xmlQuery) throws Exception
  {
    return makeQuery(xmlQuery, null,false,false);
  }
--
//Fabien Bachraty return the list of narrower broader related keyword
  public static List getKeywordNBR(String Keyword , ServiceContext srvContext
) throws Exception
  {
     //list of related narrower and broader keyword
     List listRes = new ArrayList();

    //get the list of existing thesaurus using the List service
     org.fao.geonet.services.thesaurus.List Thesauruslist = new
org.fao.geonet.services.thesaurus.List();
     Element ParamsThesaurusList = new Element("request");
     ParamsThesaurusList.addContent(new
Element(Params.TYPE).setText("all-thesauri"));
     Element elThesaurusList = Thesauruslist.exec(ParamsThesaurusList,
srvContext);
     //i am not good with xml it's certainly possible to optimise the code here
     elThesaurusList = elThesaurusList.getChild("thesaurusList");
     for (Iterator iterDirectory =
elThesaurusList.getChildren("directory").iterator();
iterDirectory.hasNext(); )
     {
        Element xmlDirectory = (Element)iterDirectory.next();
        for (Iterator iterThesaurus =
xmlDirectory.getChildren("thesaurus").iterator(); iterThesaurus.hasNext(); )
        {
            Element xmlThesaurus = (Element)iterThesaurus.next();
          String lvalue = xmlThesaurus.getAttributeValue("value");

          //for each thesaurus try to get the identifier for the themekey
          org.fao.geonet.services.thesaurus.GetKeywords getKeywords = new
org.fao.geonet.services.thesaurus.GetKeywords();
          Element ParamsKeywords = new Element("request");

          ParamsKeywords.addContent(new Element("pKeyword").setText(Keyword));
          ParamsKeywords.addContent(new Element("pThesauri").setText(lvalue));
          ParamsKeywords.addContent(new Element("pTypeSearch").setText("2"));
          ParamsKeywords.addContent(new Element("nbResults").setText("100"));
          ParamsKeywords.addContent(new Element("pNewSearch").setText("true"));
          ParamsKeywords.addContent(new Element("pMode").setText("consult"));

          Element elKeywords = getKeywords.exec(ParamsKeywords, srvContext);
          //parse the result to find the identifier
          Element elDescKey = elKeywords.getChild("descKeys");
          for (Iterator iterKeyword =
elDescKey.getChildren("keyword").iterator(); iterKeyword.hasNext(); )
          {
            Element xmlKeyword = (Element)iterKeyword.next();
            Element xmlUri = xmlKeyword.getChild("uri");
            String luri = xmlUri.getValue();
            //for each keyword identifier, use the EditElement service
            //to get the broader, narrower and related terms
            org.fao.geonet.services.thesaurus.EditElement BRNElement = new
org.fao.geonet.services.thesaurus.EditElement();
            Element ParamsBRN = new Element("request");
              ParamsBRN.addContent(new Element("uri").setText(luri));
              ParamsBRN.addContent(new Element("ref").setText(lvalue));
              ParamsBRN.addContent(new Element("mode").setText("consult"));
            Element elThesaurusBRN = BRNElement.exec(ParamsBRN, srvContext);
            //get node for the broader related narrowed
            Element elBroader = elThesaurusBRN.getChild("broader");
            Element elRelated = elThesaurusBRN.getChild("related");
            Element elNarrower = elThesaurusBRN.getChild("narrower");
            //for each node get the xml keywords part
            Element xmlBroader = elBroader.getChild("descKeys");
            Element xmlRelated = elRelated.getChild("descKeys");
            Element xmlNarrower = elNarrower.getChild("descKeys");
            //for each xml content get the keywords
            getKeywordFromElement(xmlBroader,listRes);
            getKeywordFromElement(xmlRelated,listRes);
            getKeywordFromElement(xmlNarrower,listRes);
          }
        }
     }
    return ( listRes );
  }
--
  //Fabien Bachraty : simple function to parse narrower broader related xml element
  public static void getKeywordFromElement(Element xmlElement, List TheList)
      throws Exception
  {
      for (Iterator iterKeyword =
xmlElement.getChildren("keyword").iterator(); iterKeyword.hasNext(); )
      {
        Element xmlKeyword = (Element)iterKeyword.next();
        Element elKeywordValue = xmlKeyword.getChild("value");
        String KeywordValue = elKeywordValue.getValue();
        TheList.add(new Term("keyword", KeywordValue.toLowerCase() ));
      }
  }
--
  // Fabien Bachraty : changed to build a lucene query that includes the
  // thesaurus broader, narrower and related terms
  // converts to lowercase if needed as the StandardAnalyzer
  public static Query makeQuery(Element xmlQuery, ServiceContext srvContext,
      boolean Looprequired, boolean Loopprohibited) throws Exception
  {
    String name = xmlQuery.getName();
    if (name.equals("TermQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      //Start FBachraty Thesaurus Narrower Broader Related Modification
      //create the request to return
      BooleanQuery tmpQuery = new BooleanQuery();
      List listRes = new ArrayList();
      listRes.add( new Term(fld, txt) );
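      // The thesaurus lookup needs a service context; the one-argument
      // makeQuery overload passes null, so skip the expansion in that case.
      if ("keyword".equals(fld) && srvContext != null)
      {
        listRes.addAll( getKeywordNBR( txt , srvContext ) );
      }
      Iterator i = listRes.iterator();
      while (i.hasNext())
      {
        tmpQuery.add(new TermQuery((Term) i.next()), Looprequired, Loopprohibited);
      }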
      return ( tmpQuery ) ;
      //End FBachraty Thesaurus Narrower Broader Related Modification
    }
    else if (name.equals("FuzzyQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      Float sim = Float.valueOf(xmlQuery.getAttributeValue("sim"));
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new FuzzyQuery(new Term(fld, txt), sim.floatValue());
    }
    else if (name.equals("PrefixQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new PrefixQuery(new Term(fld, txt));
    }
    else if (name.equals("WildcardQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String txt = xmlQuery.getAttributeValue("txt").toLowerCase();
      return new WildcardQuery(new Term(fld, txt));
    }
    else if (name.equals("PhraseQuery"))
    {
      PhraseQuery query = new PhraseQuery();
      for (Iterator iter = xmlQuery.getChildren().iterator(); iter.hasNext(); )
      {
        Element xmlTerm = (Element)iter.next();
        String fld = xmlTerm.getAttributeValue("fld");
        String txt = xmlTerm.getAttributeValue("txt").toLowerCase();
        query.add(new Term(fld, txt));
      }
      return query;
    }
    else if (name.equals("RangeQuery"))
    {
      String fld = xmlQuery.getAttributeValue("fld");
      String lowerTxt = xmlQuery.getAttributeValue("lowerTxt");
      String upperTxt = xmlQuery.getAttributeValue("upperTxt");
      String sInclusive = xmlQuery.getAttributeValue("inclusive");
      boolean inclusive = "true".equals(sInclusive);

      Term lowerTerm = (lowerTxt == null ? null : new Term(fld,
lowerTxt.toLowerCase()));
      Term upperTerm = (upperTxt == null ? null : new Term(fld,
upperTxt.toLowerCase()));

      return new RangeQuery(lowerTerm, upperTerm, inclusive);
    }
    else if (name.equals("BooleanQuery"))
    {
      BooleanQuery query = new BooleanQuery();
      for (Iterator iter = xmlQuery.getChildren().iterator(); iter.hasNext(); )
      {
        Element xmlBooleanClause = (Element)iter.next();
        String sRequired = xmlBooleanClause.getAttributeValue("required");
        String sProhibited = xmlBooleanClause.getAttributeValue("prohibited");
        boolean required = sRequired != null && sRequired.equals("true");
        boolean prohibited = sProhibited != null && sProhibited.equals("true");
        Element xmlSubQuery = (Element)xmlBooleanClause.getChildren().get(0);
        query.add(makeQuery(xmlSubQuery,srvContext,required,prohibited),
required, prohibited);
      }
      // FIXME: quick fix; using Filters should be better
      query.setMaxClauseCount(16384);
      return query;
    }
    else
      throw new Exception("unknown lucene query type: " + name);
  }

--

FBachraty · January 11, 2008, 9:30am

Hi Francois,

Francois-Xavier Prunayre-2 wrote:

Your option seems to be a good alternative. We also discussed this with
Jeroen, last month I guess, and the other option is to add the
broader/narrower/related terms to the Lucene index when indexing metadata
instead of adding them at search time.
That should be better because metadata is indexed only on update, so
search performance could be better that way. The problem could be the
size of the Lucene index ...

Sure, it would perform better at the Lucene index level, but the index is
the same for all languages, whereas at search level the related / broader /
narrower terms depend on the language of the session.

If the Lucene index also has to contain the thesaurus translations, there
will be a size problem, especially for keywords like ''Country''. And that
would slow down the search as a whole, wouldn't it?

Sure, the solution I propose is not ideal. It is just an alternative,
because my code does not take care of result ordering: results coming from
a related keyword can be displayed before results matching the original
keyword ...
It also does not handle cross-language search, but that can be
corrected ...
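
One possible way to deal with the ordering problem is to give the original keyword a higher weight than the terms added from the thesaurus. The sketch below only builds a query string in the Lucene query parser syntax, where `^` after a term sets its boost. The class `BoostedKeywordQuery` and its parameters are hypothetical, the boost value is arbitrary, and special characters in keywords are not escaped.

```java
import java.util.List;

// Hypothetical helper, not GeoNetwork code: original keyword boosted, expansions optional.
final class BoostedKeywordQuery {

    // Returns e.g. "keyword:climate^2.0 keyword:rainfall keyword:frost".
    static String build(String field, String original, List<String> expansions, float boost) {
        StringBuilder sb = new StringBuilder();
        sb.append(clause(field, original)).append('^').append(boost);
        for (String term : expansions) {
            sb.append(' ').append(clause(field, term));
        }
        return sb.toString();
    }

    private static String clause(String field, String term) {
        String t = term.toLowerCase();
        // Multi-word keywords are written as a quoted phrase.
        if (t.indexOf(' ') >= 0) {
            t = "\"" + t + "\"";
        }
        return field + ":" + t;
    }
}
```

All clauses stay optional, so a record matching only "rainfall" is still returned, but a record matching "climate" itself should score higher. The same idea could presumably be applied to the TermQuery objects built in makeQuery instead of going through a query string.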

Francois-Xavier Prunayre-2 wrote:

Maybe a third option is to tell Lucene where the thesaurus is and have it
take that thesaurus into account for search ... but I'm not sure there is
such functionality in Lucene ...

Sure, that would be the best way, but like you I don't see such
functionality in Lucene.
Some people have already worked on this problem, though:
http://lucene-qe.sourceforge.net/

Francois-Xavier Prunayre-2 wrote:

Another issue is multilingual thesauri. Fabien, you're working in the
INSPIRE context; are you facing issues on that point too?

Yes, I have to use thesauri for several reasons:
  Because of INSPIRE, keywords are mandatory to describe the dataset.
  Because of the context of my project (a European project with multilingual
aspects).
  Because of the study domains (Environmental, Forest, Genetic, ...). We
wish to describe metadata with coherent keywords common to all partners,
and we also wish to be able to evolve the keyword list.

A solution to avoid the multilingual problem is maybe to store the concept
identifier, and not the keyword itself, in the metadata and in the Lucene
index?
It would avoid term confusion during the search, because sometimes a
keyword is present several times, in several thesauri, with different
meanings and contexts.
The keyword would then also be translated automatically during the search
and in the metadata interface (maybe that is a problem for harvesting
and XML metadata export?)
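
As a rough illustration of searching by concept identifier, the sketch below maps what the user types, in the session language, to the concept URIs carrying that label; the search would then run on those URIs instead of on the label. The class `LabelToConcept` and its methods are hypothetical, and the in-memory map stands in for a real thesaurus lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not GeoNetwork code: typed label + language -> concept URIs.
final class LabelToConcept {

    // "lang|lowercase label" -> URIs of all concepts carrying that label
    private final Map<String, Set<String>> byLabel = new HashMap<>();

    void add(String uri, String lang, String label) {
        byLabel.computeIfAbsent(key(lang, label), k -> new TreeSet<>()).add(uri);
    }

    // Several URIs are returned when the same word exists in several thesauri.
    Set<String> conceptsFor(String lang, String typed) {
        Set<String> uris = byLabel.get(key(lang, typed));
        if (uris == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(uris);
    }

    private static String key(String lang, String label) {
        return lang + "|" + label.trim().toLowerCase(Locale.ROOT);
    }
}
```

If the concept http://evoltree.org/EVOLTREE#103000 is registered with its English and French labels, a French session typing the French label and an English session typing "Species" would both end up searching for the same URI. When a word belongs to several thesauri, the caller gets all candidate URIs and can ask the user which meaning is intended.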

Francois-Xavier Prunayre-2 wrote:

Which thesaurus are you using ? GEMET ?

For INSPIRE I have to use the GEMET thesaurus, but for the moment I have
two problems:
I can't find how to convert the distributed .rdf format into the .rdf
format expected by GN.
The GEMET file I found was not multilingual.
( http://www.eionet.europa.eu/gemet/rdf?langcode=en )

For the moment I only use an internal thesaurus I created (~100 keywords)
to describe our study domains.

Francois-Xavier Prunayre-2 wrote:

Another problem is whether to store keywords or keyword identifiers ...

The keyword identifier solution involves more work but seems to be more
powerful (multilingual management, metadata update, thesaurus update, ...).

Francois-Xavier Prunayre-2 wrote:

Maybe it's time to open a discussion on the R&D trac on keyword and
thesaurus improvement and organize all inputs on that topic.

Sorry for my ignorance but what is the R&D trac ?

Best regards,
Fabien.


Francois-Xavier_Prun · January 11, 2008, 9:49am

If the Lucene index also has to contain the thesaurus translations, there
will be a size problem, especially for keywords like ''Country''. And that
would slow down the search as a whole, wouldn't it?

Yes, it could, but it needs to be tested. There are also tools based on
Lucene which could help when working with large indexes (e.g. Solr can do
index replication and search caching ...)

Sure, the solution I propose is not ideal. It is just an alternative,
because my code does not take care of result ordering: results coming from
a related keyword can be displayed before results matching the original
keyword ...

On index creation you could probably define a "main" keyword and other
keywords, which Lucene would then take into consideration for scoring.

A solution to avoid the multilingual problem is maybe to store the concept
identifier, and not the keyword itself, in the metadata and in the Lucene
index?
It would avoid term confusion during the search, because sometimes a
keyword is present several times, in several thesauri, with different
meanings and contexts.
The keyword would then also be translated automatically during the search
and in the metadata interface (maybe that is a problem for harvesting
and XML metadata export?)

The problem is that we need access to the thesaurus used: the identifier
should point to a valid internet resource (here we are back to the xlink
discussion :)), or we need to define a mechanism that resolves keyword
ids, replacing the id with the keyword name when users/harvesters access
the metadata ...
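
Such a resolving mechanism could look roughly like the sketch below: the metadata stores the concept URI, and the label is looked up in the requested language when the record is displayed or exported. The class `ConceptLabels` and its methods are hypothetical, and falling back to a second language and then to the raw URI is just one possible policy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not GeoNetwork code: concept URI -> label in a given language.
final class ConceptLabels {

    // concept URI -> (language code -> preferred label)
    private final Map<String, Map<String, String>> labels = new HashMap<>();

    void addLabel(String uri, String lang, String label) {
        labels.computeIfAbsent(uri, k -> new HashMap<>()).put(lang, label);
    }

    // Label in the requested language, else in the fallback language,
    // else the URI itself when the thesaurus or concept is not known locally.
    String resolve(String uri, String lang, String fallbackLang) {
        Map<String, String> byLang = labels.get(uri);
        if (byLang == null) {
            return uri;
        }
        String label = byLang.get(lang);
        if (label == null) {
            label = byLang.get(fallbackLang);
        }
        return label != null ? label : uri;
    }
}
```

The last case is the one raised above: a harvester that receives only the URI and has no access to the thesaurus cannot show a readable keyword, so an export would probably have to write the resolved label next to the identifier.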

Maybe it's time to open a discussion on the R&D trac on keyword and
thesaurus improvement and organize all inputs on that topic.

Sorry for my ignorance but what is the R&D trac ?

http://trac.osgeo.org/geonetwork/wiki/RnD is the research and
development page for GeoNetwork. I made a page on thesauri there; feel
free to update it.

Francois

Best regards,
Fabien.

PeterParslow · January 10, 2025, 10:04am

If anyone is still following this 17-year-old thread, please look at the topic "Idea to improve keyword / thesaurus handling" in the GeoNetwork User category on OSGeo Discourse.