[GeoNetwork-devel] Keyword/Thesaurus work

Hi,

I wanted to let you all know about work I have been doing on the Thesaurus and keyword code. I have identified a few issues with that code:

  1. It seems to have been added at some point, and all development on it since then has been incremental updates.

  2. No documentation

  3. SERQL queries that are built rather haphazardly through string concatenation

  4. Very poor multilingual support

  5. It is hard to track when 2-letter language codes are used and when 3-letter codes are used

My work has been trying to address all of these issues. A summary of what has been done so far:

  1. Documented every method in Thesaurus, KeywordBean and KeywordSearcher.

  2. Added unit tests for all methods in Thesaurus and KeywordSearcher, and for the new classes I created to support them.

  3. Changed KeywordBean to be multilingual. It still has the concept of a "default" language for backwards compatibility, but it also has getValues and getDefinitions, which are maps from language code (3-letter) to the value.

  4. Changed KeywordBean to have a fluent interface, so you can write:

     • new KeywordBean().addValue("eng", "Water").addValue("ger", "Wasser").setCode("http://geonetwork.net#water")

  5. Changed Thesaurus so that the old addElement and updateElement methods are deprecated in favour of the variants that take a KeywordBean, e.g. updateElement(KeywordBean), since KeywordBean holds all the same information but also handles codes and localization nicely:

     • thesaurus.addElement(new KeywordBean().setCode("http://geonetwork.net#house").addValue("eng", "house"))

  6. Added a small generic DSL for creating SERQL queries and getting results, with specific strategies for keywords. For example:

     • QueryBuilder.keywordBuilder().limit(50).offset(10).where(Wheres.prefNote("water").or(Wheres.prefNote("wasser"))).build().execute(thesaurus)

     One of the reasons I did this was so that I could write queries in my unit tests without copy-pasting SERQL queries, and so that someone who doesn't know SERQL can write simple queries more easily.

  7. Changed some search method names in KeywordSearcher so they are easier for a non-RDF specialist to understand.
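
To make the multilingual, fluent KeywordBean described above more concrete, here is a minimal sketch of the idea. It is not the code in the branch: the class name KeywordBeanSketch, the setDefaultLang and getDefaultValue methods, and the choice of "eng" as the default language are all assumptions made for illustration. Only addValue, setCode, getValues and getDefinitions are named in the summary.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for KeywordBean; the real class in the branch may differ.
final class KeywordBeanSketch {
    // Keys are 3-letter language codes such as "eng" or "ger".
    private final Map<String, String> values = new LinkedHashMap<>();
    private final Map<String, String> definitions = new LinkedHashMap<>();
    private String defaultLang = "eng"; // assumption: English is the default language
    private String code;

    // Every mutator returns this, which is what allows calls to be chained.
    KeywordBeanSketch addValue(String lang, String value) {
        values.put(lang, value);
        return this;
    }

    KeywordBeanSketch addDefinition(String lang, String definition) {
        definitions.put(lang, definition);
        return this;
    }

    KeywordBeanSketch setCode(String code) {
        this.code = code;
        return this;
    }

    KeywordBeanSketch setDefaultLang(String lang) {
        this.defaultLang = lang;
        return this;
    }

    String getCode() {
        return code;
    }

    // Read-only views, so callers cannot bypass the fluent mutators.
    Map<String, String> getValues() {
        return Collections.unmodifiableMap(values);
    }

    Map<String, String> getDefinitions() {
        return Collections.unmodifiableMap(definitions);
    }

    // Backwards-compatible accessor: the value in the default language, or null if absent.
    String getDefaultValue() {
        return values.get(defaultLang);
    }
}

class KeywordBeanDemo {
    public static void main(String[] args) {
        KeywordBeanSketch water = new KeywordBeanSketch()
                .addValue("eng", "Water")
                .addValue("ger", "Wasser")
                .setCode("http://geonetwork.net#water");
        System.out.println(water.getCode() + " -> " + water.getValues());
    }
}
```

Keeping a single default-language accessor next to the per-language maps is one way old callers could keep working while new callers read every translation.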
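
The query DSL idea can be sketched in the same spirit: conditions are small composable objects, and the query text is assembled in one place instead of being concatenated ad hoc at each call site. This is only an illustration under stated assumptions. WhereSketch, WheresSketch, QueryBuilderSketch and labelContains are hypothetical names, the emitted text only approximates SERQL and the SKOS path expression is a guess at what a keyword query might look like, and nothing here executes against a thesaurus.

```java
// Hypothetical condition type; or/and wrap both sides in parentheses to keep precedence explicit.
interface WhereSketch {
    String toSerql();

    default WhereSketch or(WhereSketch other) {
        return () -> "(" + toSerql() + " OR " + other.toSerql() + ")";
    }

    default WhereSketch and(WhereSketch other) {
        return () -> "(" + toSerql() + " AND " + other.toSerql() + ")";
    }
}

// Hypothetical factory for conditions on the "label" variable used by the builder below.
final class WheresSketch {
    private WheresSketch() {
    }

    static WhereSketch labelContains(String text) {
        // Escape backslashes and quotes so user text cannot break out of the literal
        // (assumed here to be sufficient for the sketch).
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return () -> "label LIKE \"*" + escaped + "*\" IGNORE CASE";
    }
}

// Hypothetical builder: collects the optional parts and renders the query string once.
final class QueryBuilderSketch {
    private static final String SELECT_FROM =
            "SELECT id, label FROM {id} rdf:type {skos:Concept}; skos:prefLabel {label}";
    private static final String NAMESPACES =
            " USING NAMESPACE skos = <http://www.w3.org/2004/02/skos/core#>";

    private WhereSketch where;
    private int limit = -1;  // -1 means "no LIMIT clause"
    private int offset = -1; // -1 means "no OFFSET clause"

    QueryBuilderSketch where(WhereSketch where) {
        this.where = where;
        return this;
    }

    QueryBuilderSketch limit(int limit) {
        this.limit = limit;
        return this;
    }

    QueryBuilderSketch offset(int offset) {
        this.offset = offset;
        return this;
    }

    String build() {
        StringBuilder query = new StringBuilder(SELECT_FROM);
        if (where != null) {
            query.append(" WHERE ").append(where.toSerql());
        }
        if (limit >= 0) {
            query.append(" LIMIT ").append(limit);
        }
        if (offset >= 0) {
            query.append(" OFFSET ").append(offset);
        }
        return query.append(NAMESPACES).toString();
    }
}

class QueryBuilderDemo {
    public static void main(String[] args) {
        String query = new QueryBuilderSketch()
                .limit(50)
                .offset(10)
                .where(WheresSketch.labelContains("water").or(WheresSketch.labelContains("wasser")))
                .build();
        System.out.println(query);
    }
}
```

Because escaping and clause ordering live in one class, a unit test can build a query from a couple of method calls without pasting query text around.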

The remaining work is to migrate callers of the deprecated methods to the new API.

You can look at the work either as a diff compared to master:

https://github.com/jesseeichar/geonetwork/compare/master...thesaurus

or the raw code at:

https://github.com/jesseeichar/geonetwork/tree/thesaurus

Jesse