Hello GN-developers!
My name is Timo Proescholdt, I work at the World Meteorological
Organization in the WMO information system office (together with
David Thomas and Eliot Christian).
We have been looking into Z39.50 and SRU support for GN and are
currently implementing both with the new Jzkit2 library. I would like
to share the code in the next few days, but first wanted to ask for
clarification on another point.
We have also looked into date-search support for classic Z39.50.
For this I changed z3959Server.xml to support date queries, mapping them
to Lucene RangeQueries (no Java code change required).
In the process I ran into several puzzles.
First: suppose we have the following two records in the index and DB,
loaded via the import function.
id | _changedate
1 | 2009-11-26T16:23:22
2 | 2009-11-27T16:23:22
If we search using the GN web frontend for records between
2009-11-26T00:00:00 and 2009-11-29T00:00:00, one would
expect records 1 and 2 to come up. However, only 2 is returned.
To get both, you have to move the lower bound back to
2009-11-25T00:00:00.
This is because the date string is indexed with a capital 'T',
but LuceneSearcher.java [line: 472] converts the query to lowercase.
If I issue the query in uppercase (e.g. with Luke), then I get the
correct(?) result.
In lexicographical order an uppercase 'T' sorts before a lowercase 't',
so Lucene orders the terms as follows:
2009-11-25t00:00:00
2009-11-26T16:23:22
2009-11-26t00:00:00
I don't know if this is the expected behaviour, since it is not clear
whether the first date is meant to be inclusive, but a lower bound of
00:00:00 suggests that it is. I want to point out that this has nothing
to do with the changes we made but is the behaviour of a vanilla system.
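
The effect described above can be reproduced without Lucene. The
following is only a minimal sketch in plain Java, assuming that the range
bounds are compared against the indexed terms as raw strings; the class
and method names (DateRangeCaseDemo, inRange, search) are made up for
illustration and are not part of GN or Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class, not GN code: imitates an inclusive term range
// over date strings that are stored exactly as written (capital 'T').
class DateRangeCaseDemo {

    // Inclusive range check using plain string comparison. For these
    // ASCII date strings this is ordinary lexicographic order.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Returns the ids of all records whose date string lies in [lower, upper].
    static List<Integer> search(Map<Integer, String> index, String lower, String upper) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : index.entrySet()) {
            if (inRange(entry.getValue(), lower, upper)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The two records from the example, keyed by id.
        Map<Integer, String> index = new LinkedHashMap<>();
        index.put(1, "2009-11-26T16:23:22");
        index.put(2, "2009-11-27T16:23:22");

        String from = "2009-11-26T00:00:00";
        String to = "2009-11-29T00:00:00";

        // Bounds lowercased, as the query apparently is before searching:
        // 't' sorts after 'T', so record 1 falls below the lower bound.
        System.out.println(search(index,
                from.toLowerCase(Locale.ROOT), to.toLowerCase(Locale.ROOT)));

        // Bounds left in uppercase: both records fall inside the range.
        System.out.println(search(index, from, to));
    }
}
```

Under these assumptions the first call returns only record 2 and the
second returns records 1 and 2, which matches what I see in the web
frontend and in Luke respectively.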
I have also noticed that the Lucene index contains several almost
identical index fields and I was wondering if somebody could enlighten
me as to their intended usage. There are _changedate, _createdate and
changedate (without the underscore), and possibly many more dates.
Am I right that the index fields prefixed with an underscore are
identical to the database fields, whereas everything else comes from the
metadata itself (e.g. extracted by index-fields.xsl)? If this is the
case, what is the purpose, given that the DB changedate and createdate
are filled in when the metadata is imported by extracting these very
dates from the metadata (Importer.java line: 152)?
A possible interpretation would be that the underscore-prefixed ones are
allowed to change, whereas everything else is bound to be equal to the
actual metadata?
Reading the code has not helped me here, so I would very much appreciate
clarification on how the dates are supposed to be used.
Many thanks and best regards,
Timo
--
Timo Pröscholdt
Program Officer, WMO Information System (WIS)
Observing and Information Systems Department
World Meteorological Organization
Tel: +41 22 730 81 76
Cell: +41 77 40 63 554
e-mail: tproescholdt@anonymised.com