Hi Timo,
I agree with Doug's comments. I would also add that the Z39.50/SRU work is very timely. We have been considering it necessary for people to be able to call the GN Z39.50 searches from other applications.
I manage the Australian Spatial Data Directory (ASDD), a distributed search system of metadata about datasets in Australia. I process the search statistics of this gateway (see http://asdd.ga.gov.au/asdd/tech/gatewaystats/). In general, these statistics show that most searches of the ASDD use multiple fields or search terms, even in the basic search interface.
The ASDD technology is very robust but old, and we have considered replacing it with GN. However, GN needs Z39.50 changes before it can replace the ASDD, and what you are doing is very appropriate for this work.
I would strongly suggest that every use, relation, structure and truncation attribute of the Z39.50 GEO profile should be indexed in GN with Lucene. This would ensure that any distributed search of any GN system returns reliable results, rather than each GN instance creating its own Lucene indexing elements. If the latter occurred, a distributed search across multiple GN servers could return unreliable results, because one doesn't know what has been indexed on those different GN servers.
For example, if the dateStamp is Lucene-indexed on all GN servers using the date structure, right truncation and the various date-type relation Z39.50 GEO attributes, then a distributed dateStamp search of multiple GN servers would return valid results. If one GN server indexed the dateStamp as a string, the search results from that GN server may not be suitable.
Of course the search interface will determine what elements, relations, etc. a user can search on. I.e., if the manager of a GN system decides that the dateStamp should not be searchable, then users won't be able to search that element. However, if at a later date the manager decides that dateStamp searches should be allowed in the user interface, then he or she can add it and know that a distributed search on dateStamp will return reliable results, because all GN servers have been indexed to a common standard.
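The date-versus-string indexing pitfall can be illustrated outside of Lucene with a quick sketch (plain Python, not GN code, and the sample values are invented): lexicographic comparison only agrees with chronological order when every server stores dates in the same fixed-width ISO form.

```python
from datetime import date

# Two dateStamp values as two different servers might store them
a = "2009-12-17"   # zero-padded ISO form
b = "2009-2-1"     # same year, February, but not zero-padded

# Chronologically, b (February) comes before a (December)
assert date(2009, 2, 1) < date(2009, 12, 17)

# A plain string comparison gets it backwards: comparing the month
# characters, "1" of "12" sorts before "2", so December sorts first
assert not (b < a)   # string order claims b >= a -- wrong chronologically
```

A server that indexes dateStamp with a true date type avoids this; one that indexes it as free text is at the mercy of how each record happens to be formatted.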
Regarding the searching of local and distributed nodes: it is highly likely that a user will want to find resources no matter where they are. Hence, the search should query all local and remote zservers at the one time. However, if a user knows that a particular repository has the data that they want, then the user should be allowed to select that one repository (whether it is local or remote) from a picklist, so that the search returns only the results that the user is interested in.
A simple search should be very similar to a Google search, i.e. it should search all local and remote repositories using a free-text search anywhere in the metadata document. A simple search will hence cater for people who don't know much about the metadata standard or the available repositories.
An advanced search should allow power users, who are more familiar with the metadata standards and/or the available repositories, to choose which repositories they wish to search and/or which fields they want to limit their search to.
An obvious advanced search field is 'hierarchyLevel'. Once resource managers realise that they can use ISO 19115 to manage their service, documents, projects, hardware, collections, series, software, models, etc. then users will want to search for these particular types of resources. For example, search for all 'software' that processes 'mif' files, search for all satellite 'hardware', search for all rock 'collections', etc. This will allow a user to minimise the list of results to the type of information they are interested in.
I would also suggest that the advanced search should allow:
- word list search on topicCategory
- word list search on geographic identifier, e.g. ANZLIC Geographic Extent Names (see http://asdd.ga.gov.au/asdd/profileinfo/anzlic-allgens.xml for the list, and click the "select from GEN Lists" button at http://asdd.ga.gov.au/asdd/tech/zap/advanced.html to see an implementation).
- phrase search with right truncation on keywords within multiple thesaurusNames, e.g. NSW from the Jurisdiction thesaurus and AGRICULTURE* from the ANZLIC search word list. The latter example would return results where the ANZLIC search word is 'AGRICULTURE', 'AGRICULTURE-Crops', 'AGRICULTURE-Horticulture', 'AGRICULTURE-Irrigation' or 'AGRICULTURE-Livestock'. (Select the "ANZLIC search word" pick list at http://asdd.ga.gov.au/asdd/tech/zap/advanced.html.)
- spatial searches (overlaps [default], fully enclosed within, encloses, fully outside of and near) for geographic bounding box and geographic polygon elements. The interface could allow a user to choose the extent by: entering coordinates in a bounding box, drawing a bounding box or polygon on a map or selecting geographicIdentifiers from a pick list.
- date searches (before, before or during, after, after or during and during) on dateStamp, identification//citation//date, temporalExtent//extent and spatialTemporalExtent//extent.
- word list for the 'status' element, i.e. people may want to find all 'planned' resources or 'historicalArchive' resources.
- word list for the 'spatialRepresentationType'.
- phrase search on the spatialResolution//equivalentScale or spatialResolution//distance.
- etc. etc.
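To make the spatial predicates in the list above concrete, here is a minimal sketch (plain Python, illustrative only; the coordinate order and function name are my assumptions, not GN or ASDD code). It classifies a target bounding box against a query box for the overlaps/enclosed/encloses/outside cases; the distance-based "near" relation is omitted.

```python
def bbox_relation(query, target):
    """Classify a target bbox against a query bbox.
    Boxes are (west, south, east, north) in decimal degrees."""
    qw, qs, qe, qn = query
    tw, ts, te, tn = target
    # No shared area at all
    if te < qw or tw > qe or tn < qs or ts > qn:
        return "fully outside of"
    # Target lies entirely inside the query box
    if tw >= qw and te <= qe and ts >= qs and tn <= qn:
        return "fully enclosed within"
    # Target entirely contains the query box
    if tw <= qw and te >= qe and ts <= qs and tn >= qn:
        return "encloses"
    return "overlaps"  # the default relation

query = (140.0, -40.0, 150.0, -30.0)  # hypothetical query extent
print(bbox_relation(query, (142.0, -38.0, 148.0, -32.0)))  # fully enclosed within
print(bbox_relation(query, (100.0, -20.0, 110.0, -10.0)))  # fully outside of
print(bbox_relation(query, (145.0, -35.0, 155.0, -25.0)))  # overlaps
```

Whatever the server-side encoding, the point stands that all GN nodes would need to index the bounding-box elements the same way for these predicates to behave consistently in a distributed search.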
I hope that this helps.
John
-----Original Message-----
From: Doug Nebert [mailto:ddnebert@anonymised.com]
Sent: Thursday, 17 December 2009 7:21 AM
To: Timo Proescholdt
Cc: geonetwork-devel@lists.sourceforge.net
Subject: Re: [GeoNetwork-devel] Z39.50 remote search
Timo Proescholdt wrote:
> Hi GN developers,
>
> as you know I have been working on the Z39.50 / SRU stuff in the last
> weeks, and I'm proud to announce that I have implemented a new Z39.50
> interface and SRU support via the Jzkit3 library. As soon as I manage
> to disentangle my eclipse workspace, I will post a diff here for
> discussion, since there are a lot of things that can probably only be
> decided by the community or more experienced geonetworkers.
>
> The new Z39.50 interface happens to break the Z39.50 remote search
> (because it is implemented using the library), and since the new Jzkit
> library has no backward compatibility (neither for interfaces nor for
> configuration files), this feature would have to be re-implemented too
> (which is not a bad idea anyway, given the shape of that part of the
> code).
>
> I looked a bit into how it is currently implemented and how it would
> be done with the new library, but I felt that this should maybe be
> discussed first.
>
> Does anybody have a clear set of features the Z39.50 client search
> should support, and how it should be configured?
>
> Currently there is the file "repositories.xml", containing a list of
> repositories that are searched when a Z39.50 remote search is
> triggered. The expected behaviour is not totally clear to me. First, I
> don't see where the remote search can be enabled (I see it in the code
> of the portal-search, but not in the actual web interface). Second,
> I'm not 100% sure whether all the repositories contained therein
> should be searched when a remote search is triggered, or whether the
> user can choose one or more repositories.
>
> Whatever is contained in the old repositories.xml could be
> automatically transformed (even on the fly by XSLT) into the new
> configuration format, and it should also be possible to reproduce the
> old behaviour of the remote search (if somebody told me what it
> actually was). (See the attachments for demos of the old and the new
> config files.) But this seems like a good point in time to discuss a
> bit.
>
> The jzkit3 library supports a search service with caching and
> persistence, which opens the possibility of dramatic speed
> improvements. It should also be possible to expose remote Z39.50
> targets as local ones and (in theory) even to mix results from remote
> and local sources. It all depends on what people want.
>
> So if anybody has a good understanding of how the Z39.50 remote search
> is currently used and what features users demand, I would be grateful
> if he or she could share it, so that I can have a look at it while
> unsnarling the patch.
>
I'll let Archie comment on how he thinks the remote search works or
worked, but I would promote consistency in the use of Z39.50 and the
other protocols, both in the context of distributed search and harvest.
For any server target, the GN administrator should be able to identify
a predictable set of parameters to connect to a catalog, whether it be
CSW, Z39.50 or another source. Popular ElementSet="full" responses in
the Z39.50 "GEO" world are XML or HTML in either FGDC or ISO
19115/19139 format. Parsing "brief" record formats would be similar,
but may require transformation of structured text into XML to feed to
GN XSLT if an XML brief is not supported. In the Earth Observation
community the gils and geo element sets comprise the queryable terms;
support for mapping an attribute by number to a name or XML element
name would be logical.
Prior configuration of Z39.50 targets is not consistent with the more
modular administrative support that allows selection and
parameterization of catalog services and instances for either harvest
or distributed search. It would be better to refactor the configuration
using CSW or OAI-PMH as a model; the harvest admin parameters mostly
apply to a set of distributed search parameters as well.
For any given catalog known to GeoNetwork, the following minimum items
should be collected and managed by the admin interface:
protocol/profile
hostname
(port)
(database)
(user)
(password)
(harvest-freq)
(all-recs-query)
(icon-file)
brief-query-transform-path
brief-response-transform-path
full-query-transform-path
full-response-transform-path
I think with this set, linking to a set of transform (XSLT) files, one
could configure existing and additional protocols and service instances
in a straightforward way and use them in either harvest or direct query.
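As a sketch of what such a catalog entry might look like, here is a hypothetical XML fragment covering the minimum items listed above (the element names, values and paths are all illustrative assumptions, not an existing GN configuration schema; optional items are shown empty):

```xml
<!-- Hypothetical catalog entry; names and paths are illustrative only -->
<catalog id="example-geo-node">
  <protocol>z3950-geo</protocol>
  <hostname>catalog.example.org</hostname>
  <port>210</port>
  <database>geo</database>
  <user/>
  <password/>
  <harvest-freq>P7D</harvest-freq>
  <all-recs-query/>
  <icon-file>example.gif</icon-file>
  <brief-query-transform-path>xsl/z3950/brief-query.xsl</brief-query-transform-path>
  <brief-response-transform-path>xsl/z3950/brief-response.xsl</brief-response-transform-path>
  <full-query-transform-path>xsl/z3950/full-query.xsl</full-query-transform-path>
  <full-response-transform-path>xsl/z3950/full-response.xsl</full-response-transform-path>
</catalog>
```

The same entry could then be referenced by either the harvester or the distributed-search code, which is the consistency being argued for here.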
> best regards
> Timo
--
Douglas D. Nebert
Senior Advisor for Geospatial Technology, System-of-Systems Architect
FGDC Secretariat T:703 648 4151 F:703 648-5755 C:703 459-5860
_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork