Hi John,
Two issues here:
1. The ListSets response from the Equella server is invalid (as we found), which means you can't retrieve the sets and metadata prefixes to choose from in the edit interface when adding an OAI-PMH harvester for an Equella server. The proper fix is for Equella to fix their server (ok, not likely), so we could instead relax the validation on this request in GeoNetwork. The risk is that we get back stuff we don't understand and further confuse the user by presenting junk in the interface. (A workaround is to run the ListSets request yourself and look at the response, e.g. http://equella-server/oai/provider?verb=ListSets, or just use oai_dc as the metadata prefix, since every OAI-PMH server has to support it.) Alternatively, if we don't get anything (valid) back from the ListSets request in the harvester edit interface, we could let the user enter their own set name and metadata prefix, as jOAI does. That sounds preferable to me anyway. Anyone else want to comment?
2. I think the "invalid xml" error you're referring to is how all the metadata records returned from the Equella server are flagged when the harvester is actually run. What happens is that GeoNetwork makes a GetRecord request and gets back a response from the Equella server that includes the metadata record. This is fine, but GeoNetwork then attempts to validate the response, and validation fails, probably because the Equella server (in common with many other OAI servers) does not include the schemaLocation attribute for the oai_dc schema on the root element of the embedded metadata record. This is perhaps a bit too restrictive on our part, as I think schemaLocation is optional on all XML records. Anyway, this can be relaxed by deferring validation of the embedded metadata record until after GeoNetwork has guessed which schema it belongs to. At that stage it can validate against the local copy of the schema and doesn't need a schemaLocation attribute. This seems to me a worthwhile relaxation because it makes many more records available from the servers I know about without increasing the risk of ingesting junk. Again, does anyone else want to comment on this?
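To make the idea in point 2 concrete, here is a rough sketch in Python (for brevity only; it is not the GeoNetwork or ANZMEST code). It pulls the embedded record out of a response, guesses the schema from the namespace of the record's root element rather than from schemaLocation, and only then hands the record to a validator. The LOCAL_SCHEMAS mapping, the schema names, the sample response and the validate callback are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hypothetical mapping: root-element namespace -> name of a locally held schema.
LOCAL_SCHEMAS = {
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "oai_dc",
    "http://www.isotc211.org/2005/gmd": "iso19139",
}

# Made-up response: note there is no schemaLocation on the embedded record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record>
    <header><identifier>oai:example:1</identifier></header>
    <metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example</dc:title>
      </oai_dc:dc>
    </metadata>
  </record></GetRecord>
</OAI-PMH>"""


def embedded_record(response_xml):
    """Return the first child of the first <metadata> element, or None."""
    root = ET.fromstring(response_xml)
    metadata = root.find(f".//{{{OAI_NS}}}metadata")
    if metadata is None or len(metadata) == 0:
        return None
    return metadata[0]


def guess_schema(record):
    """Guess the schema from the root element's namespace, ignoring schemaLocation."""
    tag = record.tag
    namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return LOCAL_SCHEMAS.get(namespace)


def validate_deferred(response_xml, validate):
    """Guess the schema first, then validate against the local copy.

    `validate(record, schema_name)` is a caller-supplied hook (hypothetical);
    the standard library has no XSD validator, so none is shown here.
    """
    record = embedded_record(response_xml)
    if record is None:
        return (None, False)
    schema = guess_schema(record)
    if schema is None:
        return (None, False)  # unknown schema: reject rather than ingest junk
    return (schema, bool(validate(record, schema)))
```

A record whose schema can't be guessed is still rejected, so relaxing the schemaLocation requirement this way should not let in anything that a local schema doesn't recognise.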
I've tested the second fix in the ANZMEST code and I'll develop the first fix as well to prove the concept (although maybe Mathieu and Julien have already done this in their refactoring of the OAI-PMH harvester?).
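As a starting point for the first fix, the fallback could look something like the Python sketch below (again an illustration only, not harvester code). It reads a ListSets response leniently, keeping only the setSpec and setName values it understands, and falls back to a set name typed in by the user when nothing usable comes back. The function names, the manual_set parameter and the sample response are assumptions; the sample is invented and is not what the Equella server actually returns.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Invented response with extra content that a strict validator might reject.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>maps</setSpec>
      <setName>Maps</setName>
      <setDescription>free text here</setDescription>
    </set>
    <set><setName>No spec, so skipped</setName></set>
  </ListSets>
</OAI-PMH>"""


def list_sets_url(base_url):
    """Build the ListSets request URL for an OAI-PMH endpoint."""
    return base_url + "?" + urlencode({"verb": "ListSets"})


def parse_sets(response_xml):
    """Leniently extract (setSpec, setName) pairs; return [] if nothing usable."""
    try:
        root = ET.fromstring(response_xml)
    except ET.ParseError:
        return []
    sets = []
    for node in root.iter(OAI + "set"):
        spec = (node.findtext(OAI + "setSpec") or "").strip()
        name = (node.findtext(OAI + "setName") or "").strip()
        if spec:  # ignore entries we don't understand instead of showing junk
            sets.append((spec, name or spec))
    return sets


def harvester_sets(response_xml, manual_set=None):
    """Sets to offer in the edit interface, else the user's own entry."""
    found = parse_sets(response_xml)
    if found:
        return found
    return [(manual_set, manual_set)] if manual_set else []
```

For example, list_sets_url("http://equella-server/oai/provider") gives the same URL as the manual workaround in point 1, and harvester_sets(response, manual_set="mySet") only uses "mySet" when the server's response yields no usable sets.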
Cheers,
Simon
________________________________________
From: boabjohn [john@anonymised.com]
Sent: Sunday, 27 March 2011 8:36 PM
To: geonetwork-devel@lists.sourceforge.net
Subject: [GeoNetwork-devel] OAI-PMH ListSets Validation Error: Can be silenced?
(Apologies: cross-posted from GN-Users...perhaps this is the better forum?)
G'Day all,
We're attempting to harvest from a records management platform called
Equella (http://www.equella.com/) which say they can support an OAI-PMH
endpoint.
However, GN throws an "invalid xml" error when we attempt to harvest.
After some laser-like investigation by a friendly wizard (thanks Simon!) it
looks like the problem is with the ListSets response on the Equella server.
It is returning text content in an element, which apparently is
not valid OAI-PMH (according to the OAI-PMH XSDs).
Equella say they will fix it "some time soon..."
We need to harvest now.
Can anyone suggest how the validation error can be safely (and succinctly)
silenced so that the harvest can continue?
Thanks in advance,
JB
---
John Brisbin
Managing Director, BoaB interactive
[mb] +61 (0)407 471 565
[ph] +61 (0)7 3103 0574 (voice 2 text)
[im] skype:boabjohn
[www] http://www.boab.info
[po] POB 802, Townsville QLD 4810 AUSTRALIA
--
View this message in context: http://osgeo-org.1803224.n2.nabble.com/OAI-PMH-ListSets-Validation-Error-Can-be-silenced-tp6211988p6211988.html
Sent from the GeoNetwork developer mailing list archive at Nabble.com.