[GeoNetwork-devel] GeoNetwork cannot harvest OAI-PMH service when setSpec/setName contains white spaces

Hello,

I’m trying to use GeoNetwork 2.6.5 to harvest metadata from an OAI-PMH provider (http://www.wis-jma.go.jp/meta/oaiprovider.jsp?verb=ListSets) which is not GeoNetwork-based.
It works well for all sets except one, whose name contains white spaces:

<set>
  <setSpec>National Center for Atmospheric Research</setSpec>
  <setName>National Center for Atmospheric Research</setName>
  <setDescription>
    <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <dc:description>This set contains 10523 records</dc:description>
    </oai_dc:dc>
  </setDescription>
</set>

GeoNetwork raises an error, which is consistent with its note in Category Management (i.e. “Important! This form allows you to add a Key value to the database. The key can not have spaces.”):

oaiPmhServer
2012-11-07 08:19:36,493 INFO [jeeves.service] - -> dispatching to output for : xml.harvesting.info
2012-11-07 08:19:36,493 INFO [jeeves.service] - -> writing xml for : xml.harvesting.info
2012-11-07 08:19:36,493 DEBUG [jeeves.service] - Service xml is : Exception in endElement: cvc-pattern-valid: Value 'National Center for Atmospheric Research' is not facet-valid with respect to pattern '([A-Za-z0-9\-_\.!~\*'\(\)])+(:[A-Za-z0-9\-_\.!~\*'\(\)]+)*' for type 'setSpecType'. BadXmlResponseEx
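To see why the validator rejects this value, the pattern from the log message can be tried outside GeoNetwork. The following is only a minimal sketch: the pattern is copied from the error above, while the function name `is_valid_set_spec` and the test values `NCAR` and `institution:NCAR` are made up for illustration. XML Schema patterns must match the whole value, so the sketch uses `fullmatch`.

```python
import re

# Pattern copied from the cvc-pattern-valid message (type 'setSpecType').
SET_SPEC_PATTERN = re.compile(
    r"([A-Za-z0-9\-_\.!~\*'\(\)])+(:[A-Za-z0-9\-_\.!~\*'\(\)]+)*"
)


def is_valid_set_spec(value: str) -> bool:
    """Return True if the whole value matches the setSpecType pattern."""
    # XSD patterns are implicitly anchored, hence fullmatch().
    return SET_SPEC_PATTERN.fullmatch(value) is not None


# The value from the log is rejected because of its spaces.
print(is_valid_set_spec("National Center for Atmospheric Research"))
# Hypothetical values without spaces are accepted.
print(is_valid_set_spec("NCAR"))
print(is_valid_set_spec("institution:NCAR"))
```

The check says nothing about setName, which does not appear in the error message.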

OAI-PMH 2.0 says: Each node in a set organization of a repository has:

  • a setSpec – a colon [:] separated list indicating the path from the root of the set hierarchy to the respective node. Each element in the list is a string consisting of any valid URI unreserved characters, which must not contain any colons [:]. Since a setSpec forms a unique identifier for the set within the repository, it must be unique for each set. Flat set organizations have only sets with setSpec that do not contain any colons [:].
    whereas a white space seems to be a “URI unreserved character”. One of the examples given by OAI itself for setName is “Quantum Psychology”.

The question is: does the “no white spaces” constraint for category names belong to GeoNetwork only, or is it an OAI-PMH constraint? How could GeoNetwork cope with this OAI-PMH server without changing the set/category name?
I guess that replacing each space with something else would fix this issue, but it would also change the original category name…
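The replacement idea could look roughly like the sketch below. It is not GeoNetwork code: `to_category_key` and `list_records_url` are hypothetical helpers, and the underscore is an arbitrary choice. The point is that a space-free key can be kept locally while the provider's original setSpec is still used (URL-encoded) in the `set` argument of the request, so the original name is not lost. Whether this particular provider accepts such a request is an assumption that would need testing.

```python
import re
from urllib.parse import urlencode


def to_category_key(set_spec: str) -> str:
    """Collapse each run of whitespace into a single underscore."""
    return re.sub(r"\s+", "_", set_spec.strip())


def list_records_url(base_url: str, set_spec: str, prefix: str = "oai_dc") -> str:
    """Build a ListRecords request that keeps the provider's original setSpec."""
    query = urlencode(
        {"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec}
    )
    return f"{base_url}?{query}"


original = "National Center for Atmospheric Research"
key = to_category_key(original)

# Local mapping from the space-free key back to the provider's own value.
key_to_original = {key: original}

print(key)
print(list_records_url("http://www.wis-jma.go.jp/meta/oaiprovider.jsp",
                       key_to_original[key]))
```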

Many thanks,
Victor

Victor Sinceac: victor.sinceac@anonymised.com, vsinceac@anonymised.com
Address: 24, villa Auguste Blanqui, 75013 Paris, France
Phone: +33 9 5277 0042 / +33 6 9507 0434

Dear GeoNetwork developers,

Just a quick, urgent question for you.

Is the GN map viewer based on GeoExplorer or OpenLayers?

Thank you,

Patrizia

ERRATUM:
I said “it works well for all sets”, but this is wrong…
The error “Cannot query OAI-PMH server” occurs before the harvesting, when trying to Retrieve Info.
Thus, this center cannot be harvested at all, just because one of its setSpec values contains white space…
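To find out which sets of a provider would trigger this failure, the ListSets response can be parsed without schema validation and each setSpec checked individually. This is a sketch under assumptions: `SAMPLE` is a hand-written stand-in for a real response (the `example-set` entry is invented), `invalid_set_specs` is a hypothetical helper, and the regular expression is a simplified form of the pattern from the log.

```python
import re
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Same character set as the setSpecType pattern in the error log.
SET_SPEC = re.compile(r"[A-Za-z0-9\-_.!~*'()]+(:[A-Za-z0-9\-_.!~*'()]+)*")

# Hand-written sample, not an actual response from the provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>example-set</setSpec>
      <setName>Example Set</setName>
    </set>
    <set>
      <setSpec>National Center for Atmospheric Research</setSpec>
      <setName>National Center for Atmospheric Research</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def invalid_set_specs(xml_text: str) -> list:
    """Return every setSpec value that fails the pattern check."""
    root = ET.fromstring(xml_text)  # well-formedness only, no XSD validation
    specs = [el.text or "" for el in root.findall(".//oai:setSpec", OAI_NS)]
    return [spec for spec in specs if not SET_SPEC.fullmatch(spec)]


print(invalid_set_specs(SAMPLE))
```

Only setSpec is checked here; the setName "Example Set" contains a space but is not reported.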

Hi Patrizia

The map viewer was built on GeoExt and OpenLayers; it predates the development of GeoExplorer.

For future releases it would be great to integrate GeoExplorer, so that GeoNetwork can benefit from improvements in that project. But I think no project is funding this development for now.

Regards,
Jose García

GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Jose García
GeoCat bv
Veenderweg 13
6721 WD Bennekom
The Netherlands
http://GeoCat.net

Sorry to come back again before getting any answer, but I think the OAI-PMH standard contains wrong examples, which leads to a blocking issue for GeoNetwork harvesting.

In the standard (http://www.openarchives.org/OAI/openarchivesprotocol.html#Set) the examples given for setName are strings containing spaces… while the definition is:
“string consisting of any valid URI unreserved characters”
and the definition of unreserved characters is:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
It appears that the Tokyo OAI provider (which is a big one) follows the wrong examples given by OAI-PMH rather than the definition…
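The unreserved list quoted above is slightly narrower than the character class in the pattern from the GeoNetwork log, which also accepts `!`, `*`, `'`, `(` and `)` (this looks like the older RFC 2396 list, though that attribution is an assumption here). A small sketch comparing both sets shows that neither contains a space; the names `RFC3986_UNRESERVED`, `SETSPEC_PATTERN_CHARS` and `disallowed` are invented for this example.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
RFC3986_UNRESERVED = ALNUM | set("-._~")

# Characters accepted by the setSpecType pattern quoted in the error log.
SETSPEC_PATTERN_CHARS = ALNUM | set("-_.!~*'()")


def disallowed(value: str, allowed: set) -> list:
    """Return the sorted characters of value that are not in allowed."""
    return sorted(set(value) - allowed)


name = "National Center for Atmospheric Research"

# The space is the only offending character under either definition.
print(disallowed(name, RFC3986_UNRESERVED))
print(disallowed(name, SETSPEC_PATTERN_CHARS))

# Characters the log pattern accepts beyond the quoted unreserved list.
print(sorted(SETSPEC_PATTERN_CHARS - RFC3986_UNRESERVED))
```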

Has anyone already had the same issue with GeoNetwork?

Thanks again,
Victor

I finally found the answer myself. I’ll add it here, as it may be useful to other people:

Victor
