[GeoNetwork-users] Harvesting Problem XML Parser - a bug?

Hello list,

I have a problem harvesting a CSW catalogue, and I think the cause is the
XML parser in the CSW harvest engine. When I try to harvest a CSW node, I
get this error message in the log file:

2011-10-19 11:02:55,616 WARN [geonetwork.harvester] - Error parsing
metadata: org.jdom.JDOMException: XPath error while evaluating
"gmd:fileIdentifier/gco:CharacterString": XPath expression uses unbound
namespace prefix gco: XPath expression uses unbound namespace prefix gco

In the metadata records, the gco namespace
(xmlns:gco="http://www.isotc211.org/2005/gco") is declared on
<gmd:fileIdentifier> and not on <gmd:MD_Metadata>, but I think that is
still valid XML.
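
As a quick illustration of that point, here is a minimal Python sketch (standard library only, not GeoNetwork code). The two records are hypothetical fragments written for this example, not the attached files. A namespace-aware parser should resolve both to the same element names, wherever the gco declaration sits. Note that this only checks namespace well-formedness, not validity against the ISO schemas.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

# Hypothetical fragment: gco is declared on gmd:fileIdentifier.
DECLARED_ON_CHILD = (
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}">'
    f'<gmd:fileIdentifier xmlns:gco="{GCO}">'
    "<gco:CharacterString>abc-123</gco:CharacterString>"
    "</gmd:fileIdentifier>"
    "</gmd:MD_Metadata>"
)

# The same fragment with gco declared on the root element instead.
DECLARED_ON_ROOT = (
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">'
    "<gmd:fileIdentifier>"
    "<gco:CharacterString>abc-123</gco:CharacterString>"
    "</gmd:fileIdentifier>"
    "</gmd:MD_Metadata>"
)


def expanded_names(xml_text):
    """Parse the text and list every element as '{namespace-uri}local-name'."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


# Both placements of the declaration yield identical expanded element names.
print(expanded_names(DECLARED_ON_CHILD) == expanded_names(DECLARED_ON_ROOT))
```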

When I harvest this record with local filesystem harvesting, it ends up
in my catalogue. So I moved the namespace declaration
(xmlns:gco="http://www.isotc211.org/2005/gco") from <gmd:fileIdentifier>
to <gmd:MD_Metadata> and harvested the modified XML from another catalogue
via CSW. This works.

test_nicht_valide.xml: cannot be harvested over CSW; local filesystem
harvesting works
test_nicht_valide_korrigiert_1.xml: can be harvested over CSW

Thanks for your help

Regards
Martin

--
********************************************
Where2B Konferenz 2011
01. Dezember 2011 in Bonn
www.where2b-conference.com
********************************************

WhereGroup GmbH & Co. KG
Eifelstraße 7
53119 Bonn
Germany

Fon: +49 (0)228 / 90 90 38 - 24
Fax: +49 (0)228 / 90 90 38 - 11

martin.hueben@anonymised.com
www.wheregroup.com
Amtsgericht Bonn, HRA 6788
--------------------------------------------------
Komplementärin:
WhereGroup Verwaltungs GmbH
vertreten durch:
Olaf Knopp, Peter Stamm
--------------------------------------------------

(attachments)

test_nicht_valide.xml (21.9 KB)
test_nicht_valide_korrigiert_1.xml (19.6 KB)

Hello,

the gco: identifier (and namespace identifiers in general) must
always be complete and properly formatted (i.e. not empty and not
starting with a digit). Failing to meet those conditions results
in errors similar to the one you saw, not only for metadata
but in all other OGC WxS services as well. It can be really hard to
see what is causing an XPath error unless one remembers to check the
namespaces or the names of database tables/columns.

I'm glad you were able to sort this out yourself quickly, though :)

Kind regards,

Victor Epitropou

_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Hi Martin,

You are right. CSW harvesting doesn't work for this metadata because the gco namespace
(xmlns:gco="http://www.isotc211.org/2005/gco") is declared on
<gmd:fileIdentifier> and not on <gmd:MD_Metadata>.

CSW harvesting and local harvesting differ as follows.
The GeoNetwork CSW harvesting process is:
1) GetRecords
2) parse the GetRecords response and extract the fileIdentifier
3) GetRecordById, based on the fileIdentifier
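
The three steps above can be sketched roughly as follows. This is a minimal Python illustration, not GeoNetwork's harvester: `harvest`, `get_records`, `get_record_by_id`, `make_record` and `CATALOGUE` are hypothetical names, and the two callables merely stand in for real GetRecords and GetRecordById requests.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def harvest(get_records, get_record_by_id):
    """Run the three steps; both arguments are hypothetical stand-ins for CSW requests."""
    harvested = {}
    skipped = 0
    for summary_xml in get_records():  # 1) GetRecords
        # 2) parse the response and extract the fileIdentifier
        node = ET.fromstring(summary_xml).find(
            "gmd:fileIdentifier/gco:CharacterString", NAMESPACES
        )
        if node is None or not node.text:
            skipped += 1  # no identifier, so step 3 cannot be issued
            continue
        # 3) GetRecordById, based on the fileIdentifier
        harvested[node.text] = get_record_by_id(node.text)
    return harvested, skipped


def make_record(identifier):
    """Build a tiny hypothetical ISO record carrying only a fileIdentifier."""
    return (
        '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" '
        'xmlns:gco="http://www.isotc211.org/2005/gco">'
        f"<gmd:fileIdentifier><gco:CharacterString>{identifier}</gco:CharacterString>"
        "</gmd:fileIdentifier></gmd:MD_Metadata>"
    )


# In-memory stand-in for a remote catalogue (made-up identifiers).
CATALOGUE = {"abc-123": make_record("abc-123"), "def-456": make_record("def-456")}

harvested, skipped = harvest(lambda: list(CATALOGUE.values()), CATALOGUE.__getitem__)
print(sorted(harvested), skipped)
```

The point of the sketch is only that step 3 depends on step 2: a record whose identifier cannot be extracted never reaches GetRecordById.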

So there is an important step in CSW harvesting: using the XPath gmd:fileIdentifier/gco:CharacterString to get the fileIdentifier, because the fileIdentifier is needed for GetRecordById.
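
The thread does not show the harvester's code, so the following is only one assumed way an "unbound namespace prefix" error can arise: the prefix bindings handed to the XPath evaluation are collected from the root element only. The sketch uses Python's standard library rather than JDOM, and the record and helper names are hypothetical. Binding the prefixes explicitly makes the lookup independent of where the document declares them.

```python
import io
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
XPATH = "gmd:fileIdentifier/gco:CharacterString"

# Hypothetical record: gco is declared on gmd:fileIdentifier, not on the root.
RECORD = (
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}">'
    f'<gmd:fileIdentifier xmlns:gco="{GCO}">'
    "<gco:CharacterString>abc-123</gco:CharacterString>"
    "</gmd:fileIdentifier>"
    "</gmd:MD_Metadata>"
)


def root_declared_namespaces(xml_text):
    """Return only the prefix bindings declared on the root element."""
    bindings = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            break  # the first start event is the root element; stop here
        prefix, uri = payload
        bindings[prefix] = uri
    return bindings


def file_identifier(xml_text, namespaces):
    """Evaluate the path against the root using the given prefix bindings."""
    node = ET.fromstring(xml_text).find(XPATH, namespaces)
    return None if node is None else node.text


def try_root_only(xml_text):
    """Attempt the lookup with root-declared bindings only (the assumed failure mode)."""
    try:
        return file_identifier(xml_text, root_declared_namespaces(xml_text))
    except SyntaxError as exc:  # ElementTree's error for an unknown prefix
        return f"error: {exc}"


print(try_root_only(RECORD))                             # lookup fails: gco is unknown
print(file_identifier(RECORD, {"gmd": GMD, "gco": GCO}))  # explicit bindings succeed
```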

In local harvesting, however, the fileIdentifier is not needed, so GeoNetwork doesn't parse the metadata to extract it, and the harvest works.

Hope this helps.

Kai
Joint Center for Intelligent Spatial Computing
703-395-2337

----- Original Message -----
From: Martin Hüben <martin.hueben@anonymised.com>
Date: Wednesday, October 19, 2011 9:30 am
Subject: [GeoNetwork-users] Harvesting Problem XML Parser - a bug?
