Dear all,
The OAI actions that return embedded XML documents have a small but very
serious bug: XML namespace declarations in the embedded document are
removed if the namespace in question is already declared by the OAI
response.
As a consequence, an OAI harvester that extracts the embedded XML from
the OAI response will produce an invalid XML document, because prefixes
used in it are no longer declared.
To see this, consider the following example.
An OAI response starts like this:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
As you can see, the xsi namespace is declared on the root element.
Now suppose that an XML document embedded in the OAI response (i.e. in a
GetRecord or ListRecords response) also declares the xsi namespace.
Since GN parses the embedded XML into a JDOM tree and attaches it to the
internal DOM that represents the OAI response, the embedded document's
declaration becomes a duplicate and is removed.
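To illustrate why the extracted record is rejected, here is a minimal standalone check using only the JDK's JAXP parser. The class name ExtractedRecordCheck and the urn:example record are hypothetical; the second fragment mimics what a harvester gets when it copies the record out of the response after the xmlns:xsi declaration has been dropped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ExtractedRecordCheck {

    // Returns true if 'xml' parses as a standalone document with a
    // namespace-aware parser, i.e. every prefix it uses is declared inside it.
    static boolean isNamespaceWellFormed(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and prints nothing to stderr.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. a prefix that is used but not bound
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical record as the data provider stored it.
        String original =
            "<metadata xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
          + " xsi:schemaLocation=\"urn:example urn:example.xsd\"/>";
        // The same record cut out of the OAI response without its declaration.
        String extracted =
            "<metadata xsi:schemaLocation=\"urn:example urn:example.xsd\"/>";
        System.out.println(isNamespaceWellFormed(original));  // true
        System.out.println(isNamespaceWellFormed(extracted)); // false
    }
}
```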
The behaviour is reproduced by the attached code.
My question to the community is how to avoid this behaviour. I have
tracked the error to Record.java, toXml(), line 86. There, the parsed
metadata (in JDOM form) is attached to the OAI record response structure.
What I do not know is where the duplicate namespace declaration is
actually removed. It may well happen somewhere else (in Jeeves?) when
the XML is marshaled.
In my code example, the culprit seems to be the JDOM XMLOutputter.
We need to find the part of GN responsible for the duplicate elimination
and then devise a way to tell it not to do so.
It might well be that other parts of GN are affected by this behaviour, too.
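Until the GN side is fixed, a harvester could work around the problem by re-declaring, on the extracted record, every namespace that is in scope from the enclosing OAI elements. The sketch below does this with the JDK's DOM and identity Transformer. RecordExtractor, SAMPLE_RESPONSE and the urn:example:record record are hypothetical, and the response is simplified (no header, responseDate or request elements). This is a harvester-side workaround, not a fix for GN itself; copying the declarations explicitly avoids relying on whatever namespace fixup a given serializer may or may not perform.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RecordExtractor {

    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Simplified, hypothetical GetRecord response in which the record's own
    // xmlns:xsi declaration has already been dropped.
    static final String SAMPLE_RESPONSE =
        "<OAI-PMH xmlns=\"" + OAI_NS + "\""
      + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
      + "<GetRecord><record><metadata>"
      + "<md xmlns=\"urn:example:record\" xsi:schemaLocation=\"urn:example:record rec.xsd\"/>"
      + "</metadata></record></GetRecord></OAI-PMH>";

    // Copies onto 'el' every namespace declaration that is in scope from an
    // ancestor but not declared on 'el' itself. Nearer ancestors are visited
    // first, so a shadowed outer declaration is never copied.
    static void redeclareInScopeNamespaces(Element el) {
        String xmlnsNs = XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        for (Node n = el.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                if (xmlnsNs.equals(a.getNamespaceURI())
                        && !el.hasAttributeNS(xmlnsNs, a.getLocalName())) {
                    el.setAttributeNS(xmlnsNs, a.getName(), a.getValue());
                }
            }
        }
    }

    // Hypothetical harvester step: parse the response, take the first element
    // child of <metadata>, re-declare inherited namespaces, serialize it alone.
    static String extract(String response) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(response)));
            Node n = doc.getElementsByTagNameNS(OAI_NS, "metadata").item(0).getFirstChild();
            while (n != null && n.getNodeType() != Node.ELEMENT_NODE) {
                n = n.getNextSibling();
            }
            Element record = (Element) n;
            redeclareInScopeNamespaces(record);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(record), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE_RESPONSE));
    }
}
```

The printed record carries its own xmlns:xsi declaration, so it can be stored and parsed on its own.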
Best regards
Timo
Attachment: TestDomNamespace.java (2.56 KB)