[GeoNetwork-users] GeoNetwork automatically changes metadata

Hello lists,

when inserting metadata into GeoNetwork (tested with GN 2.4.1) it changes
the metadata. It happens both when doing file insert and when doing
copy/paste insert.

First: in the process of insertion, the method
"setNamespacePrefixUsingSchemas(Element md)" adds prefixes to the default
namespace, essentially removing the default namespace and making it a
qualified, additional namespace. Why is that?

The metadata does not become more or less valid in doing this, and when
processed by e.g. XSLT or any other metadata-reading function, there should
be no difference at all. However if the user later on adds new elements
without prefixing them with a namespace, those new elements will not be in
any namespace (certain to invalidate the XML). So the *behaviour* of the
document is changed quite a lot by this namespace-changing operation. It seems
to me that GeoNetwork is making choices here that the user did not ask for!
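
To illustrate the point, here is a minimal sketch in Python using only the standard-library xml.etree.ElementTree. The sample documents are made up for illustration and are not taken from GeoNetwork. It shows that the two forms are equivalent to a namespace-aware reader, but that an unprefixed element added by hand afterwards lands in a different namespace depending on the form:

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"

# The same record, once with a default namespace and once with a prefix.
default_form = f'<MD_Metadata xmlns="{GMD}"><language/></MD_Metadata>'
prefixed_form = (
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}"><gmd:language/></gmd:MD_Metadata>'
)

# A namespace-aware parser sees identical expanded names in both forms.
names_default = [e.tag for e in ET.fromstring(default_form).iter()]
names_prefixed = [e.tag for e in ET.fromstring(prefixed_form).iter()]
same_names = names_default == names_prefixed

# Now a user types an unprefixed <contact/> into each form by hand.
default_edited = f'<MD_Metadata xmlns="{GMD}"><language/><contact/></MD_Metadata>'
prefixed_edited = (
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}"><gmd:language/><contact/>'
    f'</gmd:MD_Metadata>'
)

# With the default namespace, the new element inherits it.
contact_in_default = ET.fromstring(default_edited)[1].tag
# With the prefixed form, the same edit yields an element in no namespace.
contact_in_prefixed = ET.fromstring(prefixed_edited)[1].tag

print(same_names)
print(contact_in_default)
print(contact_in_prefixed)
```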

Second: the insertion process continues and applies the XSLT
"update-fixed-info.xsl". This changes lots of things in the metadata,
including:

- it inserts gmd:fileIdentifier, if it is not present in the metadata
- it overwrites gmd:dateStamp, gmd:metadataStandardName,
gmd:metadataStandardVersion
- it generates values for gml:id attributes
- it sets the srsName attribute to a hard-coded value, if it's empty
- it inserts attribute gco:nilReason="missing" for empty gco:CharacterString
elements
- it autocompletes relative locations for codelists to an absolute location
- it does something incomprehensible to gmd:linkage, in some cases
- it does something, incl. uppercasing, to gmd:languageCode
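
As an illustration of the first item only, here is a rough sketch in Python of what "insert gmd:fileIdentifier if it is not present" amounts to. This is not GeoNetwork's code (the real logic is in the XSLT). The function name ensure_file_identifier is hypothetical, and using a random UUID as the inserted value is an assumption made for the example:

```python
import uuid
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def ensure_file_identifier(md):
    """Insert gmd:fileIdentifier as the first child if the record has none.

    Returns True if the record was modified, False if it was left alone.
    (Hypothetical helper, for illustration only.)
    """
    if md.find(f"{{{GMD}}}fileIdentifier") is not None:
        return False
    file_id = ET.Element(f"{{{GMD}}}fileIdentifier")
    value = ET.SubElement(file_id, f"{{{GCO}}}CharacterString")
    value.text = str(uuid.uuid4())  # assumption: the source does not say which value is used
    md.insert(0, file_id)
    return True


record = ET.fromstring(
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}"><gmd:language/></gmd:MD_Metadata>'
)
changed = ensure_file_identifier(record)        # record had no identifier
changed_again = ensure_file_identifier(record)  # identifier now exists
```

The point of the sketch is that the record the user gets back is not the record the user inserted: it has gained an element whose value the user never chose.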

Now, it has been stated by various people on this list recently that
GeoNetwork should not take any action changing metadata (see the discussions
on the editor saving empty optional elements and missing elements from
OGC WXS harvesting). The reason given for opposing any action by GeoNetwork
towards more valid metadata was: <quote>[to avoid] the kind of "We know what
you want" type of behavior</quote>.

So if we wish to be consistent in that approach, GN should not mess with the
metadata when it is inserted. Therefore I think we should remove the
functions described above. Any reason why not?

Kind regards
Heikki Doeleman

comments inline

heikki wrote:

<snip>

First: in the process of insertion, the method "setNamespacePrefixUsingSchemas(Element md)" adds prefixes to the default namespace, essentially removing the default namespace and making it a qualified, additional namespace. Why is that?

<snip>

does it actually take out the xmlns="...." attribute in the MD_Metadata element?

<snip>

- it inserts gmd:fileIdentifier, if it is not present in the metadata

not a good idea -- this identifier must be assigned by the original producer to provide a binding with that metadata by anyone who harvests, otherwise the system will end up with the same metadata with different fileIdentifiers.

- it overwrites gmd:dateStamp, gmd:metadataStandardName, gmd:metadataStandardVersion

dateStamp overwrite makes sense--under the interpretation that it represents the date of most recent update to the metadata in the containing repository. The full logic should be: on harvesting, if the md record is already in the harvesting repository, only update the dateStamp if the metadata content changes as a result of the harvest.
In the case of metadataStandard information, if the harvest is scraping information from OGC WXS getCapabilities (or from any other format different from the format in the repository), it should put in the metadataStandard information for the records it's producing; if it's harvesting from the same standard format, then the update doesn't matter.
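
A rough sketch of the dateStamp logic I have in mind, in Python (3.8 or later, for ET.canonicalize). The names fingerprint and datestamp_needs_update are made up for this example and are not anything in GeoNetwork. The idea is to compare the stored and harvested records with gmd:dateStamp left out, and only restamp when they differ:

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
DATESTAMP = f"{{{GMD}}}dateStamp"


def fingerprint(xml_text):
    """Canonical form of a record with its top-level gmd:dateStamp removed."""
    root = ET.fromstring(xml_text)
    for stamp in root.findall(DATESTAMP):
        root.remove(stamp)
    # Canonicalize so that prefix choice and whitespace do not count as changes.
    return ET.canonicalize(
        ET.tostring(root, encoding="unicode"),
        strip_text=True,
        rewrite_prefixes=True,
    )


def datestamp_needs_update(stored_xml, harvested_xml):
    """True only if the harvest changed something other than the dateStamp."""
    return fingerprint(stored_xml) != fingerprint(harvested_xml)


def make_record(language, date):
    # Tiny made-up record, just enough to exercise the comparison.
    return (
        f'<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">'
        f"<gmd:language><gco:CharacterString>{language}</gco:CharacterString>"
        f"</gmd:language>"
        f"<gmd:dateStamp><gco:Date>{date}</gco:Date></gmd:dateStamp>"
        f"</gmd:MD_Metadata>"
    )


stored = make_record("eng", "2009-01-01")
same_content = make_record("eng", "2009-06-30")
new_content = make_record("fre", "2009-06-30")

unchanged_result = datestamp_needs_update(stored, same_content)
changed_result = datestamp_needs_update(stored, new_content)
```

With this approach a re-harvest of an unchanged record leaves the existing dateStamp alone, which is the behaviour described above.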

- it generates values for gml:id attributes

these are only scoped within the gml doc that contains them; since they are not intended to be global, this doesn't matter.

- it sets a hard-coded value to attribute srsName, if it's empty

if that value is valid, why is this a problem? I hope that if the srs is not specified in the source metadata, the value set is something like nilReason="missing"

- it inserts attribute gco:nilReason="missing" for empty gco:CharacterString elements

I don't view this as changing the content of the imported metadata--it's just making it schema valid.

- it autocompletes relative locations for codelists to an absolute location

If the relative locations don't apply to the path for the import-generated metadata, and the codelists are uniquely identified, this appears to me to be a good thing.

- it does something incomprehensible to gmd:linkage, in some cases
- it does something, incl. uppercasing, to gmd:languageCode

sounds like more research is necessary on what's going on with these two.

So if we wish to be consistent in that approach, GN should not mess with the metadata when it is inserted. Therefore I think we should remove the functions described above. Any reason why not?

Kind regards
Heikki Doeleman

It looks to me like consideration of each 'adjustment' is in order. Some seem useful and valid to me.

steve