Hi Jeroen,
Thanks for confirming that this is a bug. We're not in a position to develop a fix immediately, and we've developed a workaround within our own system for now. We may be able to apply a proper fix and submit a patch sometime down the track, although this is uncertain, so I've created a ticket as suggested.
Regards,
Aaron Sedgmen
Geoscience Australia
-----Original Message-----
From: Jeroen Ticheler [mailto:Jeroen.Ticheler@anonymised.com]
Sent: Tuesday, 10 February 2009 9:55
To: Sedgmen Aaron
Cc: Hockaday John; geonetwork-devel@lists.sourceforge.net
Subject: Re: [GeoNetwork-devel] File Identifiers being replaced during harvest [SEC=UNCLASSIFIED]
Hi Aaron,
I see I never responded to this email. Sorry about that.
I fully agree with your observations and note that there is some
inconsistent behavior in GeoNetwork in this respect. The MEF-based
import that GeoNetwork also has actually keeps the original UUID. I now
understand from you that the WebDAV harvesting does not consider the
UUID. In fact, that is something that would better be fixed to support
your workflow.
You now have two options:
1- put a ticket in the trac.osgeo.org/geonetwork and wait until
someone fixes it
2- modify the behavior yourself (or hire someone to do it) and submit
a patch with the fix.
Hope this helps,
ciao,
Jeroen
On Jan 22, 2009, at 2:01 AM, <Aaron.Sedgmen@anonymised.com> wrote:
Hi Jeroen,
Thanks for taking time to look at this. I'll try to clarify our
situation and the issue we're having a little better.
We have a metadata system separate to our GeoNetwork (GN)
installation in which we are generating ISO 19139 metadata records
(ANZLIC Metadata Profile). We periodically harvest metadata from
this system into our GN repository, resulting in two copies of the
same metadata records (or so we hope). The non-GN metadata system
is the authoritative source - this is where the metadata are
created, updated and deleted. GN must synchronise its metadata
content in the harvest process, applying any inserts, updates or
deletions that have occurred since the previous harvest.
Our observation is that during the initial harvest of a record, GN
will ignore the existing FileIdentifier and replace it with its own
UUID. This is effectively creating a logically new metadata record,
and we now have two unique metadata records describing the same
resource. It is important that the FileIdentifier is not altered,
so that the harvest results in a copy of the same metadata record,
i.e. the two different metadata systems contain instances of the
same metadata record.
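
To illustrate the behaviour we are after, here is a minimal Python sketch (not GeoNetwork code) that reads the existing gmd:fileIdentifier from an ISO 19139 record and mints a new UUID only when the record has none. The function names are hypothetical, and the sketch assumes the record's root element is gmd:MD_Metadata.

```python
import uuid
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the gmd and gco prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def read_file_identifier(xml_text):
    """Return the record's gmd:fileIdentifier, or None if absent or empty."""
    root = ET.fromstring(xml_text)
    node = root.find("gmd:fileIdentifier/gco:CharacterString", NS)
    if node is None or node.text is None or not node.text.strip():
        return None
    return node.text.strip()


def uuid_for_harvested_record(xml_text):
    """Hypothetical policy: keep the source identifier, generate one only as a fallback."""
    return read_file_identifier(xml_text) or str(uuid.uuid4())
```

With a policy like this, the harvested copy keeps the identifier assigned by the authoritative system, so both systems hold instances of the same record.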
It becomes more problematic when synchronising an updated record.
GN will blindly replace a previously harvested record in its
repository if the time stamp has changed on the file being harvested
(we're using WebDAV). GN does not check whether the FileIdentifier in
the updated record being harvested matches the FileIdentifier in the
corresponding record in the repository. The result is that GN now
contains the updated record with a FileIdentifier matching the
authoritative source, which is what we wanted in the first place.
However, this upsets GN's internal record keeping, because GN tracks
the FileIdentifier in a separate column in the database, populated
during the initial harvest.
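
The check that appears to be missing could look roughly like the following Python sketch. It is only an illustration under assumptions: the StoredRecord layout, the plan_sync function and the action names are hypothetical and do not reflect GeoNetwork's actual schema or harvester code.

```python
from dataclasses import dataclass


@dataclass
class StoredRecord:
    uuid: str             # identifier tracked in the repository's own column
    last_modified: float  # file time stamp recorded at the previous harvest


def plan_sync(stored, path, modified, file_identifier):
    """Decide what to do with one harvested file.

    `stored` maps a remote file path to its StoredRecord (hypothetical layout).
    """
    existing = stored.get(path)
    if existing is None:
        return "insert"
    if modified <= existing.last_modified:
        return "skip"
    if file_identifier != existing.uuid:
        # Identifiers disagree: a blind replace would leave the tracked
        # UUID out of step with the FileIdentifier inside the record.
        return "conflict"
    return "update"
```

A harvester following this sketch would replace a record only when the time stamp is newer and the identifiers agree, and would flag the mismatch case instead of silently overwriting.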
I can see why the FileIdentifier is being replaced by GN during
harvest, to ensure uniqueness of the metadata in the GN repository -
this could be a real issue when harvesting from multiple nodes. The
question here is whether the benefits of ensuring uniqueness in the
GN repository outweigh the implications of altering harvested
metadata from the authoritative source.
I hope this makes things clearer.
Regards,
Aaron Sedgmen
Geoscience Australia