[GeoNetwork-devel] OAI-PMH / support for deletions

Dear All,

as mentioned in my other email, I'm planning to work on improving the OAI support in GeoNetwork, which is needed to make it scale to bigger catalogues. Before filing a proposal I would like to discuss some issues regarding the deletion of metadata in the database.

One issue is that GN currently does not support "deletions", a mechanism in OAI to signal that a record has been deleted from the catalogue.
http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#DeletedRecords
The consequence is that a harvester has to retrieve ALL the records in order to find out what got deleted, which is impractical for bigger catalogues.
The harvester component in GN has no support for deletions either. Together with its lack of support for differential harvesting, this means that a GN harvester puts enormous strain on any OAI provider it harvests from.
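
To illustrate what deletion support buys a harvester, here is a rough sketch (in Python, not GN code) of how a differential harvest could split an OAI-PMH response into changed and deleted records. The status="deleted" attribute on the header element is what the OAI-PMH 2.0 spec defines. The sample response, the example.org identifiers and the function name split_headers are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response to a differential request such as
# ?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2010-03-01
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2010-03-03T10:00:00Z</responseDate>
  <request verb="ListIdentifiers" from="2010-03-01"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:record-1</identifier>
      <datestamp>2010-03-01</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:record-2</identifier>
      <datestamp>2010-03-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def split_headers(xml_text):
    """Return (changed, deleted) identifier lists from an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    changed, deleted = [], []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        # A provider that tracks deletions marks them on the header itself.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            changed.append(identifier)
    return changed, deleted


if __name__ == "__main__":
    changed, deleted = split_headers(SAMPLE_RESPONSE)
    print("fetch or update:", changed)
    print("remove locally:", deleted)
```

With such headers available, the harvester only needs to ask for what changed since its last run, instead of pulling the complete record list and comparing it against its local copy.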

The single biggest obstacle to implementing deletion support is the lack of metadata record history: a metadata record is removed from the database as soon as it is deleted in the interface, which makes it impossible to know afterwards what has been deleted.

There are at least two ways around this. Briefly, one would be to have a separate table, just for OAI, in which the uids of deleted records are stored. This would be a not-so-nice but not-so-intrusive solution.
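
A minimal sketch of this first option, using Python and SQLite purely for illustration. The table and column names (metadata, oai_deleted, deleted_at) are assumptions and not GN's actual schema. Timestamps are stored as ISO 8601 strings so that they compare correctly as text.

```python
import sqlite3


def create_schema(conn):
    # Stand-in for the existing metadata table (hypothetical layout).
    conn.execute(
        "CREATE TABLE metadata ("
        "uuid TEXT PRIMARY KEY, data TEXT NOT NULL, changed_at TEXT NOT NULL)"
    )
    # Separate table used only by the OAI provider to remember deletions.
    conn.execute(
        "CREATE TABLE oai_deleted ("
        "uuid TEXT PRIMARY KEY, deleted_at TEXT NOT NULL)"
    )


def insert_record(conn, uuid, data, now):
    with conn:  # one transaction for both statements
        # A re-imported record is live again, so drop any old deletion entry.
        conn.execute("DELETE FROM oai_deleted WHERE uuid = ?", (uuid,))
        conn.execute(
            "INSERT INTO metadata (uuid, data, changed_at) VALUES (?, ?, ?)",
            (uuid, data, now),
        )


def delete_record(conn, uuid, now):
    with conn:
        cur = conn.execute("DELETE FROM metadata WHERE uuid = ?", (uuid,))
        if cur.rowcount:  # only record a deletion if something was removed
            conn.execute(
                "INSERT OR REPLACE INTO oai_deleted (uuid, deleted_at) "
                "VALUES (?, ?)",
                (uuid, now),
            )


def deleted_since(conn, since):
    """UUIDs the OAI provider should report as deleted from `since` on."""
    rows = conn.execute(
        "SELECT uuid FROM oai_deleted WHERE deleted_at >= ? "
        "ORDER BY deleted_at, uuid",
        (since,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    insert_record(conn, "a", "<md/>", "2010-03-01T00:00:00Z")
    delete_record(conn, "a", "2010-03-02T00:00:00Z")
    print(deleted_since(conn, "2010-03-01T00:00:00Z"))
```

The existing metadata table stays untouched here. The price is that every code path that deletes or re-imports a record has to remember to update the second table as well.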

The other one would involve changing GN's database layout so that records can be deleted in the interface but remain in the DB, as is the case for most systems of that type. Such a solution is more intrusive: one would have to think about primary keys and about situations where the same metadata record gets imported and deleted multiple times. On the other hand, versioning would potentially be beneficial for other components.
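
One possible shape for this second option, again as a Python/SQLite sketch with made-up table and column names. It assumes a surrogate integer primary key, so that the same UUID can occur in several rows, and a partial unique index that allows at most one live row per UUID. This is only one way to handle the repeated import/delete case, not a worked-out proposal.

```python
import sqlite3

# Hypothetical layout: rows are never removed, only marked as deleted.
SCHEMA = """
CREATE TABLE metadata (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid       TEXT NOT NULL,
    data       TEXT NOT NULL,
    changed_at TEXT NOT NULL,
    deleted_at TEXT
);
-- At most one live (not deleted) row per UUID.
CREATE UNIQUE INDEX metadata_live_uuid
    ON metadata (uuid) WHERE deleted_at IS NULL;
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def import_record(conn, uuid, data, now):
    # Raises sqlite3.IntegrityError if a live row with this UUID exists.
    with conn:
        conn.execute(
            "INSERT INTO metadata (uuid, data, changed_at) VALUES (?, ?, ?)",
            (uuid, data, now),
        )


def soft_delete(conn, uuid, now):
    # "Deleting" in the interface only marks the live row.
    with conn:
        conn.execute(
            "UPDATE metadata SET deleted_at = ?, changed_at = ? "
            "WHERE uuid = ? AND deleted_at IS NULL",
            (now, now, uuid),
        )


def oai_status(conn, uuid):
    """Status of the most recent row for a UUID, or None if never seen."""
    row = conn.execute(
        "SELECT deleted_at FROM metadata WHERE uuid = ? "
        "ORDER BY id DESC LIMIT 1",
        (uuid,),
    ).fetchone()
    if row is None:
        return None
    return "deleted" if row[0] is not None else "live"


if __name__ == "__main__":
    conn = open_db()
    import_record(conn, "a", "v1", "2010-03-01T00:00:00Z")
    soft_delete(conn, "a", "2010-03-02T00:00:00Z")
    import_record(conn, "a", "v2", "2010-03-03T00:00:00Z")
    print(oai_status(conn, "a"))
```

In this layout the older rows of a UUID double as a crude history, which is where the versioning benefit for other components would come from. Every existing query on the table would need an additional "deleted_at IS NULL" condition, though.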

Has the issue of how to represent deleted records on the database level been discussed at some point in the history of GN? Which way forward is the best one in your opinion?

best regards
Timo