[GeoNetwork-devel] Retrieving information on records that don't validate when harvested

Hi List

(reposting on the dev list, as it’s probably a better fit than the users list)

I am harvesting from one GeoNetwork node to another (both 2.10), using the XML services to trigger the harvest. This is done with a small Flask application on the target server, because access to the full GeoNetwork installation on the target server is restricted. I would like to be able to report back (e.g. get a UUID) on the records that aren’t harvested successfully, usually because they don’t validate.
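For reference, the trigger step in such a Flask application might look like the sketch below. It assumes the GeoNetwork 2.x `xml.harvesting.run` service at `/srv/eng/xml.harvesting.run`, taking a `<request>` with one `<id>` per harvester; the base URL and harvester ids are placeholders, and the real call needs an authenticated admin session, which is not shown.

```python
import urllib.request
from xml.etree import ElementTree as ET


def build_harvest_run_request(base_url, harvester_ids, lang="eng"):
    """Build (but do not send) a POST request that runs the given harvesters.

    Assumed service path and body format: /srv/<lang>/xml.harvesting.run with
    <request><id>...</id></request>; check both against the 2.10 manual.
    """
    root = ET.Element("request")
    for hid in harvester_ids:
        ET.SubElement(root, "id").text = str(hid)
    body = ET.tostring(root, encoding="utf-8")  # bytes, no XML declaration
    url = f"{base_url.rstrip('/')}/srv/{lang}/xml.harvesting.run"
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/xml"}, method="POST"
    )


# Usage (hypothetical host and harvester id; requires a logged-in session):
# req = build_harvest_run_request("http://localhost:8080/geonetwork", [42])
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```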

The only option I have thought of so far is to relax the harvesting rules so that all records are imported, whether they are valid or not, and then to iterate through them checking each one for validity. However, from my understanding of the documentation on validation using the XML services (http://geonetwork-opensource.org/manuals/2.10.4/eng/developer/xml_services/metadata_xml_validation.html), the service needs the XML record to be passed as the data of a POST request, which seems time-consuming if all 300 records need checking in this way every time a harvest is run.
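The relax-then-check approach amounts to the loop sketched below. `fetch_xml` and `validate` are hypothetical hooks: the first might retrieve a record by UUID, the second might POST it to the validation service linked above and interpret the response. With two round trips per record, 300 records cost roughly 600 HTTP requests per harvest, which is the overhead in question.

```python
from typing import Callable, Iterable, List


def find_invalid_records(
    uuids: Iterable[str],
    fetch_xml: Callable[[str], str],
    validate: Callable[[str], bool],
) -> List[str]:
    """Return the UUIDs of records that fail validation.

    fetch_xml and validate are hypothetical, injected callables; nothing
    here talks to GeoNetwork directly.
    """
    invalid: List[str] = []
    for uuid in uuids:
        try:
            ok = validate(fetch_xml(uuid))
        except Exception:
            # A record that cannot be fetched or validated is reported as well.
            ok = False
        if not ok:
            invalid.append(uuid)
    return invalid
```

Injecting the two callables keeps the loop testable with in-memory fakes before wiring it to real HTTP calls.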

Is that correct, or can I use the service differently?

Alternatively, can anyone think of a different approach to the problem?

Thanks

Jo

Jo Cook
Astun Technology Ltd, The Coach House, 17 West Street, Epsom, Surrey, KT18 7RL, UK
t:+44 7930 524 155

iShare - Data integration and publishing platform


Company registration no. 5410695. Registered in England and Wales. Registered office: 120 Manor Green Road, Epsom, Surrey, KT19 8LN VAT no. 864201149.

Hi Jo

There’s a metadata.validate service but, AFAIK, it only works if you’re in an edit session of a metadata record (otherwise it fails), and it doesn’t require sending the XML. The service validates the metadata record and stores the validation information in the database, so that later, when you use, for example, the q service to search for the metadata, you get the validation information in the geonet:info section of each result.
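If the stored results come back in geonet:info as described, a single search could stand in for 300 validation calls. The sketch below scans a `q` response for records flagged as invalid. It assumes unqualified `uuid` and `valid` children inside `geonet:info` (namespace `http://www.fao.org/geonetwork`), with `1` = valid, `0` = invalid and other values such as `-1` meaning not yet validated; the `valid` element name and its values are assumptions to check against real 2.10 output.

```python
from xml.etree import ElementTree as ET

GEONET_NS = "http://www.fao.org/geonetwork"


def invalid_uuids_from_search(response_xml: str, valid_tag: str = "valid") -> list:
    """Return UUIDs whose geonet:info reports a failed validation.

    Assumed (not confirmed for 2.10): "1" = valid, "0" = invalid, and any
    other value or a missing element = not validated yet (skipped here).
    """
    root = ET.fromstring(response_xml)
    invalid = []
    for info in root.iter(f"{{{GEONET_NS}}}info"):
        uuid = info.findtext("uuid")
        status = (info.findtext(valid_tag) or "").strip()
        if uuid and status == "0":
            invalid.append(uuid)
    return invalid
```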

I guess this service could be adapted so that it works with metadata without requiring the metadata to be in an edit session. A further improvement would be for the service to work with a metadata selection, so you could first select all the metadata harvested from a server and then run the process on that selection. This way, one call would process all the records.
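The selection step proposed here can be approximated on the client side: from a `q` response, keep the records whose geonet:info `source` matches a given source UUID (assumed here to identify the remote node), giving the list that a batch validation service would process in one call. The unqualified `source` element and the sample values are assumptions, and the batch service itself is hypothetical.

```python
from xml.etree import ElementTree as ET

GEONET_NS = "http://www.fao.org/geonetwork"


def select_harvested(response_xml: str, source_uuid: str) -> list:
    """Return UUIDs of search results whose geonet:info/source equals source_uuid.

    Assumes unqualified <uuid> and <source> children inside geonet:info.
    """
    root = ET.fromstring(response_xml)
    return [
        info.findtext("uuid")
        for info in root.iter(f"{{{GEONET_NS}}}info")
        if info.findtext("source") == source_uuid and info.findtext("uuid")
    ]
```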

But unless I’m wrong, such a service would need to be implemented, based on the current metadata.validate service.

Regards,
Jose García

In reply to Jo Cook <jocook@anonymised.com>, Fri, Jul 3, 2015 at 2:52 PM



GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

GeoCat Bridge for ArcGIS allows instant publishing of data and metadata on GeoServer and GeoNetwork. Visit http://geocat.net for details.


Jose García
GeoCat bv
Veenderweg 13
6721 WD Bennekom
The Netherlands
http://GeoCat.net

Hi Jose,

Thanks for the response. So it sounds like we’d need to do some coding to come up with a better solution.

Alternatively, I might look at the validation information in the database to see whether we can do anything with that.
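As a starting point, a query of the following shape could list failing records per harvester. It is demonstrated against an in-memory SQLite copy of two trimmed-down tables. The table and column names (`Metadata.harvestUuid`, `Validation.metadataId`, `valType`, `status`), the reading `status = 0` = failed, and the sample `valType` values are assumptions based on the 2.x schema; check them against the actual database, whose engine will differ.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the 2.x tables; the real schema
# has more columns and may differ in naming.
SCHEMA = """
CREATE TABLE Metadata (id INTEGER PRIMARY KEY, uuid TEXT, harvestUuid TEXT);
CREATE TABLE Validation (
    metadataId INTEGER, valType TEXT, status INTEGER,
    PRIMARY KEY (metadataId, valType)
);
"""

# Failing records for one harvester, one row per failed validation type.
FAILED_FOR_HARVESTER = """
SELECT m.uuid, v.valType
FROM Metadata m
JOIN Validation v ON v.metadataId = m.id
WHERE m.harvestUuid = ? AND v.status = 0  -- assumed: 0 = failed, 1 = passed
ORDER BY m.uuid, v.valType
"""


def failed_records(conn, harvester_uuid):
    """Return (uuid, valType) pairs for records from harvester_uuid that failed."""
    return conn.execute(FAILED_FOR_HARVESTER, (harvester_uuid,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Metadata VALUES (?, ?, ?)",
                     [(1, "rec-a", "h1"), (2, "rec-b", "h1"), (3, "rec-c", "h2")])
    conn.executemany("INSERT INTO Validation VALUES (?, ?, ?)",
                     [(1, "xsd", 1), (2, "xsd", 0), (3, "xsd", 0)])
    print(failed_records(conn, "h1"))
```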

Thanks again

Jo

In reply to Jose Garcia <jose.garcia@anonymised.com>, Fri, Jul 3, 2015 at 2:22 PM