[GeoNetwork-devel] Example settings for dcat rdf harvester

Jo_Cook2 · February 16, 2023, 4:07pm

Hi All,

I’m really excited that the new simple URL harvester in Geonetwork 4.2.x is available. I was wondering if someone could provide me with some example config for DCAT/rdf?

I can see in https://github.com/geonetwork/core-geonetwork/commit/c57a1a8066c610ac345f230565640d5ff20a6f73 that the following URL has been tested: http://mow-dataroom.s3-eu-west-1.amazonaws.com/dr_dcat.rdf so I was wondering what the exact settings were? eg the Dataset Element to loop on, the Element for the UUID of each record, and the XSL transformation to apply?

I am testing with my own RDF, and while I think it’s looping through the dataset elements correctly, I don’t think I have the UUID element correct.

Many thanks

Jo

···

Jo Cook
t:+44 7930 524 155 | twitter:@archaeogeek | mastodon:@archaeogeek@anonymised.com.
Please note that currently I do not work on Friday afternoons. For urgent responses at that time, please visit support.astuntechnology.com or phone our office on 01372 744009

Francois_Prunayre · February 17, 2023, 7:27am

Hi Jo, for a DCAT feed the configuration should be the one proposed when you select “DCAT feed > ISO”, i.e. only the XSLT conversion:
https://github.com/geonetwork/core-geonetwork/pull/6771/files#diff-8874a12490d4fcb4983bba24e2d37e80cce8e275a01d6e23d67589e9cbf6f315R634

The DCAT feed content is retrieved and then SPARQL queries are applied to collect CatalogRecord and Dataset from the feed. The feed is an RDF graph, so there are not really loop elements as there would be for the tree structure of a JSON or XML document.
So it sounds more like something specific to your RDF file. Can you share the URL so we can have a look?
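
To illustrate why a fixed loop path is fragile for RDF, here is a minimal Python sketch. The two RDF/XML snippets and the example.org URIs are made up for illustration; they describe the same dataset, once nested under the catalog and once as a top-level node. The "by type" lookup only approximates a graph query by scanning the XML; a real implementation would parse the feed into triples and query those, as the harvester does with SPARQL.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
}
ABOUT = "{%s}about" % NS["rdf"]

HEADER = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:dcat="http://www.w3.org/ns/dcat#">')

# Hypothetical feed 1: the dataset is nested inside the catalog.
NESTED = HEADER + """
  <dcat:Catalog rdf:about="https://example.org/catalog">
    <dcat:dataset>
      <dcat:Dataset rdf:about="https://example.org/dataset/1"/>
    </dcat:dataset>
  </dcat:Catalog>
</rdf:RDF>"""

# Hypothetical feed 2: same statements, but the dataset is a top-level node.
FLAT = HEADER + """
  <dcat:Catalog rdf:about="https://example.org/catalog">
    <dcat:dataset rdf:resource="https://example.org/dataset/1"/>
  </dcat:Catalog>
  <dcat:Dataset rdf:about="https://example.org/dataset/1"/>
</rdf:RDF>"""


def datasets_by_path(xml_text, path):
    """Tree-style lookup: the result depends on how the feed is serialized."""
    root = ET.fromstring(xml_text)
    return [el.get(ABOUT) for el in root.findall(path, NS)]


def datasets_by_type(xml_text):
    """Collect every dcat:Dataset node, wherever it sits in the document."""
    root = ET.fromstring(xml_text)
    found = {el.get(ABOUT) for el in root.iter("{%s}Dataset" % NS["dcat"])}
    return sorted(found)
```

With these samples, the path `dcat:Dataset` (direct children of the root only) matches the flat feed but returns nothing for the nested one, while the lookup by type returns the same dataset for both.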

Cheers.

Francois

GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Jo_Cook2 · February 17, 2023, 10:20am

Hi Francois,

Thanks, this is my URL: https://data.spatialhub.scot/catalog.rdf

With schema:iso19115-3.2018:convert/fromSPARQL-DCAT I get a null-pointer exception (understandable, because there isn’t a CatalogRecord element). I make more progress with schema:iso19115-3.2018:convert/DCAT/sparql-to-iso19115-3, looping on .//dcat:Dataset, in that it clearly loops through every record, but I can’t find the right setting for the identifier.
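
For anyone trying to reproduce this, a small Python sketch along these lines can show what a feed offers as identifier candidates for each dcat:Dataset, and whether it contains any dcat:CatalogRecord at all. The sample RDF below is invented for illustration and is not taken from the URL above; `inspect_feed` is a hypothetical helper name, and I am assuming that either `rdf:about` or `dct:identifier` is what one would want as the record UUID.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
ABOUT = "{%s}about" % NS["rdf"]

# Invented sample feed: two datasets, no CatalogRecord, one missing dct:identifier.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="https://example.org/dataset/abc">
    <dct:identifier>abc</dct:identifier>
    <dct:title>First dataset</dct:title>
  </dcat:Dataset>
  <dcat:Dataset rdf:about="https://example.org/dataset/def">
    <dct:title>Second dataset</dct:title>
  </dcat:Dataset>
</rdf:RDF>"""


def inspect_feed(xml_text):
    """Return (number of CatalogRecord elements, identifier candidates per Dataset)."""
    root = ET.fromstring(xml_text)
    record_count = len(root.findall(".//dcat:CatalogRecord", NS))
    datasets = []
    for ds in root.findall(".//dcat:Dataset", NS):
        datasets.append({
            # URI of the dataset node, if the feed provides one
            "about": ds.get(ABOUT),
            # None when the dataset has no dct:identifier child
            "identifier": ds.findtext("dct:identifier", default=None, namespaces=NS),
        })
    return record_count, datasets


if __name__ == "__main__":
    count, datasets = inspect_feed(SAMPLE)
    print("CatalogRecord elements:", count)
    for ds in datasets:
        print(ds)
```

To run it against a real feed, download the RDF/XML first and pass its text to `inspect_feed`.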

I can write a new conversion xsl if necessary.

All the best

Jo

Francois_Prunayre · February 17, 2023, 10:42am

Hi, this query should add CatalogRecord elements if they don’t exist in the feed: https://github.com/geonetwork/core-geonetwork/blob/main/harvesters/src/main/resources/harvester-resources/simpleUrl/sparql/add-CatalogRecord.rq. Maybe something is going wrong around there. We should probably look into this before adding a new conversion.
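
The general idea, adding a record for every dataset that lacks one, can be sketched in plain Python over (subject, predicate, object) tuples. This is not the content of add-CatalogRecord.rq, only an illustration of the intent; the `#record` URI suffix is a hypothetical naming scheme, and linking a record to its dataset through foaf:primaryTopic follows the DCAT vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def add_missing_catalog_records(triples):
    """Return a new set of triples with a CatalogRecord for every Dataset."""
    datasets = {s for s, p, o in triples
                if p == RDF_TYPE and o == DCAT + "Dataset"}
    records = {s for s, p, o in triples
               if p == RDF_TYPE and o == DCAT + "CatalogRecord"}
    # Datasets that an existing record already points at.
    covered = {o for s, p, o in triples
               if p == PRIMARY_TOPIC and s in records}

    result = set(triples)
    for dataset in sorted(datasets - covered):
        record = dataset + "#record"  # hypothetical URI scheme, illustration only
        result.add((record, RDF_TYPE, DCAT + "CatalogRecord"))
        result.add((record, PRIMARY_TOPIC, dataset))
    return result


if __name__ == "__main__":
    feed = {("https://example.org/dataset/1", RDF_TYPE, DCAT + "Dataset")}
    for triple in sorted(add_missing_catalog_records(feed)):
        print(triple)
```

Running the function a second time on its own output adds nothing, since every dataset is then covered by a record.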

Francois

Jo_Cook2 · February 17, 2023, 11:15am

Hi Francois,

Thanks! For what it’s worth, with the schema:iso19115-3.2018:convert/fromSPARQL-DCAT converter the error I get (for each record) is:

2023-02-17T11:02:53,137 ERROR [geonetwork.harvester] - Failed to apply conversion schema:iso19115-3.2018:convert/DCAT/sparql-to-iso19115-3 to record null. Error is: An empty sequence is not allowed as the first argument of gn-fn-sparql:getSubject()

Would you like me to submit a GitHub issue for this?

All the best

Jo

fgravin · February 20, 2023, 9:51am

Hi Jo,

For our datahub project, we “try” to have working and clean conversions from

ESRI json dcat
DKAN
ODS v1

We have never actually tried CKAN, XML or RDF harvesting, so I can’t really help.

Couldn’t your CKAN provide DCAT output in JSON?

Cheers

···

camptocamp
INNOVATIVE SOLUTIONS
BY OPEN SOURCE EXPERTS

Florent Gravin
Technical Leader - Architect
+33 4 58 48 20 36

Jo_Cook2 · February 20, 2023, 3:49pm

Hi Florent,

I’m not having much luck getting a json dcat output from this particular URL. I’m interested in what settings you are using for the ESRI json dcat- I’m having trouble with one of those as well!

Thanks

Jo

fgravin · February 20, 2023, 4:20pm

Hi Jo,

Here is what we use.
I think the pageFrom and pageSize params are not relevant, because the URL might return the whole catalog in one shot.
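
As a rough illustration of what "the whole catalog in one shot" means for a JSON DCAT feed, here is a small Python sketch. It is not our harvester configuration: the sample document is invented, and I am assuming a DCAT-US style layout (a top-level `dataset` array whose items carry `identifier` and `title`). The pointer values `/dataset` and `/identifier` and the helper names are assumptions for this example only.

```python
import json

# Invented sample feed in an assumed DCAT-US style layout.
FEED = json.loads("""
{
  "dataset": [
    {"identifier": "https://example.org/datasets/a", "title": "Dataset A"},
    {"identifier": "https://example.org/datasets/b", "title": "Dataset B"}
  ]
}
""")


def resolve_pointer(node, pointer):
    """Resolve a simple JSON-pointer-like path such as '/dataset' (no escaping)."""
    for key in pointer.strip("/").split("/"):
        if key == "":
            continue  # an empty pointer refers to the node itself
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def list_records(doc, loop_pointer="/dataset", id_pointer="/identifier"):
    """Return (identifier, title) for every entry under the loop element."""
    return [
        (resolve_pointer(item, id_pointer), item.get("title"))
        for item in resolve_pointer(doc, loop_pointer)
    ]


if __name__ == "__main__":
    # One response already holds every record, so no paging loop is needed.
    for identifier, title in list_records(FEED):
        print(identifier, "-", title)
```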

Jo_Cook2 · February 20, 2023, 6:36pm

Hi Florent,

Thanks, that was a great help- I can now successfully harvest from an ESRI json dcat endpoint!

Jo

fgravin · February 27, 2023, 8:39pm

Hi Jo,

Happy to hear that!
I think it would be a good time to start a new section in https://geonetwork-opensource.org/manuals/4.0.x/en/user-guide/harvesting/index.html about the simpleUrl harvester, with examples for different types of inputs (ODS, ESRI, CKAN, etc.).
We can help contribute to this.

Cheers
