[GeoNetwork-devel] Example settings for dcat rdf harvester

Hi All,

I’m really excited that the new simple URL harvester in GeoNetwork 4.2.x is available. Could someone provide me with some example config for DCAT/RDF?

I can see in https://github.com/geonetwork/core-geonetwork/commit/c57a1a8066c610ac345f230565640d5ff20a6f73 that the following URL has been tested: http://mow-dataroom.s3-eu-west-1.amazonaws.com/dr_dcat.rdf. What were the exact settings, e.g. the Dataset Element to loop on, the Element for the UUID of each record, and the XSL transformation to apply?

I am testing with my own RDF, and while I think it’s looping through the dataset elements correctly, I don’t think I have the UUID element correct.

Many thanks

Jo

Jo Cook
t:+44 7930 524 155 | twitter:@archaeogeek | mastodon:@archaeogeek@anonymised.com.
Please note that currently I do not work on Friday afternoons. For urgent responses at that time, please visit support.astuntechnology.com or phone our office on 01372 744009

Hi Jo, for a DCAT feed the configuration should be the one proposed when you select “DCAT feed > ISO”, i.e. only the XSLT conversion:
https://github.com/geonetwork/core-geonetwork/pull/6771/files#diff-8874a12490d4fcb4983bba24e2d37e80cce8e275a01d6e23d67589e9cbf6f315R634

The DCAT feed content is retrieved, and then SPARQL queries are applied to collect CatalogRecord and Dataset from the feed. The feed is an RDF graph, so there are not really loop elements as there would be for the tree structure of a JSON or XML document.
So it sounds more like something specific to your RDF file. Can you share the URL so we can have a look?

Cheers.

Francois

On Thu, 16 Feb 2023 at 17:14, Jo Cook via GeoNetwork-devel <geonetwork-devel@lists.sourceforge.net> wrote:

Sign up to our mailing list for updates on news, products, conferences, events and training

Astun Technology Ltd, t:+44 1372 744 009 contact us online
web: astuntechnology.com twitter:@astuntech

iShare - enterprise geographic intelligence platform

GeoServer, PostGIS and QGIS training
Support

Company registration no. 5410695. Registered in England and Wales. Registered office: Penrose House, 67 Hightown Road, Banbury, OX16 9BE VAT no. 864201149.


GeoNetwork-devel mailing list
GeoNetwork-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork

Hi Francois,

Thanks, this is my URL: https://data.spatialhub.scot/catalog.rdf

With schema:iso19115-3.2018:convert/fromSPARQL-DCAT I get a null-pointer exception (understandable because there isn’t a CatalogRecord element). I get further with schema:iso19115-3.2018:convert/DCAT/sparql-to-iso19115-3, looping on .//dcat:Dataset: it clearly loops through every record, but I can’t find the right setting for the identifier.
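
For anyone debugging a similar feed, here is a minimal sketch (Python standard library only, not GeoNetwork code) of one way to check whether an RDF/XML feed declares any dcat:CatalogRecord nodes and which identifiers its dcat:Dataset nodes carry. The inline sample feed, the example.org URIs and the function name summarize_feed are made up for illustration. It only recognises typed node elements such as <dcat:Dataset>, not rdf:Description nodes with an rdf:type, and a real feed would have to be downloaded first.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, DCAT and Dublin Core terms.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

# Hypothetical two-dataset feed with no dcat:CatalogRecord (illustration only).
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Catalog rdf:about="https://example.org/catalog">
    <dcat:dataset>
      <dcat:Dataset rdf:about="https://example.org/dataset/a">
        <dct:identifier>a-123</dct:identifier>
        <dct:title>Dataset A</dct:title>
      </dcat:Dataset>
    </dcat:dataset>
    <dcat:dataset>
      <dcat:Dataset rdf:about="https://example.org/dataset/b">
        <dct:title>Dataset B</dct:title>
      </dcat:Dataset>
    </dcat:dataset>
  </dcat:Catalog>
</rdf:RDF>
"""


def summarize_feed(rdf_xml):
    """Return (number of CatalogRecord nodes, [(dataset URI, dct:identifier)])."""
    root = ET.fromstring(rdf_xml)
    records = root.findall(".//dcat:CatalogRecord", NS)
    datasets = []
    for ds in root.findall(".//dcat:Dataset", NS):
        uri = ds.get(RDF_ABOUT)  # None when the dataset is a blank node
        # Only a direct dct:identifier child is read; None if it is absent.
        identifier = ds.findtext("dct:identifier", default=None, namespaces=NS)
        datasets.append((uri, identifier))
    return len(records), datasets


record_count, datasets = summarize_feed(SAMPLE_FEED)
print("CatalogRecord nodes:", record_count)
for uri, identifier in datasets:
    print(uri, "->", identifier or "(no dct:identifier)")
```

A count of zero CatalogRecord nodes would match the situation described here. Datasets without a dct:identifier might need their rdf:about URI, or some other property, as the source for the record UUID.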

I can write a new conversion xsl if necessary.

All the best

Jo

Hi, this query should add CatalogRecord elements if they don’t exist in the feed: https://github.com/geonetwork/core-geonetwork/blob/main/harvesters/src/main/resources/harvester-resources/simpleUrl/sparql/add-CatalogRecord.rq. Maybe something is going wrong around there. We should probably look into this before adding a new conversion.

Francois

Hi Francois,

Thanks! For what it’s worth, with the schema:iso19115-3.2018:convert/fromSPARQL-DCAT converter the error I get (for each record) is:

2023-02-17T11:02:53,137 ERROR [geonetwork.harvester] - Failed to apply conversion schema:iso19115-3.2018:convert/DCAT/sparql-to-iso19115-3 to record null. Error is: An empty sequence is not allowed as the first argument of gn-fn-sparql:getSubject()

Would you like me to submit a GitHub issue for this?

All the best

Jo

Hi Jo,

For our datahub project, we “try” to have working and clean conversions from:

  • ESRI JSON DCAT
  • DKAN
  • ODS v1

We have never actually tried CKAN, XML or RDF harvesting, so I can’t really help.

Couldn’t your CKAN provide DCAT output in JSON?

Cheers

camptocamp
INNOVATIVE SOLUTIONS
BY OPEN SOURCE EXPERTS

Florent Gravin
Technical Leader - Architect
+33 4 58 48 20 36

Hi Florent,

I’m not having much luck getting a JSON DCAT output from this particular URL. I’m interested in what settings you are using for the ESRI JSON DCAT, as I’m having trouble with one of those as well!

Thanks

Jo

Hi Jo,

Here is what we use.
I think the pageFrom and pageSize params are not relevant, because the URL probably returns the whole catalog in one go.

Hi Florent,

Thanks, that was a great help. I can now successfully harvest from an ESRI JSON DCAT endpoint!

Jo

(attachments)

Screenshot from 2023-02-20 17-19-07.png

Hi Jo,

Happy to hear that!
I think it would be a good idea to start a new section in https://geonetwork-opensource.org/manuals/4.0.x/en/user-guide/harvesting/index.html about the simpleUrl harvester, with examples for different types of input (ODS, ESRI, CKAN, etc.).
We can help contribute to this.

Cheers
