[GeoNetwork-users] RDF export options

Hi All,

I'm trying to harvest a subset of records from a GeoNetwork 3.4.x
installation into CKAN (data.gov.uk) as RDF. There's almost no guidance on
their site about what they accept, apart from this:
https://guidance.data.gov.uk/publish_and_manage_data/harvest_or_add_data/harvest_data/#harvest-data
I've tried a couple of approaches so far, with varying results:

1) a virtual CSW endpoint with the following CSW GetRecords options:
?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&typeNames=dcat&ElementSetName=full&resultType=results

This produces an error in CKAN about needing a plugin installed for xml, so
I looked at providing an outputFormat=application/json parameter to the
above request, but that simply produced an error about an invalid parameter
value.

2) rdf.search?_cat=mysubset produces what seems to be a valid output but
CKAN imports a single record with multiple attached datasets rather than
multiple records. It also doesn't seem to bring through the actual metadata.
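
For reference, the GetRecords request from approach 1 can be assembled programmatically. This is only a sketch: the endpoint URL below is a hypothetical placeholder for a virtual CSW service, and the parameters are simply the ones listed above, with outputFormat made optional so that different values can be tried.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical virtual CSW endpoint; substitute the real service URL.
CSW_ENDPOINT = "https://example.org/geonetwork/srv/eng/csw-mysubset"


def build_getrecords_url(endpoint, output_format=None):
    """Return a CSW 2.0.2 GetRecords KVP URL using the options from approach 1."""
    params = {
        "SERVICE": "CSW",
        "VERSION": "2.0.2",
        "REQUEST": "GetRecords",
        "typeNames": "dcat",
        "ElementSetName": "full",
        "resultType": "results",
    }
    # outputFormat is optional; the value is percent-encoded by urlencode.
    if output_format is not None:
        params["outputFormat"] = output_format
    return endpoint + "?" + urlencode(params)


url = build_getrecords_url(CSW_ENDPOINT, output_format="application/xml")
print(url)
```

Building the query with urlencode keeps values such as application/xml correctly escaped, which makes it easier to rule out encoding problems when the server reports an invalid parameter value.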

Firstly, can I configure my schema to accept more output formats?
Secondly, am I missing anything with these two approaches?

My final plan is to use the DCAT schema plugin:
https://github.com/metadata101/dcat-ap1.1/tree/3.4.x but I don't know much
about it and whether it's going to help at all.

Can anyone provide me with any advice?

I would be happy to contribute to the documentation about this if I can
figure it out!

Jo

--
*Jo Cook*
t:+44 7930 524 155/twitter:@archaeogeek
Please note that currently I do not work on Friday afternoons. For urgent
responses at that time, please visit support.astuntechnology.com or phone
our office on 01372 744009

--
*Sign up to our mailing list
<https://astuntechnology.com/company/#email-updates> for updates on news,
products, conferences, events and training*

Astun Technology Ltd, The Coach House, 17 West Street, Epsom, Surrey, KT18 7RL, UK
t:+44 1372 744 009 w: astuntechnology.com <http://astuntechnology.com/>
twitter:@astuntech <https://twitter.com/astuntech>

iShare - enterprise geographic intelligence platform
<https://astuntechnology.com/ishare/>
GeoServer, PostGIS and QGIS training
<https://astuntechnology.com/services/#training>

Helpdesk and customer portal
<http://support.astuntechnology.com/support/login>

Company registration no. 5410695. Registered in England and Wales.
Registered office: 120 Manor Green Road, Epsom, Surrey, KT19 8LN
VAT no. 864201149.

Hi Jo

See feedback inline.

Regards,
Jose García

On Tue, Apr 2, 2019 at 4:37 PM Jo Cook <jocook@anonymised.com> wrote:

> 1) a virtual CSW endpoint with the following CSW GetRecords options:
> ?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&typeNames=dcat&ElementSetName=full&resultType=results
>
> This produces an error in CKAN about needing a plugin installed for xml, so
> I looked at providing an outputFormat=application/json parameter to the
> above request, but that simply produced an error about an invalid parameter
> value.

See comment in next item.

> 2) rdf.search?_cat=mysubset produces what seems to be a valid output but
> CKAN imports a single record with multiple attached datasets rather than
> multiple records. It also doesn't seem to bring through the actual
> metadata.

About the single record: have you checked whether it corresponds to the
first Dataset, or maybe to the Catalog element in the RDF?

> Firstly, can I configure my schema to accept more outputformats?

GeoNetwork seems to support only application/xml, see
https://github.com/geonetwork/core-geonetwork/blob/master/csw-server/src/main/java/org/fao/geonet/component/csw/GetRecords.java#L152-L153

I guess that if the CSW spec supports other formats, GeoNetwork can be
extended to support them, but it requires some analysis to evaluate the
changes required.

> Secondly, am I missing anything with these two approaches?

There's also a service, rdf.metadata.public.get, that returns an RDF file
(also XML) with all published metadata, but I'm not sure it will make any
difference, as the format should be the same as when requesting CSW or the
RDF of a single metadata record.

> My final plan is to use the DCAT schema plugin:
> https://github.com/metadata101/dcat-ap1.1/tree/3.4.x but I don't know much
> about it and whether it's going to help at all.
>
> Can anyone provide me with any advice?

Some colleagues used this schema plugin for a project, but I'm not sure
whether it was integrated with CKAN. I will ask them to provide further
feedback in case they know about this.

_______________________________________________
GeoNetwork-users mailing list
GeoNetwork-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geonetwork-users
GeoNetwork OpenSource is maintained at
http://sourceforge.net/projects/geonetwork

--

*Vriendelijke groeten / Kind regards,
Jose García
<http://www.geocat.net/>
Veenderweg 13
6721 WD Bennekom
The Netherlands
T: +31 (0)318 416664 <+31318416664>
<https://www.facebook.com/geocatbv>
<https://twitter.com/geocat_bv>
<https://plus.google.com/u/1/+GeocatNetbv/posts>
Please consider the environment before printing this email.*

Hi Jo, it would be good to add this question to the CKAN list as well. AFAIK CSW is not a core service type in CKAN; my impression is that the iso19139 schema is hardcoded in the CSW harvester.

I added Stijn to this thread, who contributed the DCAT work. Note that the DCAT schema can result in quite complex RDF; I'm not sure if CKAN will be able to ingest it. This type of metadata would be better ingested by a triple store (I did some experiments with Virtuoso, and it works quite nicely).

We generally use (the csw-iso19139 harvester or) a push mechanism to push records to CKAN using the CKAN API. I hope to have some shareable code available soon.
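
To illustrate what such a push could look like, here is a minimal sketch. It is not the code referred to above; it only assumes the standard CKAN Action API endpoint package_create, and the site URL, API key, organisation, and record fields are hypothetical placeholders. The request is built but not sent.

```python
import json
from urllib.request import Request

CKAN_URL = "https://ckan.example.org"    # hypothetical CKAN site
API_KEY = "replace-with-a-real-api-key"  # hypothetical credential


def build_package_create(ckan_url, api_key, record):
    """Build (but do not send) a CKAN package_create request for one record."""
    # Field mapping is an assumption: a real catalogue record carries far
    # more metadata than these four CKAN dataset fields.
    payload = {
        "name": record["name"],            # URL-safe dataset identifier
        "title": record["title"],
        "notes": record.get("abstract", ""),
        "owner_org": record["owner_org"],
    }
    return Request(
        ckan_url.rstrip("/") + "/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


request = build_package_create(
    CKAN_URL,
    API_KEY,
    {
        "name": "example-dataset",
        "title": "Example dataset",
        "abstract": "Abstract taken from the metadata record.",
        "owner_org": "example-org",
    },
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would do it.
```

Pushing one dataset per metadata record sidesteps the question of how a harvester interprets a whole RDF document, at the cost of maintaining the field mapping yourself.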

Note also the recent work of Francois at https://github.com/geonetwork/core-geonetwork/pull/3212 which provides a data cite formatter in json that could be relevant in this scenario.

Regards, Paul.


Hi Jose,

The single record that is produced by the CKAN harvester is here:
https://ckan.publishing.service.gov.uk/dataset/test-defra-iso19139-sample-record
and this is the URL that I used for harvesting from:
https://public.eametadata.com/geonetwork/srv/eng/rdf.search?_cat=DSP_TEST
So maybe it is a record for the catalog and not for each dataset.
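
One way to check this is to count the dcat:Catalog and dcat:Dataset elements in the RDF/XML. The sketch below uses a made-up sample document, not real GeoNetwork output, and assumes the response uses the standard DCAT and Dublin Core Terms namespaces.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Made-up sample, NOT real GeoNetwork output: one catalog and two datasets.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Catalog rdf:about="https://example.org/catalog">
    <dct:title>Example catalog</dct:title>
    <dcat:dataset rdf:resource="https://example.org/dataset/1"/>
    <dcat:dataset rdf:resource="https://example.org/dataset/2"/>
  </dcat:Catalog>
  <dcat:Dataset rdf:about="https://example.org/dataset/1">
    <dct:title>First dataset</dct:title>
  </dcat:Dataset>
  <dcat:Dataset rdf:about="https://example.org/dataset/2">
    <dct:title>Second dataset</dct:title>
  </dcat:Dataset>
</rdf:RDF>
"""


def summarise(rdf_xml):
    """Return the titles of all dcat:Catalog and dcat:Dataset elements."""
    root = ET.fromstring(rdf_xml)

    def titles(local_name):
        # Tag names are case-sensitive: dcat:Dataset is the class,
        # dcat:dataset is the property linking a catalog to a dataset.
        tag = "{%s}%s" % (NS["dcat"], local_name)
        return [el.findtext("dct:title", default="", namespaces=NS)
                for el in root.iter(tag)]

    return {"catalogs": titles("Catalog"), "datasets": titles("Dataset")}


summary = summarise(SAMPLE_RDF)
print(len(summary["catalogs"]), "catalog(s):", summary["catalogs"])
print(len(summary["datasets"]), "dataset(s):", summary["datasets"])
```

If the title of the record created in CKAN matches the catalog title rather than any dataset title, that would suggest the harvester is treating the Catalog element as the record.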

I'll think about testing the DCAT plugin but I may need a different
approach.

Thanks for your responses

Jo

On Wed, Apr 3, 2019 at 7:29 AM Jose Garcia <jose.garcia@anonymised.com> wrote:

> About the single record: have you checked whether it corresponds to the
> first Dataset, or maybe to the Catalog element in the RDF?


Thanks Paul,

I'll give the DCAT plugin a test to see if it gets me what I need.

Thanks again

Jo

On Wed, Apr 3, 2019 at 8:42 AM Paul van Genuchten <
paul.vangenuchten@anonymised.com> wrote:

> I added Stijn to this thread, who contributed the DCAT work. Note that the
> DCAT schema can result in quite complex RDF; I'm not sure if CKAN will be
> able to ingest it. This type of metadata would be better ingested by a
> triple store (I did some experiments with Virtuoso, and it works quite nicely).
