Clarification on OGC API Records Purpose and XML Schema Handling

fgravin · May 15, 2025, 3:44pm

Hi there,

Following up on our brief discussion yesterday, I’d like to share some insights regarding the purpose of OGC API Records and the handling of various XML schemas.

From what I understand, OGC API Records aims to establish a set of standard, simple, and lightweight APIs for metadata exchange—covering operations such as fetch, search, view, and more.

The default model is based on GeoJSON, which uses a very minimal schema. It’s also possible to configure a JSON output using the same property structure but without the GeoJSON syntax overhead. It’s worth noting that in GN4, the JSON output currently reflects the Elasticsearch response—something I believe is not ideal.

Another important objective of OGC API Records seems to be the eventual replacement of legacy CSW endpoints. Given the inherent complexity of metadata and GeoNetwork’s intention to support full XML ISO schemas, I believe OGC Records should also support XML-based outputs—such as f=dcat or f=iso19139. However, this becomes problematic when relying solely on the index, as it would result in a lossy transformation.

This brings me to a concern: if we are planning to replace CSW harvesting by OGC Records harvesting, data loss during the process would be unacceptable. So what does this imply? If the index is the source for the OGC Records API, we risk omitting full XML models, which would ultimately downgrade the CSW capabilities.
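As a rough illustration of why an index-only source is lossy, here is a minimal sketch. The XML below is a deliberately simplified, non-namespaced stand-in for an ISO record, and the list of indexed fields is purely hypothetical; neither reflects the real GeoNetwork index.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a metadata record (NOT real ISO 19139 markup).
RECORD_XML = """
<metadata>
  <title>Land cover 2024</title>
  <abstract>Annual land cover map.</abstract>
  <lineage>Derived from Sentinel-2 imagery.</lineage>
  <distribution><format>GeoTIFF</format></distribution>
</metadata>
"""

# Hypothetical subset of elements that the index summary keeps.
INDEXED_FIELDS = ("title", "abstract")


def to_index_doc(xml_text):
    """Flatten the record into a summary holding only the indexed fields."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name) for name in INDEXED_FIELDS}


def dropped_elements(xml_text):
    """Names of top-level elements that the index summary does not carry."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if child.tag not in INDEXED_FIELDS]


index_doc = to_index_doc(RECORD_XML)
lost = dropped_elements(RECORD_XML)
print(index_doc)
print(lost)
```

Anything listed in `lost` cannot be rebuilt from the index document alone, which is the data loss a harvester would inherit.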

WDYT?

davidblasby · May 15, 2025, 4:59pm

Hi, Florent,

Right now there is an Elastic JSON → OGCAPI-Records JSON output (as specified in the spec). The -Records output is very simple, so this is an easy transformation and there isn’t too much in the output (i.e. keyword/theme, and contact).
For the GN5/GN-microservices ogcapi-records, I’ve extended the output to also include the entire Elastic JSON index as well as the underlying record XML text (in the JSON output). The -Records spec is based on DC/DCAT, so it’s very anemic. This output gives the client lots of options: they can use the standard -Records information as well as a bunch of “extra” information. The -Records spec seems to be built around search, with a common format for simple searching and for summarizing metadata. I don’t think it’s meant for detailed metadata.

For example, a “nice” interactive html view of a metadata record would (very likely) need more than the information in the Elastic JSON index.

For CSW endpoints, the output isn’t defined in the same way as with -Records. I would recommend using the full underlying XML for these types of transforms, as any type of “pivot model” will be lossy.

The plan is to support DCAT and XML output inside the GN5 OGCAPI-Records. You can see some work in that area (FormatterApi) in the pull request “API / Formatter migration check” by fxprunayre (Pull Request #78, geonetwork/geonetwork on GitHub), as well as quite a bit of recent GN4 DCAT-AP* work (francois and jose).

Since this is one of the first modules that will be moved from GN4 to GN5, it’s going to take a little while to get our infrastructure in place (i.e. things like security, access, exception handling, and GN4 XSLT to GN5 XSLT). The rest of the GN4 → GN5 module migration will follow the same ideas. There are a few different ways of doing metadata formatting, but the main one is XSLT.

Once that’s finished we should have a very simple shared library where we can query which outputs are available for a particular metadata type, and I will put those in the item links in ogcapi-records. I expect the vast majority of these will be XML outputs (via XSLT), in the same manner as what’s currently available in GN4. I doubt any of these will use the Elastic JSON index. However, I would be more than happy to add the OGCAPI-Records JSON output as a FormatterApi non-XSLT output in the future.
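To make the idea of advertising the available outputs in the item links a bit more concrete, here is a minimal sketch. The registry, the format names, and the media types are assumptions made up for illustration; this is not the FormatterApi contract.

```python
from urllib.parse import urlencode

# Hypothetical registry: metadata schema -> (format name, media type) pairs
# that a shared formatter library might report as available.
AVAILABLE_OUTPUTS = {
    "iso19139": [("iso19139", "application/xml"), ("dcat", "application/rdf+xml")],
    "dublin-core": [("dublin-core", "application/xml")],
}


def item_links(base_url, collection_id, record_id, schema):
    """Build one 'alternate' link per output available for the record's schema."""
    item_url = f"{base_url}/collections/{collection_id}/items/{record_id}"
    links = []
    for fmt, media_type in AVAILABLE_OUTPUTS.get(schema, []):
        links.append({
            "rel": "alternate",
            "type": media_type,
            "title": f"Record as {fmt}",
            "href": f"{item_url}?{urlencode({'f': fmt})}",
        })
    return links


links = item_links("https://example.org/ogcapi", "main", "abc", "iso19139")
for link in links:
    print(link["type"], link["href"])
```

A client could then pick the link whose `type` or `title` matches the format it needs, instead of guessing which `f` values a server supports.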

I haven’t looked at the CSW spec in detail; however, I expect we can do something similar.

I would recommend that the people interested in GN5 get together and really look at the FormatterApi and how it’s put together:

It’s real core GN functionality.
I expect it to be reused in the future in other modules.
It’s the first real module being migrated from GN4, so all the concerns about access, security, etc. need to be addressed. I think we’ll see that any decisions made here will be “monkey-see monkey-do” for all the other modules.

Also, you might hear me talk about “GN5++”. I haven’t talked about it too much (I need some more time). However, this is more along the lines of microservices: smaller spring-boot based GN-based apps. An example of this would be being able to just deploy a scalable OGCAPI-Records (or just CSW) into a cluster. There would be a lot of code sharing with GN5, much in the same way as GeoServer cloud-native and GeoServer standard. However, I expect there to be other apps, like vocabulary management, validation services, etc.

By “GN5” I mean the “GN4” replacement (i.e. a monolith with everything in it); the GN5++ apps would be smaller and more specific.

Hope that makes sense. I will be talking about it more in the future, so it’s not too important right now.

Summary:

I don’t think there will be anything other than OGCAPI-Records that uses the Elastic JSON.
Almost all output formats will be XSLT-based (see GN4 output)
FormatterApi is very important to get right
CSW will not be using the Elastic index

NOTE: The reason I used the Elastic JSON index is that it is a schema-independent summary of the record. If I didn’t use it, I would need an xform-to-ogcapi-records for EACH of the different schemas!! This approach only works because the ogcapi-records output is so simple.
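A minimal sketch of that schema-independent mapping follows. The index field names (`uuid`, `resourceTitleObject`, `resourceAbstractObject`, `tag`, `geom`) are assumptions loosely modelled on a GeoNetwork-style index document and should be checked against the real index; the output is only a small subset of what a -Records item carries.

```python
def localized(value):
    """Return the 'default' text of a multilingual index object, or the value itself."""
    if isinstance(value, dict):
        return value.get("default")
    return value


def index_doc_to_record(doc):
    """Map one index document to a simple GeoJSON-style record (assumed field names)."""
    return {
        "id": doc["uuid"],
        "type": "Feature",
        "geometry": doc.get("geom"),  # None when the record has no extent
        "properties": {
            "title": localized(doc.get("resourceTitleObject")),
            "description": localized(doc.get("resourceAbstractObject")),
            "keywords": [localized(tag) for tag in doc.get("tag", [])],
        },
    }


# Two index documents that originate from different metadata schemas.
iso_doc = {
    "uuid": "iso-1",
    "documentStandard": "iso19139",
    "resourceTitleObject": {"default": "Land cover 2024"},
    "resourceAbstractObject": {"default": "Annual land cover map."},
    "tag": [{"default": "land cover"}],
}
dc_doc = {
    "uuid": "dc-1",
    "documentStandard": "dublin-core",
    "resourceTitleObject": {"default": "Rivers"},
}

records = [index_doc_to_record(doc) for doc in (iso_doc, dc_doc)]
print(records)
```

The same function handles both documents because it never looks at the source schema, only at the index summary. That is exactly why it stays simple, and also why it cannot carry the full record.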

antoniocerciello · May 20, 2025, 11:51am

Hi all,

Thanks for discussing this further here. From what I know, there are no plans to replace CSW harvesting with OGC API Records harvesting. Ideally, GN5 will support both CSW and OGC API Records, so both harvesting options should remain as well, ensuring a smooth transition and backwards compatibility. From an interoperability point of view, I believe that supporting multiple standards gives GeoNetwork great potential.
On the other hand, I fully agree that OGC API Records must be able to match CSW’s accuracy and completeness, especially when it comes to aspects like complex XML schemas and fidelity of metadata representation.
One possibility is to make OGC API Records harvesting configurable so that it requests metadata in a specific format. In that case you could request XML output (parameter f) and the schema (parameter profile) to prevent any data loss and preserve 1:1 fidelity.
The main point for me is how to make both the available metadata schemas and the native one discoverable through OGC API Records, hopefully with a way to distinguish between the two and ideally in a way that aligns with the current specifications, so that clients can understand which formats are available and choose accordingly.
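As a sketch of what such a configurable request could look like, the helper below builds an item URL with the f and profile parameters mentioned above. The base URL, collection id, record id, and the parameter values (`xml`, `iso19139`) are placeholders; which values a given server accepts is an assumption to verify against its documentation.

```python
from urllib.parse import urlencode


def item_url(base_url, collection_id, record_id, fmt=None, profile=None):
    """Build the URL of a single record, optionally asking for a format and a profile."""
    params = {}
    if fmt:
        params["f"] = fmt  # requested output format
    if profile:
        params["profile"] = profile  # requested schema, per the post above
    url = f"{base_url}/collections/{collection_id}/items/{record_id}"
    return f"{url}?{urlencode(params)}" if params else url


native_xml_url = item_url(
    "https://example.org/ogcapi", "main", "abc", fmt="xml", profile="iso19139"
)
default_url = item_url("https://example.org/ogcapi", "main", "abc")
print(native_xml_url)
print(default_url)
```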

jahow · May 20, 2025, 12:21pm

Hi,

I’d be curious to hear of anyone planning to harvest records from a GeoNetwork catalog in their native XML format while using OGC API Records.

I would say that offering a standard GeoJSON output as well as a more complete DCAT one in OGC API Records should be already quite good and offer interoperability with many platforms, and that CSW can remain the best choice when harvesting records in their native XML format.

There is no XML schema that I know of to include multiple ISO records in a response except CSW, and coming up with a new schema means that it won’t be interoperable.

antoniocerciello · May 20, 2025, 12:43pm

Hi Olivia,

Then I think you would be surprised by how many users still use, and intend to continue using, ISO XML schemas. For many of them OGC API Records is a novelty to keep an eye on, but not an alternative to jump into any time soon. OGC API Records is a standard for publishing metadata documents. It is native to a generation where JSON is the sovereign format, and JSON is therefore the only format that the specification formally makes mandatory. However, this does not exclude the possibility of publishing documents in other formats, including XML. Obviously many of us would never want to see XML again in our lives, but that day still seems far away.

You wrote: “There is no XML schema that I know of to include multiple ISO records in a response except CSW, and coming up with a new schema means that it won’t be interoperable.”

I imagine the interaction with an OGC API endpoint occurs through a collection that contains all the metadata items, which can be requested one by one in the supported formats (JSON, HTML, XML), possibly indicating the XML schema (there is a profile parameter for this). Of course, the item endpoint is the only one that should support the XML format.

Please let me know what you think and if the end user experience is different.

jahow · May 20, 2025, 1:20pm

Sorry, I may have been unclear. What I was saying is that CSW will probably remain the best option when harvesting XML records.

In theory OGC API Records can do the same thing, but it does not specify a way to send multiple XML records at once. Querying a single record in its native XML format is nice but honestly the most valuable feature is to fetch a collection of records at once; this is used when searching as well as harvesting.

It is possible to query an OGC API Records endpoint and do a search in a collection while asking for a DCAT-based output, as this format supports describing a collection of records. But this is not possible in ISO XML (unless reusing the CSW schema which I guess is possible?).

davidblasby · May 20, 2025, 3:52pm

Hi,

The spec for harvesting from OGCAPI-Records isn’t fully defined yet. I think it’s Part 3: Create, Replace, Update, Delete, Harvest:

https://docs.ogc.org/DRAFTS/25-015.html

This is particularly interesting for the basic CRUD operations; however, it’s still very much a work in progress.

I don’t think that a Part 1 (or Part 2) server is meant to be harvested from (although it is certainly possible to do some type of harvesting).

Dave

jahow · May 21, 2025, 1:16pm

Hi Dave,

The OGC API Records spec is indeed not final, but Part 1 (Core) should definitely be enough for harvesting, since it gives the ability to search and query records with pagination and filters. Honestly, I think this is good enough, right? I mean, this is essentially what we do when harvesting a CSW endpoint: ask for records using pagination and in a specific output format.

Part 3 (CRUD & Harvest) is intended for transactions and writing records, so I don’t think we’d need it for harvesting an endpoint.
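The pagination-based harvesting described above could be sketched roughly as follows. This assumes that the server returns each page as a JSON object with a `features` array and advertises the following page through a link with `rel` set to `next`; the `fetch` callable is injected so that the sketch runs without a network, and the URLs and pages are made up.

```python
def harvest(start_url, fetch):
    """Yield every record of a collection by following 'next' links page by page."""
    url = start_url
    seen = set()  # guards against a server that links back to an earlier page
    while url and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield from page.get("features", [])
        # The 'next' href is treated as opaque: the server chooses its paging scheme.
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )


# Fake two-page response standing in for a real endpoint (hypothetical URLs).
START = "https://example.org/ogcapi/collections/main/items?limit=2"
PAGES = {
    START: {
        "features": [{"id": "a"}, {"id": "b"}],
        "links": [{"rel": "next", "href": START + "&offset=2"}],
    },
    START + "&offset=2": {"features": [{"id": "c"}], "links": []},
}

harvested_ids = [record["id"] for record in harvest(START, PAGES.__getitem__)]
print(harvested_ids)
```

In a real harvester, `fetch` would perform the HTTP GET and parse the JSON body; the loop itself would stay the same.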