GSoC 2025: Week 2 Report: AI-ready Dataset Metadata as a Service using ZOO-Project

Progress Report for Week 2 (June 9 – June 15)

1. What did I get done this week?

  • Parsed GeoCroissant metadata: Implemented a parser to convert GeoCroissant metadata (croissant.json) into a valid STAC Item object using pystac.
  • Mapped fields between GeoCroissant and STAC: Documented and translated key fields (e.g., identifier → id, distribution → assets).
  • Handled spatial and temporal metadata: Inferred geometry, bounding box, and temporal coverage (start, end, midpoint) from the metadata fields and the dataset description.
  • Asset management: Mapped the Croissant distribution to STAC assets, inferred the correct media_type for each asset, and added roles (e.g., data, metadata, documentation).
  • STAC extensions support: Added support for the STAC Table Extension and integrated column metadata using pystac.extensions.table.
  • Output and validation: Generated and saved the STAC Item (stac_item.json) and confirmed that it is valid using stac-validator.
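The core of the field mapping and the spatial/temporal handling can be sketched with plain dictionaries. This is a minimal illustration, not the actual parser: it assumes the GeoCroissant record exposes schema.org-style `identifier`, `name`, and `description` keys, and that the bounding box and time range have already been extracted. The helper names (`bbox_to_geometry`, `temporal_properties`, `croissant_to_stac_item`) and the sample values are hypothetical. Using the interval midpoint as the STAC `datetime` follows the "start, end, midpoint" idea above.

```python
import json
from datetime import datetime, timezone


def bbox_to_geometry(bbox):
    """Build a closed GeoJSON Polygon from a [west, south, east, north] bbox."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


def to_rfc3339(dt):
    """Format a timezone-aware datetime as RFC 3339 with a trailing 'Z'."""
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def temporal_properties(start, end):
    """STAC temporal fields; the interval midpoint fills 'datetime'."""
    midpoint = start + (end - start) / 2
    return {
        "datetime": to_rfc3339(midpoint),
        "start_datetime": to_rfc3339(start),
        "end_datetime": to_rfc3339(end),
    }


def croissant_to_stac_item(meta, bbox, start, end):
    """Map a few GeoCroissant (schema.org) fields onto a bare STAC Item dict."""
    properties = {"title": meta.get("name"), "description": meta.get("description")}
    properties.update(temporal_properties(start, end))
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "stac_extensions": [],
        "id": meta["identifier"],  # identifier -> id
        "geometry": bbox_to_geometry(bbox),
        "bbox": list(bbox),
        "properties": properties,
        "links": [],
        "assets": {},  # distribution -> assets is handled separately
    }


# Hypothetical sample values, not taken from a real croissant.json.
meta = {"identifier": "example-dataset", "name": "Example dataset",
        "description": "Hypothetical sample record."}
item = croissant_to_stac_item(
    meta,
    [-10.0, 35.0, 5.0, 45.0],
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 3, tzinfo=timezone.utc),
)
print(json.dumps(item, indent=2))
```

A dictionary like this can be loaded into pystac with `pystac.Item.from_dict(item)` once the assets are filled in, so the dict-based mapping and the pystac object model stay interchangeable.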
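The asset step (Croissant distribution to STAC assets, with media type and roles) might look roughly like this. It is a sketch under assumptions: distribution entries are taken to carry Croissant's `contentUrl` and optional `encodingFormat`, the extension-to-media-type table and the role rules are illustrative choices rather than the project's actual rules, and all function names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative fallback table used when a distribution entry has no encodingFormat.
EXTENSION_MEDIA_TYPES = {
    ".tif": "image/tiff; application=geotiff",
    ".tiff": "image/tiff; application=geotiff",
    ".parquet": "application/x-parquet",
    ".csv": "text/csv",
    ".json": "application/json",
    ".geojson": "application/geo+json",
    ".html": "text/html",
    ".pdf": "application/pdf",
}


def file_name(url):
    """Last path component of a URL, e.g. 'tiles.tif'."""
    return PurePosixPath(urlparse(url).path).name


def infer_media_type(url, encoding_format=None):
    """Prefer the declared encodingFormat, else guess from the file extension."""
    if encoding_format:
        return encoding_format
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXTENSION_MEDIA_TYPES.get(suffix, "application/octet-stream")


def infer_roles(media_type):
    """Hypothetical role rules: docs -> documentation, JSON -> metadata, else data."""
    if media_type in ("text/html", "application/pdf"):
        return ["documentation"]
    if media_type in ("application/json", "application/ld+json"):
        return ["metadata"]
    return ["data"]


def distribution_to_assets(distribution):
    """Turn Croissant distribution entries into a STAC 'assets' mapping."""
    assets = {}
    for entry in distribution:
        url = entry["contentUrl"]
        media_type = infer_media_type(url, entry.get("encodingFormat"))
        key = entry.get("name") or file_name(url)
        assets[key] = {"href": url, "type": media_type, "roles": infer_roles(media_type)}
    return assets


distribution = [
    {"name": "tiles", "contentUrl": "https://example.com/data/tiles.tif"},
    {"name": "table", "contentUrl": "https://example.com/data/table.csv",
     "encodingFormat": "text/csv"},
    {"contentUrl": "https://example.com/docs/readme.html"},
]
assets = distribution_to_assets(distribution)
print(assets)
```

Trusting a declared `encodingFormat` first and only falling back to the file extension keeps the metadata author's intent authoritative, while still giving every asset a `type`.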
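The Table Extension work boils down to registering the extension schema on the item and writing `table:columns`. The report does this through `pystac.extensions.table`; the sketch below shows the resulting JSON shape with plain dictionaries instead. The schema version in the URL, the Croissant-to-column type mapping, and the function name `add_table_columns` are assumptions for illustration. Croissant fields are read from each `recordSet` entry's `field` list.

```python
import json

# Schema URI for the STAC Table Extension; the version is an assumption,
# match it to whatever your pystac release writes.
TABLE_EXTENSION = "https://stac-extensions.github.io/table/v1.2.0/schema.json"

# Hypothetical mapping from Croissant field data types to table column types.
CROISSANT_TO_TABLE_TYPE = {
    "sc:Text": "string",
    "sc:Integer": "int64",
    "sc:Float": "float64",
    "sc:Boolean": "boolean",
    "sc:Date": "date",
}


def add_table_columns(item, record_sets):
    """Copy Croissant recordSet fields into the item's 'table:columns'."""
    columns = []
    for record_set in record_sets:
        for field in record_set.get("field", []):
            column = {"name": field["name"]}
            if field.get("description"):
                column["description"] = field["description"]
            data_type = field.get("dataType")
            # Only simple string data types are mapped; anything else is left untyped.
            if isinstance(data_type, str) and data_type in CROISSANT_TO_TABLE_TYPE:
                column["type"] = CROISSANT_TO_TABLE_TYPE[data_type]
            columns.append(column)
    # Register the extension once, even if this is called repeatedly.
    extensions = item.setdefault("stac_extensions", [])
    if TABLE_EXTENSION not in extensions:
        extensions.append(TABLE_EXTENSION)
    item["properties"]["table:columns"] = columns
    return item


record_sets = [{"field": [
    {"name": "temperature", "description": "Air temperature", "dataType": "sc:Float"},
    {"name": "station", "dataType": "sc:Text"},
]}]
item = add_table_columns({"type": "Feature", "properties": {}}, record_sets)
print(json.dumps(item["properties"]["table:columns"], indent=2))
```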

2. Plan for Next Week (June 16 – June 22):

  • Create a working example to convert GeoCroissant metadata into DCAT format.
  • Document how GeoCroissant fields map to DCAT properties.
  • Reuse and clean up code from the STAC conversion for DCAT use.
  • Explore ways to validate the DCAT output using SHACL or JSON-LD tools.
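As a possible starting point for the DCAT mapping, here is one candidate translation of the same inputs into a DCAT Dataset in JSON-LD: `dct:title`/`dct:description`, `dct:spatial` with a WKT `dcat:bbox`, `dct:temporal` with `dcat:startDate`/`dcat:endDate`, and `dcat:distribution` with `dcat:downloadURL` and an IANA `dcat:mediaType`. This is a sketch of one option to evaluate, not a settled design. The input shape, the function names, and the sample values are hypothetical, and whether this shape holds up is exactly what the planned SHACL check would tell.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "gsp": "http://www.opengis.net/ont/geosparql#",
}
IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"


def bbox_to_wkt(bbox):
    """Render a [west, south, east, north] bbox as a WKT polygon."""
    west, south, east, north = bbox
    return (f"POLYGON(({west} {south}, {east} {south}, {east} {north}, "
            f"{west} {north}, {west} {south}))")


def croissant_to_dcat(meta, bbox, start, end, distribution):
    """Sketch a DCAT Dataset (JSON-LD) from the same inputs used for STAC."""
    distributions = []
    for entry in distribution:
        dist = {"@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": entry["contentUrl"]}}
        if entry.get("encodingFormat"):
            # Drop media-type parameters such as '; application=geotiff'.
            base_type = entry["encodingFormat"].split(";")[0].strip()
            dist["dcat:mediaType"] = {"@id": IANA_MEDIA_TYPES + base_type}
        distributions.append(dist)
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:identifier": meta["identifier"],
        "dct:title": meta.get("name"),
        "dct:description": meta.get("description"),
        "dct:spatial": {
            "@type": "dct:Location",
            "dcat:bbox": {"@value": bbox_to_wkt(bbox), "@type": "gsp:wktLiteral"},
        },
        "dct:temporal": {
            "@type": "dct:PeriodOfTime",
            "dcat:startDate": {"@value": start, "@type": "xsd:dateTime"},
            "dcat:endDate": {"@value": end, "@type": "xsd:dateTime"},
        },
        "dcat:distribution": distributions,
    }


meta = {"identifier": "example-dataset", "name": "Example dataset",
        "description": "Hypothetical sample record."}
distribution = [{"contentUrl": "https://example.com/data/table.csv",
                 "encodingFormat": "text/csv"}]
dataset = croissant_to_dcat(meta, [-10.0, 35.0, 5.0, 45.0],
                            "2020-01-01T00:00:00Z", "2020-01-03T00:00:00Z",
                            distribution)
print(json.dumps(dataset, indent=2))
```

Keeping the inputs identical to those of the STAC conversion is one way to reuse the cleaned-up parsing code for both outputs, as the plan suggests.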

3. Am I blocked on anything?

  • No, I am not currently blocked on anything.

Links to Work Done:

Best Regards,
Harsh Shinde