Progress Report for Week 3 (June 16 – June 22)
1. What did I get done this week?
- Implemented geocroissant_to_geodcat.py: Developed a modular Python script that converts GeoCroissant metadata (croissant.json) into GeoDCAT-compliant RDF using rdflib, designed for reuse and automation.
- Created conversion-geodcat-notebook.ipynb: Built a companion Jupyter notebook that demonstrates and explains the GeoCroissant-to-GeoDCAT conversion interactively, including validation, RDF inspection, and querying.
- Mapped GeoCroissant to GeoDCAT vocabulary: Defined and applied a consistent mapping between metadata fields (e.g., name → dct:title, distribution → dcat:Distribution), ensuring semantic accuracy across standards (sketch below).
- Handled spatial and temporal dimensions: Extracted and represented temporalExtent (startDate, endDate) and spatialExtent (dct:spatial) using controlled vocabularies and persistent URIs (sketch below).
- Generated and structured distributions: Translated each distribution into a dcat:Distribution, enriched with access URLs, media types, checksum validation (SHA-256), and hierarchical relationships (isPartOf, hasPart) (sketch below).
- Produced interoperable RDF outputs: Serialized the final metadata into both JSON-LD (geodcat.jsonld) and Turtle (geodcat.ttl) to support semantic web integration and catalog ingestion (sketch below).
- Validated with SHACL constraints: Used pyshacl to confirm that the RDF graphs conform to GeoDCAT structural requirements, improving metadata robustness and quality (sketch below).
- Audited metadata via querying: Queried the RDF graph for key elements such as dcat:distribution and dcat:accessURL to verify the completeness and correctness of dataset access metadata (sketch below).
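The field mapping is the core of geocroissant_to_geodcat.py. Below is a minimal sketch of that mapping, assuming a simplified croissant.json with only name and description fields; the function name croissant_to_dcat and the example dataset URI are placeholders, not the script's actual interface.

```python
import json

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF


def croissant_to_dcat(croissant_path: str, dataset_uri: str) -> Graph:
    """Map a few core GeoCroissant fields onto a dcat:Dataset node."""
    with open(croissant_path) as f:
        meta = json.load(f)

    g = Graph()
    g.bind("dcat", DCAT)
    g.bind("dct", DCTERMS)

    ds = URIRef(dataset_uri)
    g.add((ds, RDF.type, DCAT.Dataset))

    # name -> dct:title, description -> dct:description
    if "name" in meta:
        g.add((ds, DCTERMS.title, Literal(meta["name"])))
    if "description" in meta:
        g.add((ds, DCTERMS.description, Literal(meta["description"])))

    return g


if __name__ == "__main__":
    graph = croissant_to_dcat("croissant.json", "https://example.org/dataset/1")
    print(graph.serialize(format="turtle"))
```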
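For the temporal and spatial extents, a sketch of the target RDF shape, assuming DCAT 2 terms (dcat:startDate and dcat:endDate on a dct:PeriodOfTime, and dct:spatial pointing at a dct:Location with a dcat:bbox). The dates and the WKT polygon are made-up example values.

```python
from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF, XSD

GSP = Namespace("http://www.opengis.net/ont/geosparql#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

ds = URIRef("https://example.org/dataset/1")

# temporalExtent {startDate, endDate} -> dct:temporal / dct:PeriodOfTime
period = BNode()
g.add((ds, DCTERMS.temporal, period))
g.add((period, RDF.type, DCTERMS.PeriodOfTime))
g.add((period, DCAT.startDate, Literal("2020-01-01", datatype=XSD.date)))
g.add((period, DCAT.endDate, Literal("2020-12-31", datatype=XSD.date)))

# spatialExtent -> dct:spatial / dct:Location with a bounding box geometry
location = BNode()
g.add((ds, DCTERMS.spatial, location))
g.add((location, RDF.type, DCTERMS.Location))
g.add((location, DCAT.bbox,
       Literal("POLYGON((-10 35, 30 35, 30 60, -10 60, -10 35))",
               datatype=GSP.wktLiteral)))

print(g.serialize(format="turtle"))
```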
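For distributions, a sketch of a single dcat:Distribution enriched the way the list above describes, assuming the SPDX vocabulary for the SHA-256 checksum (as recommended by DCAT 2); all URIs, the media type, and the hashed bytes are illustrative placeholders.

```python
import hashlib

from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF, XSD

SPDX = Namespace("http://spdx.org/rdf/terms#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)
g.bind("spdx", SPDX)

ds = URIRef("https://example.org/dataset/1")
dist = URIRef("https://example.org/dataset/1/distribution/data.tif")

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.accessURL, URIRef("https://example.org/files/data.tif")))
g.add((dist, DCAT.mediaType,
       URIRef("https://www.iana.org/assignments/media-types/image/tiff")))

# SHA-256 checksum, modelled with spdx:Checksum as in DCAT 2 examples
digest = hashlib.sha256(b"example file contents").hexdigest()
checksum = BNode()
g.add((dist, SPDX.checksum, checksum))
g.add((checksum, RDF.type, SPDX.Checksum))
g.add((checksum, SPDX.algorithm, SPDX.checksumAlgorithm_sha256))
g.add((checksum, SPDX.checksumValue, Literal(digest, datatype=XSD.hexBinary)))

# Hierarchical relationships between distributions
part = URIRef("https://example.org/dataset/1/distribution/tiles/part-001.tif")
g.add((dist, DCTERMS.hasPart, part))
g.add((part, DCTERMS.isPartOf, dist))
```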
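Producing the two output formats reduces to two serialize calls on the assembled graph; the tiny graph below exists only to make the snippet runnable on its own.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

graph = Graph()
ds = URIRef("https://example.org/dataset/1")
graph.add((ds, RDF.type, DCAT.Dataset))
graph.add((ds, DCTERMS.title, Literal("Example dataset")))

# Write both serializations so downstream catalogs can ingest either format.
graph.serialize(destination="geodcat.jsonld", format="json-ld", indent=2)
graph.serialize(destination="geodcat.ttl", format="turtle")
```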
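SHACL validation with pyshacl follows the usual validate() pattern; the shapes file name geodcat_shapes.ttl is a placeholder for whatever GeoDCAT shapes the project actually loads.

```python
from pyshacl import validate
from rdflib import Graph

data_graph = Graph().parse("geodcat.ttl", format="turtle")
shapes_graph = Graph().parse("geodcat_shapes.ttl", format="turtle")

conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes_graph,
    inference="rdfs",
    abort_on_first=False,
)

print("Conforms:", conforms)
if not conforms:
    # The text report lists every violated constraint and the offending node.
    print(results_text)
```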
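The audit step can run directly against the serialized Turtle output; this sketch lists every distribution and its access URL, which is the kind of completeness check described above.

```python
from rdflib import Graph
from rdflib.namespace import DCAT

g = Graph().parse("geodcat.ttl", format="turtle")

query = """
SELECT ?dataset ?dist ?url
WHERE {
  ?dataset dcat:distribution ?dist .
  ?dist dcat:accessURL ?url .
}
"""

# initNs binds the dcat: prefix used in the query to the DCAT namespace.
for row in g.query(query, initNs={"dcat": DCAT}):
    print(row.dataset, row.dist, row.url)
```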
2. Plan for Next Week (June 23 – June 29):
- Discuss GeoCroissant support on the Hugging Face GitHub (dataset-viewer repo).
- Continue structuring Kaggle datasets into GeoCroissant format.
- Add Hugging Face- and Kaggle-specific attributes (e.g., citation, license, split, repository) to GeoCroissant metadata.
- Explore integration of hf.co/datasets and kaggle datasets download links as accessURL in distribution (illustrative sketch below).
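Purely as an illustration of the last item, not something implemented yet: a hf.co/datasets page could be attached as dcat:accessURL on a distribution roughly like this (the repository id and all URIs are invented).

```python
from rdflib import Graph, URIRef
from rdflib.namespace import DCAT, RDF

g = Graph()
ds = URIRef("https://example.org/dataset/1")
dist = URIRef("https://example.org/dataset/1/distribution/hf")

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCAT.distribution, dist))
g.add((dist, RDF.type, DCAT.Distribution))
# Hugging Face dataset page used as the access point for this distribution.
g.add((dist, DCAT.accessURL,
       URIRef("https://huggingface.co/datasets/example-org/example-dataset")))
```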
3. Am I blocked on anything?
- No, I am not currently blocked on anything.
Links to Work Done:
- Project Wiki Page: GSoC 2025 AI‐ready Dataset Metadata as a Service using ZOO‐Project · HarshShinde0/ZOO-AI-DATASET-MAAS Wiki · GitHub
- Public Repository: ZOO-AI-DATASET-MAAS/GeoCroissant to GeoDCAT at main · HarshShinde0/ZOO-AI-DATASET-MAAS · GitHub
Best Regards,
Harsh Shinde