GSoC 2025: Week 4 Report: AI-ready Dataset Metadata as a Service using ZOO-Project

Progress Report for Week 4 (June 23 – June 29)

1. What did I get done this week?

  • Implemented a STAC-to-GeoCroissant converter in Python, with validation and full metadata preservation.
  • Handled the references, item_assets, and distribution fields to improve dataset interoperability.
  • Introduced custom fields such as geocr:visualizations and geocr:summaries to retain extra STAC context.
  • Ensured compliance with the Croissant and GeoCroissant specs, including checksum, encoding format, and record structure.
  • Built a visual map overlay with Folium for the AGB dataset, showing the raster, GPKG tile boundaries, and training data points.
  • Validated and exported the final croissant.json, with support for visualization and links to external STAC items.
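
To give a rough idea of the conversion step described above, here is a minimal sketch using only the Python standard library. It is not the actual converter: the function name stac_item_to_geocroissant, the example item, the example.com URLs, and the exact layout of the geocr:summaries and references values are all assumptions made for illustration. Only the general idea (STAC assets become distribution entries with a URL, an encoding format, and a checksum) comes from the work listed above.

```python
import hashlib
import json


def stac_item_to_geocroissant(item, checksums=None):
    """Map a STAC item dict to a Croissant-style dict (illustrative only)."""
    checksums = checksums or {}
    distribution = []
    for key, asset in item.get("assets", {}).items():
        entry = {
            "@type": "cr:FileObject",
            "@id": key,
            "name": key,
            "contentUrl": asset["href"],
            # Fall back to a generic media type if the asset declares none.
            "encodingFormat": asset.get("type", "application/octet-stream"),
        }
        if key in checksums:
            entry["sha256"] = checksums[key]
        distribution.append(entry)

    props = item.get("properties", {})
    return {
        "@type": "sc:Dataset",
        "name": item["id"],
        "description": props.get("description", ""),
        "distribution": distribution,
        # Hypothetical layout for the custom field that keeps STAC context.
        "geocr:summaries": {
            "datetime": props.get("datetime"),
            "bbox": item.get("bbox"),
        },
        # Keep a link back to the source STAC item (assumed layout).
        "references": [
            link["href"]
            for link in item.get("links", [])
            if link.get("rel") == "self"
        ],
    }


# Made-up STAC item used only to exercise the sketch.
example_item = {
    "id": "agb-tile-001",
    "bbox": [10.0, 45.0, 11.0, 46.0],
    "properties": {
        "datetime": "2020-01-01T00:00:00Z",
        "description": "Example tile",
    },
    "assets": {
        "raster": {
            "href": "https://example.com/agb-tile-001.tif",
            "type": "image/tiff",
        },
    },
    "links": [
        {"rel": "self", "href": "https://example.com/stac/items/agb-tile-001.json"},
    ],
}

# Placeholder checksum; a real run would hash the downloaded asset bytes.
example_checksums = {"raster": hashlib.sha256(b"placeholder bytes").hexdigest()}

result = stac_item_to_geocroissant(example_item, example_checksums)
print(json.dumps(result, indent=2))
```

The checksum is passed in separately so the sketch never has to download anything; in practice it would be computed from the asset file itself.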

2. Plan for Next Week (June 30 – July 6):

  • Generate example conversions from Hugging Face datasets into valid GeoCroissant format.
  • Integrate Hugging Face dataset download links (hf.co/datasets/...) as accessURL in the distribution field.
  • Validate all generated metadata with mlcroissant to ensure compliance.
  • Review viewer compatibility for GeoCroissant metadata on Hugging Face (dataset-viewer behavior).
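
For the Hugging Face part of the plan above, a minimal sketch of what a distribution entry could look like. This is work that has not been done yet, so everything here is an assumption: the helper name hf_distribution_entry, the repository id example-org/example-dataset, the "git+https" encoding format, and the use of contentUrl as the key holding the download link (the plan calls it accessURL, so the final key may differ).

```python
def hf_distribution_entry(repo_id):
    """Build a Croissant-style distribution entry for a Hugging Face dataset.

    repo_id is expected in the form "owner/name" (hypothetical helper).
    """
    owner, sep, name = repo_id.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo_id!r}")
    return {
        "@type": "cr:FileObject",
        "@id": "repo",
        "name": "repo",
        # Key name is an assumption; the plan refers to this as accessURL.
        "contentUrl": f"https://hf.co/datasets/{repo_id}",
        # Assumed encoding format for a git-backed repository.
        "encodingFormat": "git+https",
    }


entry = hf_distribution_entry("example-org/example-dataset")
print(entry["contentUrl"])
```

The generated metadata would still need to go through mlcroissant validation as planned; this sketch only shows the shape of the entry.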

3. Am I blocked on anything?

  • No, I am not currently blocked on anything.

Links to Work Done:

Best Regards,
Harsh Shinde
