GSoC 2025: Week 10 Report: AI-ready Dataset Metadata as a Service using ZOO-Project

Progress Report for Week 10 (Aug 3 – Aug 10)

1. What did I get done this week?

  • Implemented the optional mlcroissant[geo] extension for geospatial support.
  • Updated pyproject.toml to include all required geo dependencies (geopandas, shapely, pyproj, rasterio, pystac) under the [project.optional-dependencies] section.
  • Developed a robust stac_to_geocroissant converter supporting both file and dictionary inputs, with comprehensive mapping of STAC fields to GeoCroissant JSON-LD.
  • Validated the workflow by converting sample STAC catalogs to GeoCroissant format and checking the output with the standard mlcroissant validate CLI.
  • Demonstrated the complete workflow in Jupyter Lab, from installation to conversion and validation.
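
To illustrate the kind of mapping the converter performs, here is a minimal sketch, not the actual implementation. It assumes a STAC Collection as input (either a file path or a parsed dictionary). The function name stac_to_geocroissant_sketch, the geocr namespace IRI, and the geocr:boundingBox and geocr:temporalCoverage property names are placeholders chosen for illustration and may not match the real GeoCroissant vocabulary.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

# Placeholder namespace for geospatial terms (assumption, not the official IRI).
GEO_NS = "https://example.org/geocr#"


def stac_to_geocroissant_sketch(
    stac: Union[str, Path, Dict[str, Any]]
) -> Dict[str, Any]:
    """Map a STAC Collection (file path or dict) to a Croissant-style JSON-LD dict."""
    # Accept either a path to a STAC JSON file or an already-parsed dictionary.
    if isinstance(stac, (str, Path)):
        with open(stac, "r", encoding="utf-8") as fh:
            stac = json.load(fh)

    # STAC Collections store extents as lists; the first entry is the overall extent.
    extent = stac.get("extent", {})
    bboxes = extent.get("spatial", {}).get("bbox", [])
    intervals = extent.get("temporal", {}).get("interval", [])

    doc: Dict[str, Any] = {
        "@context": {"@vocab": "https://schema.org/", "geocr": GEO_NS},
        "@type": "Dataset",
        "name": stac.get("title") or stac.get("id"),
        "identifier": stac.get("id"),
        "description": stac.get("description"),
        "license": stac.get("license"),
        "keywords": stac.get("keywords", []),
    }
    if bboxes:
        # Hypothetical property name: [west, south, east, north].
        doc["geocr:boundingBox"] = bboxes[0]
    if intervals:
        # Hypothetical property name: [start, end]; end may be None (open-ended).
        doc["geocr:temporalCoverage"] = intervals[0]

    # Drop fields the STAC input did not provide.
    return {key: value for key, value in doc.items() if value is not None}
```

Passing a dictionary and passing the path to a JSON file containing the same Collection should produce the same result, since the file branch only adds a json.load step.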

2. Plan for Next Week:

  • Add support for different conversion types, including [geo] for GeoCroissant.
  • Implement comprehensive test cases for conversions.
  • Add an additional validator specifically for GeoCroissant outputs to ensure geospatial metadata integrity.
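
As a rough idea of what the planned geospatial validator could check, the sketch below tests a 2D bounding box and a temporal interval for basic consistency. This is only an assumption about the eventual design: the function name validate_geo_metadata and the geocr:boundingBox and geocr:temporalCoverage keys are hypothetical, and the real GeoCroissant property names may differ.

```python
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical property names, used for illustration only.
BBOX_KEY = "geocr:boundingBox"
TIME_KEY = "geocr:temporalCoverage"


def _parse_timestamp(value: str) -> datetime:
    # Rewrite a trailing "Z" so fromisoformat works on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_geo_metadata(doc: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors: List[str] = []

    bbox = doc.get(BBOX_KEY)
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if not (isinstance(bbox, (list, tuple)) and len(bbox) == 4 and all(map(is_number, bbox))):
        errors.append("bounding box must be four numbers: west, south, east, north")
    else:
        west, south, east, north = bbox
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            errors.append("longitude must be within [-180, 180]")
        if not (-90 <= south <= 90 and -90 <= north <= 90):
            errors.append("latitude must be within [-90, 90]")
        if south > north:
            errors.append("south latitude must not exceed north latitude")
        # west > east is allowed: the box may cross the antimeridian.

    interval = doc.get(TIME_KEY)
    if interval is not None:
        if not (isinstance(interval, (list, tuple)) and len(interval) == 2):
            errors.append("temporal coverage must be a [start, end] pair")
        else:
            start, end = interval
            try:
                # None marks an open-ended interval, so only compare when both are set.
                if start is not None and end is not None:
                    if _parse_timestamp(start) > _parse_timestamp(end):
                        errors.append("temporal coverage start is after end")
            except (ValueError, TypeError, AttributeError):
                errors.append("temporal coverage has an unparseable timestamp")

    return errors
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in one pass, which may be convenient when checking many converted catalogs.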

3. Am I blocked on anything?

  • No, I am not currently blocked on anything.

Links to Work Done:

Best Regards,
Harsh Shinde