GSoC 2025: Week 5 Report: AI-ready Dataset Metadata as a Service using ZOO-Project

Reporting period: June 30 – July 6

1. What did I get done this week?

GeoCroissant Implementation: Successfully integrated GeoCroissant support into Hugging Face’s dataset viewer for geospatial datasets.

  • Implementation: services/worker/src/worker/job_runners/dataset/croissant_crumbs.py

  • Tests: services/worker/tests/job_runners/dataset/test_croissant_crumbs.py

  • Geospatial File Detection: Added detection logic for common geospatial file formats, including
    .tif, .shp, .geojson, .kml, and .gpkg, among other widely used formats.

  • GeoCroissant Context Integration: Extended the Croissant JSON-LD context with GeoCroissant-specific properties: geo:lat, geo:long, geo:alt, geo:accuracy, and geo:timestamp. These properties are added only when geospatial files are detected in the dataset.

  • Testing & Documentation: Created and ran extensive test suites to verify GeoCroissant context inclusion and geospatial file detection logic. All tests passed successfully in the croissant-test environment.

  • Code Management: Cleaned up temporary files, kept the code production-ready, and pushed the implementation to my GitHub fork.
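
To illustrate the detection and conditional-context steps described above, here is a minimal sketch in Python. It is not the code in croissant_crumbs.py: the function names, the extension set (limited to the formats listed in this report), and the placeholder namespace IRI are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumption: only the extensions named in this report; the real list may be longer.
GEOSPATIAL_EXTENSIONS = {".tif", ".shp", ".geojson", ".kml", ".gpkg"}

# Assumption: placeholder namespace IRI, not the one used by the actual implementation.
GEO_NAMESPACE = "https://example.org/geo#"
GEO_PROPERTIES = ("lat", "long", "alt", "accuracy", "timestamp")


def has_geospatial_files(file_names):
    """Return True if any file name ends with a known geospatial extension."""
    return any(
        PurePosixPath(name).suffix.lower() in GEOSPATIAL_EXTENSIONS
        for name in file_names
    )


def build_context(base_context, file_names):
    """Return a copy of the context, extended with geo:* terms only when needed."""
    context = dict(base_context)  # leave the caller's dictionary untouched
    if has_geospatial_files(file_names):
        context["geo"] = GEO_NAMESPACE
        for prop in GEO_PROPERTIES:
            context[f"geo:{prop}"] = GEO_NAMESPACE + prop
    return context


if __name__ == "__main__":
    base = {"sc": "https://schema.org/"}
    print(build_context(base, ["data/train.parquet", "rasters/scene.TIF"]))
    print(build_context(base, ["data/train.parquet"]))
```

Matching on the lowercased suffix keeps the check cheap and case-insensitive, and copying the base context means datasets without geospatial files get back the standard Croissant context unchanged.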

2. Plan for next week (July 7 – July 13)

  • Test the GeoCroissant implementation with real geospatial datasets on Hugging Face.
  • Optimize performance of the geospatial file detection logic.
  • Add support for detecting NetCDF (.nc) file formats.
  • Draft and publish user-facing documentation for the GeoCroissant integration.

3. Am I blocked on anything?

  • No, I am not currently blocked on anything.

Links to Work Done:

Best Regards,
Harsh Shinde
