Hello everyone,
As GSoC comes to an end, I’m pleased to share my final report on the work completed over the past three months. This journey has been full of learning, and I’m grateful for the support and guidance from the ZOO-Project community and mentors.
Title and Organization
- Project: AI-ready Dataset Metadata as a Service
- Organization: OSGeo (Open Source Geospatial Foundation), ZOO-Project
Abstract
This GSoC project focused on implementing native support for GeoCroissant metadata to enable AI-ready geospatial datasets. The implementation provides functionalities for metadata generation, transformation, and validation, as well as integration pathways with platforms like STAC, Earth Engine, and Hugging Face. It also introduces Data-Centric AI (DCAI) workflows to assess and improve dataset quality, addressing issues such as annotation errors and bias. This enhancement improves metadata interoperability, streamlines data preparation for machine learning, and lays the foundation for future cross-platform adoption.
State of the Project Before GSoC
Before this project, the ZOO-Project offered solid support for OGC-compliant geoprocessing but had no built-in support for GeoCroissant, a metadata standard designed specifically for AI-ready geospatial datasets. There were no tools within ZOO to help users create or validate this kind of metadata, or to connect easily with existing platforms like STAC and Earth Engine or machine learning hubs like Hugging Face and Kaggle. It also lacked workflows to help users check the quality of their training data or fix common issues like annotation errors or bias. This project set out to fill those gaps and bring these much-needed features to the ZOO-Project.
Benefits to the community/Additions the project will bring to the software:
This project introduces Data-Centric AI (DCAI) workflows with support for GeoCroissant metadata, enabling more effective creation, transformation, and validation of AI-ready geospatial datasets. It helps users improve dataset quality, streamline data preparation for machine learning, and ensure better interoperability with existing standards and platforms.
Key benefits include:
- Simplified generation and conversion of GeoCroissant-compliant metadata.
- Enhanced data quality assessment through DCAI-focused tools.
- Improved discoverability and usability of datasets for AI workflows.
- A foundation for seamless integration with broader geospatial and AI ecosystems.
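To give a rough idea of what "generation and conversion" plus validation can look like, here is a minimal sketch in Python. It is not the code from the project repository. It maps a STAC Item (as a plain dict) to a Croissant-style JSON-LD record and runs a few basic checks. The `geocr:` property names, the `REQUIRED_KEYS` list, and both function names are assumptions made for illustration only; the actual GeoCroissant vocabulary and the project's converters may differ.

```python
from typing import Any

# Hypothetical minimum set of keys for this sketch; not taken from the spec.
REQUIRED_KEYS = ("@type", "name", "distribution", "geocr:BoundingBox")


def stac_item_to_croissant(item: dict[str, Any]) -> dict[str, Any]:
    """Map a STAC Item dict to a minimal Croissant-style record."""
    props = item.get("properties", {})
    # One file entry per STAC asset, sorted by asset key for stable output.
    distribution = [
        {
            "@type": "cr:FileObject",
            "@id": key,
            "contentUrl": asset["href"],
            "encodingFormat": asset.get("type", "application/octet-stream"),
        }
        for key, asset in sorted(item.get("assets", {}).items())
    ]
    return {
        "@type": "sc:Dataset",
        "name": item["id"],
        "description": props.get("description", ""),
        # Assumed property names for the geospatial extension.
        "geocr:BoundingBox": item.get("bbox"),
        "geocr:temporalExtent": props.get("datetime"),
        "distribution": distribution,
    }


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not record.get(key)]
    bbox = record.get("geocr:BoundingBox")
    if bbox:
        # This sketch only handles 2D boxes: [west, south, east, north].
        if len(bbox) != 4:
            problems.append("bounding box must have 4 values")
        elif bbox[1] > bbox[3]:
            problems.append("bounding box south is greater than north")
    return problems


if __name__ == "__main__":
    example_item = {
        "id": "scene-1",
        "bbox": [10.0, 45.0, 11.0, 46.0],
        "properties": {"datetime": "2025-01-01T00:00:00Z"},
        "assets": {
            "image": {"href": "https://example.com/scene-1.tif", "type": "image/tiff"}
        },
    }
    record = stac_item_to_croissant(example_item)
    print(record["name"], validate(record))
```

Keeping conversion and validation as separate functions means the same checks can be applied to metadata coming from other sources, not only STAC.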
For detailed implementation examples and workflows, please refer to the Log of Pull Requests section below.
Potential Future Work
After the completion of the core integrations, I aim to expand support to platforms like OpenML by adding geospatial attribute addons and finalizing pending integrations, such as compatibility with the Hugging Face dataset viewer. This will enhance the ecosystem’s robustness, accessibility, and adoption potential. Additionally, I plan to explore further improvements based on community feedback and extended testing.
What I Have Learned
- Real-World Experience: GSoC 2025 and my work with the ZOO-Project gave me the opportunity to contribute to meaningful, real-world projects in the open-source ecosystem. This experience helped me bridge the gap between theory and practice by applying my knowledge to create AI-ready geospatial metadata tools.
- Hands-On Coding: Throughout the project, I wrote code, developed new features, resolved bugs, and worked on improving the integration of GeoCroissant. These practical tasks sharpened my problem-solving skills and deepened my understanding of software development workflows.
- Mentorship: Collaborating with experienced mentors was a significant part of this journey. Their guidance and constructive feedback not only helped me overcome technical challenges but also shaped the way I approach complex problems.
- Collaboration: Working within a distributed team taught me the importance of collaborative development practices. I became comfortable with version control systems, code reviews, and pull request workflows, all essential skills in modern open-source projects.
- Networking: This project connected me with a vibrant global network of developers, researchers, and open-source enthusiasts in the geospatial AI field. Building these connections has opened up possibilities for future collaborations and valuable professional relationships.
- Project Management: Managing tasks, planning milestones, and delivering updates on time were crucial aspects of this experience. It taught me how to stay organized and maintain steady progress in a dynamic development environment.
- Communication Skills: Regularly documenting my work and sharing progress with mentors and the community helped me improve my written and professional communication. Clear communication made collaboration smoother and ensured everyone stayed aligned.
- Open-Source Culture: Being part of this project reinforced the importance of transparency, community involvement, and collaborative growth, values that I will carry forward in my future contributions to open source.
Links
- Weekly Reports & Updates: View Weekly Reports
- Project Proposal: Google Docs Proposal
- GitHub Repository: DCAI
- Project Wiki: Project Wiki Home
- Final Report: View Final Report
I am truly grateful to be a part of the wonderful GSoC and OSGeo communities. This program has been an incredible learning journey, helping me enhance my understanding of open-source contributions and significantly improve my programming skills. I would like to express my heartfelt thanks to my mentors, Gérald Fenoy and Chetan Mahajan, for their incredible guidance and continuous support throughout the program, as well as to the entire community for their encouragement and constructive feedback during my GSoC journey.
Thank you and Regards,
Harsh Shinde