Hi all,
Here’s my first weekly progress report for the Official Coding Period of GSoC 2025.
Week 1 Report (June 2nd - June 8th)
What did I get done this week?
- Explored the GeoCroissant data format and the mlcroissant Python library.
- Installed and configured the required Python packages (mlcroissant, rasterio, datasets, torch, etc.) for geospatial machine learning workflows.
- Programmatically generated Croissant-compatible JSON-LD metadata for the HLS Burn Scars dataset using the mlcroissant library.
- Validated the metadata and resolved warnings related to missing fields (citeAs, datePublished, version).
- Created a custom Hugging Face dataset loading script (hls.py) for the HLS Burn Scars dataset, enabling seamless integration with the datasets library.
- Loaded and explored the dataset’s training split using the custom script, and visualized sample satellite images and their corresponding annotation masks using rasterio and matplotlib.
- Developed a PyTorch data pipeline (BurnScarsDataset) to handle loading and preprocessing of geospatial image-mask pairs.
- Implemented a U-Net architecture in PyTorch and trained the model.
- The updated wiki page can be found at [1].
- The GSoC repository can be found at [2].
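To illustrate the field check mentioned above, here is a minimal sketch. It assumes the generated metadata has been exported to a JSON-LD file and loaded as a plain dict. The helper name missing_recommended_fields and the abbreviated metadata are hypothetical; this is not mlcroissant's own validation, just a quick way to see which of the three fields are still absent.

```python
import json

# Top-level fields that produced warnings when missing (per this report).
RECOMMENDED_FIELDS = ("citeAs", "datePublished", "version")


def missing_recommended_fields(metadata: dict) -> list:
    """Return the recommended top-level fields that are absent or empty (hypothetical helper)."""
    return [field for field in RECOMMENDED_FIELDS if not metadata.get(field)]


# Abbreviated placeholder metadata; a real Croissant file also carries a full
# "@context", "distribution", and "recordSet" entries.
metadata_json = """
{
  "@type": "sc:Dataset",
  "name": "HLS Burn Scars"
}
"""
metadata = json.loads(metadata_json)
print(missing_recommended_fields(metadata))  # ['citeAs', 'datePublished', 'version']
```

Once citeAs, datePublished, and version are filled in, the helper returns an empty list.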
What do I plan on doing next week? (June 9th - June 15th)
- Demonstrate a GeoCroissant to STAC conversion example and explain the conversion process step by step.
- Write Python scripts for GeoCroissant to STAC conversion and document the key differences between the GeoCroissant and STAC metadata formats.
- Explore existing tools or libraries that can support or simplify the conversion, and test the conversion script on the metadata.
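For the planned conversion, a possible first step is to copy a few top-level Croissant fields into a minimal STAC Collection dict. This is only a sketch under stated assumptions: the function name croissant_to_stac_collection is hypothetical, STAC 1.0.0 is assumed, and the spatial and temporal extents are passed in explicitly because the sketch does not assume which GeoCroissant properties carry them. The extent values in the usage lines are placeholders, not the dataset's real coverage.

```python
import json

STAC_VERSION = "1.0.0"  # assumed target STAC version


def croissant_to_stac_collection(meta: dict, bbox, interval) -> dict:
    """Map a few Croissant fields onto a minimal STAC Collection (hypothetical helper).

    bbox: [west, south, east, north]; interval: [start, end] as RFC 3339 strings or None.
    """
    # Derive a simple STAC id from the Croissant "name" field.
    collection_id = meta["name"].strip().lower().replace(" ", "-")
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "title": meta["name"],
        "description": meta.get("description", ""),
        # STAC expects an SPDX identifier (or "proprietary"/"various" in 1.0.0);
        # a URL-valued Croissant license may need to be mapped by hand.
        "license": meta.get("license", "proprietary"),
        "extent": {
            "spatial": {"bbox": [list(bbox)]},
            "temporal": {"interval": [list(interval)]},
        },
        "links": [],
    }


meta = {"name": "HLS Burn Scars", "description": "placeholder description"}
collection = croissant_to_stac_collection(
    meta,
    bbox=[-180.0, -90.0, 180.0, 90.0],            # placeholder extent
    interval=["2018-01-01T00:00:00Z", None],      # placeholder, open-ended
)
print(json.dumps(collection, indent=2))
```

A full conversion would also need to carry over distribution entries (for example as STAC assets) and any geospatial properties GeoCroissant defines, which this sketch leaves out.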
Am I blocked on anything?
- No
References:
Best Regards,
Harsh Shinde