Who I Am and Why I’m Here
Hello GRASS team,
I’m Selma, a final-year Master’s student in computer vision. My thesis is on open-vocabulary satellite segmentation with vision-language models: teaching a model to segment imagery from free-form text rather than a fixed label set.
While developing my thesis, I discovered GRASS GIS and I realized that its TGIS framework is exactly the temporal infrastructure my workflow had been missing.
While building my thesis dataset, I found that clouds are the main bottleneck in optical time series: in many sequences, 40% or more of the imagery is unusable.
The usual workarounds break temporal continuity.
SAR imagery solves this, but integrating SAR and optical data in a GIS framework has been painful: there’s no clean path from a GRASS temporal dataset to a PyTorch DataLoader without exporting to disk, losing metadata, and writing brittle glue code.
This project addresses that gap, not the model itself, but the plumbing between GRASS’s temporal infrastructure and the deep learning stack I use every day.
The Actual Problem
GRASS has impressive temporal infrastructure. The TGIS framework handles metadata, temporal topology, and time-series queries in ways most GIS tools don’t attempt.
But that infrastructure stops at the edge of the NumPy/SciPy stack. There is no native bridge from a SpaceTimeRasterDataset query to a 4D PyTorch tensor.
There’s also no built-in way to align datasets across sensors, for example pairing a SAR acquisition from one STRDS (a GRASS space-time raster dataset) with the nearest clear-sky optical map from another.
Yes, we can do it manually, but that usually means rewriting the same data-handling scripts for every experiment, the kind of logic that ends up copy-pasted between research notebooks. At that point, it’s clear this logic belongs in the library, not scattered across notebooks.
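To make the alignment idea concrete, here is a minimal sketch of the pairing logic in plain Python, with datetime lists standing in for TGIS map objects. The function name and the 3-day gap threshold are my own placeholders, not existing GRASS API; the real implementation would query STRDS metadata instead of hard-coded lists.

```python
from datetime import datetime, timedelta

def pair_nearest(sar_times, optical_times, max_gap=timedelta(days=3)):
    """For each SAR timestamp, find the nearest optical timestamp
    within max_gap; return (sar, optical-or-None) pairs."""
    pairs = []
    for t in sar_times:
        # Nearest optical acquisition by absolute time difference.
        best = min(optical_times, key=lambda o: abs(o - t), default=None)
        if best is not None and abs(best - t) <= max_gap:
            pairs.append((t, best))
        else:
            # No clear-sky optical map close enough: leave the slot empty.
            pairs.append((t, None))
    return pairs

sar = [datetime(2023, 5, 1), datetime(2023, 5, 13)]
opt = [datetime(2023, 5, 2), datetime(2023, 5, 20)]
pairs = pair_nearest(sar, opt)  # second SAR map has no match within 3 days
```

The TGIS topology engine already offers richer relations than "nearest in time" (overlaps, during, precedes), so the production version would build on those primitives rather than a bare `min()`.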
This is also a different problem from the one r.learn.ml already solves: that module targets classical ML (scikit-learn), not deep learning, and works on single rasters, not temporal datasets.
What I’m proposing differs in that it:
- works with time-series data
- combines multiple sensors (SAR + optical)
- feeds the data directly into PyTorch models
- keeps everything inside the GRASS spatial system
What I Found in the Codebase
While exploring whether RasterRowIO buffers could be piped directly into tensors (avoiding intermediate disk exports), I noticed that PyGRASS already exposes row-level raster access, but those buffers don’t easily talk to PyTorch tensors without a lot of manual casting.
So the ‘bones’ are there, but the bridge is missing.
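A dependency-free sketch of the stacking step I have in mind. Here the row buffers are simulated with `array.array` to keep the example runnable anywhere; in PyGRASS they would be the buffers RasterRowIO yields, and the flat result would be handed to something like `torch.frombuffer` for a zero-copy view. The function name is a placeholder, not an existing API.

```python
from array import array

def stack_rows(rows, ncols):
    """Stack row buffers into one flat float32 buffer plus a
    (nrows, ncols) shape -- the minimum a tensor library needs
    to build a view without another copy."""
    flat = array("f")
    for r in rows:
        if len(r) != ncols:
            # A row that disagrees with the computational region is a bug.
            raise ValueError("row length does not match region columns")
        flat.extend(r)
    return flat, (len(rows), ncols)

# Fabricated rows standing in for RasterRowIO output:
rows = [array("f", [0.1, 0.2, 0.3]), array("f", [0.4, 0.5, 0.6])]
buf, shape = stack_rows(rows, ncols=3)
```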
Then reading through abstract_space_time_dataset.py, I found that get_registered_maps_as_objects() on SpaceTimeRasterDataset is powerful for querying within a single STRDS, but it doesn’t provide cross-dataset temporal alignment.
In other words, pairing a SAR map from one STRDS with the nearest optical map from another isn’t built in yet.
The TGIS temporal topology engine already has the primitives for this. They just aren’t assembled into a cross-sensor workflow.
One thing I want to be upfront about is memory layout and alignment when moving raster data from the C-based GRASS internals into Python tensors.
Depending on the computational region and raster layout, the data can behave slightly differently once it’s converted. Because of this, I plan to prioritize the gunittest suite alongside the tiling engine, not after it. The goal is to catch edge cases early and make sure the data loader behaves consistently across different regions and datasets.
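The kind of edge case I mean: a region whose dimensions are not multiples of the tile size. Here is a sketch of the window math those tests would pin down; the 512 default and the shrink-at-the-edge policy are illustrative choices, not settled design.

```python
def tile_offsets(nrows, ncols, tile=512, overlap=0):
    """Yield (row_off, col_off, height, width) windows covering the
    whole region; edge tiles shrink instead of reading past the end."""
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile size")
    for r in range(0, nrows, step):
        for c in range(0, ncols, step):
            # Clamp the window so it never leaves the region.
            yield r, c, min(tile, nrows - r), min(tile, ncols - c)

# A 1000x700 region is not a multiple of 512 in either dimension:
tiles = list(tile_offsets(1000, 700))
```

A gunittest case would assert exactly the properties that matter: every pixel is covered once (with overlap=0), and no window exceeds the region bounds.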
For the addon interface, I’ll follow the g.parser pattern used in modules such as r.slope.aspect, which provides a clear structure for building standard GRASS modules with CLI and GUI compatibility.
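As a sketch of that pattern, the module skeleton would look roughly like this. The `# %` comment header is the declarative interface definition that g.parser reads; the module name, option keys, and descriptions below are placeholders I made up for illustration, not a finished interface.

```python
#!/usr/bin/env python3
# %module
# % description: Pair SAR and optical space-time raster datasets (sketch)
# % keyword: temporal
# % keyword: machine learning
# %end
# %option G_OPT_STRDS_INPUT
# % key: sar
# % description: Input SAR space-time raster dataset
# %end
# %option G_OPT_STRDS_INPUT
# % key: optical
# % description: Input optical space-time raster dataset
# %end

import grass.script as gs

def main(options, flags):
    gs.message(f"Pairing {options['sar']} with {options['optical']}")

if __name__ == "__main__":
    options, flags = gs.parser()
    main(options, flags)
```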
What I’m Actually Building
The minimum viable deliverable is two things: the grass.ml.temporal data loader API, and the cross-STRDS alignment utility.
If those two components land cleanly and pass the gunittest suite, the project is already successful regardless of stretch goals.
If the core infrastructure stabilizes early, I plan to extend it with:
- ML–temporal bridge: streams 512×512 model-ready tensors from PyGRASS while respecting g.region. Unlike r.tile, this is a streaming DataLoader interface.
- Cross-sensor alignment module: pairs SAR maps from one STRDS with the nearest optical maps from another using the TGIS temporal topology engine.
- Benchmarking suite: tests the pipeline on a Sentinel-1 / Sentinel-2 flood dataset using SSIM, MAE, and cloud-mask reconstruction metrics.
- i.fusion.temporal addon: exposes the pipeline as a standard GRASS module via g.parser. A lightweight U-Net will serve as a baseline, but the framework remains model-agnostic.
- gunittest suite: validates the data loader and alignment modules to ensure stable behavior across regions and datasets.
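Of the benchmarking metrics, MAE at least is trivial to pin down. A pure-Python sketch over flat pixel lists (the real suite would operate on GRASS rasters, and SSIM needs a proper windowed implementation); the `None`-as-cloud-mask convention is just this example's assumption.

```python
def mean_absolute_error(pred, target):
    """MAE between two equal-length pixel sequences, skipping
    positions where target is None (e.g. cloud-masked pixels)."""
    diffs = [abs(p - t) for p, t in zip(pred, target) if t is not None]
    if not diffs:
        raise ValueError("no valid (unmasked) pixels to compare")
    return sum(diffs) / len(diffs)

# Masked middle pixel is excluded from the average:
mae = mean_absolute_error([0.2, 0.5, 0.9], [0.1, None, 1.0])
```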
You May Ask: Why GRASS and Not a Standalone Library?
I considered building this as a standalone PyTorch dataset class on top of GDAL. That would work technically, but it would lose several things GRASS already solves well: TGIS temporal metadata, g.region-based CRS and region management, and compatibility with existing GRASS workflows.
So, the main reason for implementing this inside GRASS is that it complements and extends the existing ecosystem.
This ensures the reconstructed pseudo-optical layer works natively with GRASS tools like t.rast.algebra, r.series, and other temporal modules, instead of being stuck inside a standalone training script.
Coming from a segmentation background, that composability is exactly what I would want as a user of this tool.
Honest Assessment of Scope
350 hours is tight for everything listed here, and I know that. The reason I think it’s manageable is that the most critical parts, the data loader and the alignment utility, are relatively self-contained.
If they take longer than expected, the stretch goals can scale down cleanly. The U-Net baseline becomes a minimal proof-of-concept, the benchmarking suite covers fewer scenarios, and the addon interface remains simpler, but the core infrastructure still delivers value to the GRASS ecosystem.
I came to this proposal through my own research, not just GSoC, which makes it genuinely exciting for me. If we land on this solution together, I’m motivated to explore further improvements and extensions even after the program ends, especially as new temporal satellite workflows evolve within GRASS.
P.S. I’ll write the full proposal with a detailed timeline, deliverable milestones, and revised scope once I hear back from you on whether the project direction works.