This is a progress update on the STRDS topic, proposed by Stephan Blumentrath aka ninsbl.
Regarding issue #3394, which concerns how t.register handles timestamps in the filesystem: when registering a vector map from another mapset, a timestamp is written to the file-based database and a table entry is written to SQLite.
I traced the issue down to one file: python/grass/temporal/abstract_map_dataset.py
The function write_timestamp_to_grass() is the culprit: it is the one responsible for writing timestamps.
Initially there was some ambiguity about which method actually writes the timestamps, since two files define the same function, write_timestamp_to_grass():
abstract_map_dataset.py → declares it as an abstract method
space_time_datasets.py → provides the concrete implementation
From examining the class hierarchy, my understanding is that abstract_map_dataset.py handles generic map behaviour (raster/vector etc.), while space_time_datasets.py handles datasets that contain many maps (STRDS/STVDS).
Based on this, my current thought is that the logic controlling when timestamps are written might need to be adjusted around update_absolute_time() in abstract_map_dataset.py.
I wanted to know whether my line of thought is correct, and would appreciate guidance on it.
I did a little more reading and learned how these conditions are handled in the temporal framework. I went with my initial hypothesis: changing the logic behind when timestamps are written.
The function write_timestamp_to_grass() is responsible for writing timestamps, governed by a boolean function, get_enable_timestamp_write(). By default that returns true, so the code never even checks whether the map belongs to the current mapset.
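To make the fix I have in mind concrete, here is a minimal, hypothetical sketch of the guard logic; the function name, signature, and mapset strings are invented for illustration and this is not the actual GRASS code:

```python
def should_write_timestamp(map_mapset, current_mapset, enable_timestamp_write=True):
    """Decide whether a timestamp file should be written for a map.

    Hypothetical helper: even when timestamp writing is enabled globally
    (the current default), skip maps that belong to a different mapset,
    since their timestamp files are not ours to modify.
    """
    if not enable_timestamp_write:
        return False
    # Proposed additional check: only write for maps in the current mapset
    return map_mapset == current_mapset


# Example: a map from PERMANENT registered while working in mapset "user1"
print(should_write_timestamp("PERMANENT", "user1"))  # False: foreign mapset
print(should_write_timestamp("user1", "user1"))      # True: our own map
```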
Following up on my earlier investigations in this thread, I’ve drafted my proposal and would really appreciate your feedback on it. @Stephan, I’d especially value your thoughts, given our earlier discussions.
Please let me know if there are any areas that need improvement or refinement.
Do you have a working document I could comment in directly, without sending PDFs back and forth?
Some general comments:
It is important to maintain compatibility with both SQLite and PostgreSQL, which are the two DBMSs supported in TGIS.
There are two conceivable approaches: one is to add a single column to the TGIS templates to store metadata as JSON; the other is to add a column for each metadata field. The latter could be achieved if a metadata model is provided by the user when an STDS is created. A metadata model would also be needed to ensure consistency of data in a JSON column. Here, flexibility, performance, and consistency are the important criteria.
CF or STAC could provide default or example metadata models, but we should assume user-specific metadata as well.
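To make the single-JSON-column variant concrete, here is a minimal sketch using plain Python and sqlite3; the table and column names are invented and are not part of the TGIS schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical register table: one TEXT column holds free-form JSON metadata
conn.execute("CREATE TABLE map_register (map_id TEXT PRIMARY KEY, metadata TEXT)")
conn.execute(
    "INSERT INTO map_register VALUES (?, ?)",
    ("rain_2020_01", json.dumps({"sensor": "station", "unit": "mm"})),
)

# Client-side filtering on a JSON field works identically against SQLite
# and PostgreSQL; server-side filtering would instead use json_extract()
# on SQLite or the ->> operator on PostgreSQL jsonb.
rows = conn.execute("SELECT map_id, metadata FROM map_register").fetchall()
mm_maps = [map_id for map_id, meta in rows if json.loads(meta).get("unit") == "mm"]
```

The flexibility/performance trade-off shows up right here: any key can be stored without schema changes, but filtering requires JSON parsing rather than an indexed column.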
That said, the default temporal_where option is mainly a standard SQL WHERE clause (with syntax depending on the DBMS). The WHERE clause is passed to the TGIS database backend; that's it. So I guess documentation and examples are sufficient here; no development is needed.
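As a tiny illustration of that point (the temporal_where value is just a plain SQL WHERE clause applied to the register), here is a sketch with invented table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maps (id TEXT, start_time TEXT)")
conn.executemany(
    "INSERT INTO maps VALUES (?, ?)",
    [("a", "2020-01-01"), ("b", "2021-06-01")],
)

# A user-supplied temporal_where string, interpolated verbatim into the query;
# the user is responsible for correct, DBMS-specific syntax.
where = "start_time >= '2021-01-01'"
selected = [row[0] for row in conn.execute(f"SELECT id FROM maps WHERE {where}")]
print(selected)  # only map "b" starts after 2021-01-01
```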
I added a note to the related GitHub issue about synchronizing file-based metadata with metadata in TGIS. You do not have to solve this, but please be aware of it and be clear about limitations and what is out of scope for your project.
If you could sketch how you envision a user creating an STDS with custom metadata and registering maps in it, that would be very helpful.
I’ve made some updates in the proposal, as you asked.
I’ve included a sketch of how the user should be able to create an STDS with custom metadata.
I’ve also clarified that metadata is purely optional, and users may choose not to use it. If they do use it, there are three cases:
Case A: User provides metadata model during dataset creation using t.create - this model is used to validate metadata provided during map registration
Case B: User does not provide metadata at all - in this case, maps are registered normally and metadata is stored as NULL. Metadata is ignored during querying.
Case C: No model is provided, but metadata is provided during t.register - the metadata is stored as unstructured JSON (inspired by STAC and CF) and is still used during querying
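The three cases could be sketched roughly like this; the function name, the representation of a model as a list of required fields, and the idea of storing the result in a single metadata column are all my assumptions, not an existing interface:

```python
import json

def prepare_metadata(metadata=None, model=None):
    """Return the value to store in an (assumed) per-map metadata column.

    Case A: a model is given -> validate required fields before storing.
    Case B: no metadata at all -> store NULL (None here).
    Case C: metadata without a model -> store it as free-form JSON.
    """
    if metadata is None:
        return None                        # Case B
    if model is not None:                  # Case A
        missing = [field for field in model if field not in metadata]
        if missing:
            raise ValueError(f"missing metadata fields: {missing}")
    return json.dumps(metadata)            # Cases A and C
```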
Also, regarding your second direction for storing metadata (a separate column for each metadata field): it is more efficient, but also very rigid. It's hard for me to imagine how users would provide such a model via the CLI every time, and we would have to touch the schema, so it may not be very flexible. I'm not challenging your idea, just admitting I can't yet see a clear direction for it.
My line of thought is that JSON is more flexible: users may provide their own metadata model, and it remains extensible when stored as JSON. I'd love to know your thoughts here.
I’d appreciate it if you could review it once more and let me know if there is anything else to improve or refine.
Columns, or even entire tables, for extended metadata could be created when an STDS is created. That may be fixed for all STDS in a mapset, or just for a specific STDS (both conceivable in theory). Metadata provided per map (in JSON) can then be mapped to columns (assuming the JSON matches the data model in the tables). If you have hundreds of thousands of maps in an STDS, it is probably not advisable to have many such STDS in one mapset, so a limitation of "one extended metadata model per mapset" may be acceptable. That is stuff for the community bonding period, I guess.
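Sketched roughly (with invented table, column, and function names; this is not the TGIS schema), mapping per-map JSON onto columns generated from a user-supplied model could look like:

```python
import sqlite3

# Hypothetical user-supplied metadata model, given at STDS creation time
model = {"sensor": "TEXT", "cloud_cover": "REAL"}

conn = sqlite3.connect(":memory:")
cols = ", ".join(f"{name} {sqltype}" for name, sqltype in model.items())
conn.execute(f"CREATE TABLE strds_metadata (map_id TEXT PRIMARY KEY, {cols})")

def register_map(map_id, meta):
    """Map a per-map metadata dict onto the model-derived columns."""
    # Reject fields outside the model to keep the table consistent
    unknown = set(meta) - set(model)
    if unknown:
        raise ValueError(f"fields not in metadata model: {unknown}")
    names = ["map_id"] + list(meta)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO strds_metadata ({', '.join(names)}) VALUES ({placeholders})",
        [map_id] + list(meta.values()),
    )

register_map("s2_2021_07", {"sensor": "S2A", "cloud_cover": 12.5})
```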
It is important to make a good, scalable plan, and to (collectively) think through possible implementations and their implications.
I see what you’re hinting at.
The idea of defining columns and dedicated tables for extended metadata at STDS creation time makes sense, especially if a consistent metadata model is known beforehand. I can imagine how that would make querying more efficient compared to a purely JSON-based approach.
At the same time, I'm wondering if this could be combined with the JSON approach I proposed: we start with JSON as the base layer and let users store any metadata.
Then we identify the important fields, observe which ones users frequently fill in, find patterns, and only then add those as dedicated columns.
Kind of a hybrid architecture.
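Sketched as a toy example (the promotion threshold and field names are invented), the promotion step of that hybrid could look like:

```python
import json
from collections import Counter

# Free-form JSON metadata collected from registered maps (the base layer)
json_blobs = [
    '{"sensor": "S2A", "unit": "mm"}',
    '{"sensor": "S2B", "unit": "mm", "note": "test"}',
    '{"sensor": "S2A", "unit": "mm"}',
]

# Count how often each field occurs across all maps
counts = Counter(key for blob in json_blobs for key in json.loads(blob))

# Promote fields present in at least 80% of maps to dedicated columns;
# rarely used fields like "note" stay in the JSON blob
threshold = 0.8 * len(json_blobs)
promote = sorted(key for key, n in counts.items() if n >= threshold)
print(promote)  # "sensor" and "unit" occur in every map
```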
We definitely need to discuss this in greater detail; I'm just offering ideas.
In the meantime, I've updated the proposal according to your comments.
If you could review it one final time, that would be great.