This is a progress update on the STRDS topic, proposed by Stephan Blumentrath aka ninsbl.
Regarding issue #3394, which concerns how t.register handles timestamps in the filesystem: when registering a vector map from another mapset, a timestamp is written to the file-based database and a table entry is written to SQLite.
I traced the issue down to one file: python/grass/temporal/abstract_map_dataset.py
The function write_timestamp_to_grass() is the culprit: it is the one responsible for writing timestamps.
Initially there was some ambiguity about which method actually writes the timestamps, since two files define the same function, write_timestamp_to_grass():
abstract_map_dataset.py → declares it as an abstract method
space_time_datasets.py → provides the concrete implementation
From examining the class hierarchy, my understanding is that abstract_map_dataset.py handles generic map behaviour (raster/vector etc.), while space_time_datasets.py handles datasets that contain many maps (STRDS/STVDS).
Based on this, my current thought is that the logic controlling when timestamps are written might need to be adjusted around update_absolute_time() in abstract_map_dataset.py.
I wanted to know whether my line of thought is correct, and would appreciate guidance on it.
I did a little more reading and learned how these conditions are handled in the temporal framework. I went with my initial hypothesis: changing the logic behind when timestamps are written.
The function write_timestamp_to_grass() is responsible for writing timestamps, governed by a boolean function, get_enable_timestamp_write(). By default that returns true, so the code never even checks whether the map belongs to the current mapset.
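To make the fix I have in mind concrete, here is a minimal, hypothetical sketch of the guard logic; the function name, signature, and mapset strings are invented for illustration and this is not the actual GRASS code:

```python
def should_write_timestamp(map_mapset, current_mapset, enable_timestamp_write=True):
    """Decide whether a timestamp file should be written for a map.

    Hypothetical helper: even when timestamp writing is enabled globally
    (the current default), skip maps that belong to a different mapset,
    since their timestamp files are not ours to modify.
    """
    if not enable_timestamp_write:
        return False
    # Proposed additional check: only write for maps in the current mapset
    return map_mapset == current_mapset


# Example: a map from PERMANENT registered while working in mapset "user1"
print(should_write_timestamp("PERMANENT", "user1"))  # False: foreign mapset
print(should_write_timestamp("user1", "user1"))      # True: our own map
```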
Following up on my earlier investigations in this thread, I’ve drafted my proposal and would really appreciate your feedback on it. @Stephan, I’d especially value your thoughts, given our earlier discussions.
Please let me know if there are any areas that need improvement or refinement.
Do you have a working document I could comment in directly, without sending PDFs back and forth?
Some general comments:
It is important to maintain compatibility with both SQLite and PostgreSQL, which are the two DBMSs supported in TGIS.
There are two conceivable approaches: one is to add a single column to the TGIS templates to store metadata as JSON; the other is to add a column for each metadata field. The latter could be achieved if a metadata model is provided by the user when an STDS is created. A metadata model would also be needed to ensure consistency of data in a JSON column. Here, flexibility, performance, and consistency are the important criteria.
CF or STAC could provide default or example metadata models, but we should assume user-specific metadata as well.
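To make the single-JSON-column variant concrete, here is a minimal sketch using plain Python and sqlite3; the table and column names are invented and are not part of the TGIS schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical register table: one TEXT column holds free-form JSON metadata
conn.execute("CREATE TABLE map_register (map_id TEXT PRIMARY KEY, metadata TEXT)")
conn.execute(
    "INSERT INTO map_register VALUES (?, ?)",
    ("rain_2020_01", json.dumps({"sensor": "station", "unit": "mm"})),
)

# Client-side filtering on a JSON field works identically against SQLite
# and PostgreSQL; server-side filtering would instead use json_extract()
# on SQLite or the ->> operator on PostgreSQL jsonb.
rows = conn.execute("SELECT map_id, metadata FROM map_register").fetchall()
mm_maps = [map_id for map_id, meta in rows if json.loads(meta).get("unit") == "mm"]
```

The flexibility/performance trade-off shows up right here: any key can be stored without schema changes, but filtering requires JSON parsing rather than an indexed column.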
That said, the default temporal_where option is mainly a standard SQL WHERE clause (with syntax depending on the DBMS). The WHERE clause is passed to the TGIS database backend; that's it. So I guess documentation and examples are sufficient here; no development is needed.
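As a tiny illustration of that point (the temporal_where value is just a plain SQL WHERE clause applied to the register), here is a sketch with invented table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maps (id TEXT, start_time TEXT)")
conn.executemany(
    "INSERT INTO maps VALUES (?, ?)",
    [("a", "2020-01-01"), ("b", "2021-06-01")],
)

# A user-supplied temporal_where string, interpolated verbatim into the query;
# the user is responsible for correct, DBMS-specific syntax.
where = "start_time >= '2021-01-01'"
selected = [row[0] for row in conn.execute(f"SELECT id FROM maps WHERE {where}")]
print(selected)  # only map "b" starts after 2021-01-01
```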
I added a note to the related GitHub issue about synchronizing file-based metadata with metadata in TGIS. You do not have to solve this, but please be aware of it and be clear about limitations and what is out of scope for your project.
If you could sketch how you envision a user creating an STDS with custom metadata and registering maps in it, that would be very helpful.
I’ve made some updates in the proposal, as you asked.
I’ve included a sketch of how the user should be able to create an STDS with custom metadata.
I’ve also clarified that metadata is purely optional, and users may choose not to use it. If they do use it, there are three cases:
Case A: User provides metadata model during dataset creation using t.create - this model is used to validate metadata provided during map registration
Case B: User does not provide metadata at all - in this case, maps are registered normally and metadata is stored as NULL. Metadata is ignored during querying.
Case C: No model is provided, but metadata is provided during t.register - the metadata is stored as unstructured JSON (inspired by STAC and CF) and is still used during querying
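The three cases could be sketched roughly like this; the function name, the representation of a model as a list of required fields, and the idea of storing the result in a single metadata column are all my assumptions, not an existing interface:

```python
import json

def prepare_metadata(metadata=None, model=None):
    """Return the value to store in an (assumed) per-map metadata column.

    Case A: a model is given -> validate required fields before storing.
    Case B: no metadata at all -> store NULL (None here).
    Case C: metadata without a model -> store it as free-form JSON.
    """
    if metadata is None:
        return None                        # Case B
    if model is not None:                  # Case A
        missing = [field for field in model if field not in metadata]
        if missing:
            raise ValueError(f"missing metadata fields: {missing}")
    return json.dumps(metadata)            # Cases A and C
```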
Also, regarding your second direction for storing metadata (a separate column for each metadata field): it is more efficient, but also very rigid. It's hard for me to imagine how users would provide such a model via the CLI every time, and we would have to touch the schema, so it may not be very flexible. I'm not challenging your idea, just admitting I can't yet see a clear direction for it.
My line of thought is that JSON is more flexible: users may provide their own metadata model, and it remains extensible when stored as JSON. I'd love to know your thoughts here.
I’d appreciate it if you could review it once more and let me know if there is anything else to improve or refine.
Columns, or even entire tables, for extended metadata could be created when an STDS is created. That may be fixed for all STDS in a mapset, or just for a specific STDS (both conceivable in theory). Metadata provided per map (in JSON) can then be mapped to columns (assuming the JSON matches the data model in the tables). If you have hundreds of thousands of maps in an STDS, it is probably not advisable to have many such STDS in one mapset, so a limitation of "one extended metadata model per mapset" may be acceptable. That is stuff for the community bonding period, I guess.
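Sketched roughly (with invented table, column, and function names; this is not the TGIS schema), mapping per-map JSON onto columns generated from a user-supplied model could look like:

```python
import sqlite3

# Hypothetical user-supplied metadata model, given at STDS creation time
model = {"sensor": "TEXT", "cloud_cover": "REAL"}

conn = sqlite3.connect(":memory:")
cols = ", ".join(f"{name} {sqltype}" for name, sqltype in model.items())
conn.execute(f"CREATE TABLE strds_metadata (map_id TEXT PRIMARY KEY, {cols})")

def register_map(map_id, meta):
    """Map a per-map metadata dict onto the model-derived columns."""
    # Reject fields outside the model to keep the table consistent
    unknown = set(meta) - set(model)
    if unknown:
        raise ValueError(f"fields not in metadata model: {unknown}")
    names = ["map_id"] + list(meta)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO strds_metadata ({', '.join(names)}) VALUES ({placeholders})",
        [map_id] + list(meta.values()),
    )

register_map("s2_2021_07", {"sensor": "S2A", "cloud_cover": 12.5})
```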
It is important to make a good, scalable plan, and to (collectively) think through possible implementations and their implications.
I see what you’re hinting at.
The idea of defining columns and dedicated tables for extended metadata at STDS creation time makes sense, especially if a consistent metadata model is known beforehand. I can imagine how that would make querying more efficient compared to a purely JSON-based approach.
At the same time, I'm wondering if this could be combined with the JSON approach I proposed: we start with JSON as the base layer and let users store any metadata.
Then we identify the important fields, observe which ones users frequently fill in, find patterns, and only then add those as dedicated columns.
Kind of a hybrid architecture.
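Sketched as a toy example (the promotion threshold and field names are invented), the promotion step of that hybrid could look like:

```python
import json
from collections import Counter

# Free-form JSON metadata collected from registered maps (the base layer)
json_blobs = [
    '{"sensor": "S2A", "unit": "mm"}',
    '{"sensor": "S2B", "unit": "mm", "note": "test"}',
    '{"sensor": "S2A", "unit": "mm"}',
]

# Count how often each field occurs across all maps
counts = Counter(key for blob in json_blobs for key in json.loads(blob))

# Promote fields present in at least 80% of maps to dedicated columns;
# rarely used fields like "note" stay in the JSON blob
threshold = 0.8 * len(json_blobs)
promote = sorted(key for key, n in counts.items() if n >= threshold)
print(promote)  # "sensor" and "unit" occur in every map
```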
We definitely need to discuss this in greater detail; I'm just offering ideas.
In the meantime, I've updated the proposal according to your comments.
If you could review it one final time, that would be great.