It seems impossible to doubt that everything in
the universe can be represented by numbers [...]
-- N. I. Lobachevsky
Reading the ``Time series in GRASS'' page [1], as well as [2, 3],
made me wonder: is time the only parameter along which one may
need to arrange data sets? Arguably, it is not.
Consider, for example, comparing MM5 modelling results obtained
with different models or model parameters. There, the related
rasters are laid out along the model index or the value of a
model parameter.
Another example is a set of rasters holding the values of a
meteorological variable at certain (often non-uniformly spaced)
pressure levels. Such sets of rasters shouldn't be turned into
3D rasters, since the pressure to height correspondence varies
over space and time.
The above makes me believe that a generic facility for keeping
the relations between rasters is necessary. Besides,
implementing this facility allows several other problems to be
addressed within its framework, as I'll try to show below.
* Several related rasters: a rasterset?
Both of the examples above suggest using numeric values to
represent the relationship between the rasters. These values
can include:
* timestamp (in seconds since epoch), allowing for time series
[1];
* layer's pressure;
* model index or model parameter;
* a flag telling whether quality flags were applied to the
raster (1) or not (0).
Let me define a rasterset as a named collection of related
rasters, each unambiguously identified by the values of an
arbitrary number of numeric parameters.
Below, I assume that 2D rasters are used at the lower level of
the rasterset implementation, since 3D rasters could easily be
simulated by a rasterset with `z' as a parameter.
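
The definition above could be sketched as a small data structure.
This is only an illustration under my own assumptions: the
Rasterset class, the make_key helper and the in-memory 2D lists
standing in for rasters are hypothetical and are not part of
GRASS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A raster is represented here by a plain 2D list of values; a real
# implementation would refer to on-disk raster data instead.
Raster = List[List[float]]
# Parameters as a sorted tuple of (name, value) pairs, so they are hashable.
Key = Tuple[Tuple[str, float], ...]


def make_key(**params: float) -> Key:
    """Normalize keyword parameters into a hashable, order-independent key."""
    return tuple(sorted((name, float(value)) for name, value in params.items()))


@dataclass
class Rasterset:
    """Hypothetical named collection of rasters keyed by numeric parameters."""
    name: str
    rasters: Dict[Key, Raster] = field(default_factory=dict)

    def add(self, raster: Raster, **params: float) -> None:
        """Store a raster; the parameter values must identify it uniquely."""
        key = make_key(**params)
        if key in self.rasters:
            raise KeyError(f"parameters {key} already identify a raster")
        self.rasters[key] = raster

    def get(self, **params: float) -> Raster:
        """Return the raster identified by exactly these parameter values."""
        return self.rasters[make_key(**params)]


# Two rasters for the same timestamp (seconds since the epoch), with and
# without quality flags applied; the cell values are made up.
ozone = Rasterset("airs-total-ozone")
ozone.add([[310.0, 312.5]], timestamp=1180647324, qaflags_p=1)
ozone.add([[309.0, 313.0]], timestamp=1180647324, qaflags_p=0)
```

The order in which the parameters are given doesn't matter here,
since the key is sorted by parameter name.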
* Tiled raster storage
The simplest use of the rasterset facility is to implement tiled
raster storage [3].
Indeed, a tiled raster could be implemented with each tile
becoming a raster within a single rasterset, and then being
assigned a pair of numeric parameters -- the indices of the
tile.
Since the spatial resolution of the tiles may differ (the rasters
comprising the rasterset are almost as independent as individual
rasters currently are in GRASS), this allows both for whole-NULL
tiles (no raster for these tile indices) and for same-value
tiles (a 1x1 raster covering the whole tile.)
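
A minimal sketch of such a tile lookup follows. The tile size,
the dictionary layout and the cell_value function are my own
assumptions for illustration, not an existing GRASS interface.

```python
from typing import Dict, List, Optional, Tuple

Raster = List[List[float]]

TILE_SIZE = 2  # cells per tile edge; an assumption for this sketch

# Tiles keyed by (row index, column index). A missing key is a whole-NULL
# tile; a 1x1 raster stands for a tile filled with a single value.
tiles: Dict[Tuple[int, int], Raster] = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],  # ordinary full-resolution tile
    (0, 1): [[7.0]],                   # same-value tile stored as 1x1
    # (1, 0) and (1, 1) are absent: whole-NULL tiles
}


def cell_value(row: int, col: int) -> Optional[float]:
    """Return the value at (row, col) of the tiled raster, or None for NULL."""
    tile = tiles.get((row // TILE_SIZE, col // TILE_SIZE))
    if tile is None:
        return None
    # Scale the in-tile offset to the tile's own resolution.
    t_rows, t_cols = len(tile), len(tile[0])
    r = (row % TILE_SIZE) * t_rows // TILE_SIZE
    c = (col % TILE_SIZE) * t_cols // TILE_SIZE
    return tile[r][c]
```

Every cell of the (0, 1) tile reads as 7.0, although only one
value is stored for it.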
Since this feature is expected to be used quite commonly, I
believe it needs to be implemented at the ``core'' of the
rasterset implementation, with appropriate optimizations
applied for some common cases.
* Metadata
Since the rasters comprising the rasterset are allowed to carry
an arbitrary number of additional numeric parameters, this
facility could take over the handling of certain (though not
arbitrary) metadata, even where these additional parameters
aren't strictly necessary for identification.
However, with each raster being assigned a category, it's
possible to associate arbitrary information with it using a
database connected to the rasterset.
* Color maps
Color maps are currently tied rather closely to the rasters they
are used for, making it hardly practical to use different color
maps for the same raster. Such a capability could be used, for
example, to apply one color map when displaying the data and
another when producing the printed output.
Were the color maps detached from the rasters, it might become
feasible to share a color map among several rasters.
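
To illustrate what detaching would buy, here is a small sketch in
which color maps are plain value-to-color tables kept apart from
the rasters. The names screen_map, print_map and render, as well
as the colors, are made up for this example.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Raster = List[List[int]]

# A color map is a plain value-to-color table, stored apart from any raster.
screen_map: Dict[int, RGB] = {0: (0, 0, 255), 1: (0, 255, 0)}
print_map: Dict[int, RGB] = {0: (255, 255, 255), 1: (0, 0, 0)}


def render(raster: Raster, color_map: Dict[int, RGB]) -> List[List[RGB]]:
    """Apply a color map to a raster without modifying either of them."""
    return [[color_map[value] for value in row] for row in raster]


# The same map serves several rasters, and one raster may be rendered
# with several maps.
land_mask: Raster = [[0, 1], [1, 1]]
cloud_mask: Raster = [[1, 0], [0, 0]]
on_screen = render(land_mask, screen_map)
on_paper = render(land_mask, print_map)
clouds_on_screen = render(cloud_mask, screen_map)
```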
I've already mentioned a raster's parameter as a possible
substitute for `z' (both for simulating `z' for ordinary 3D
rasters, and for storing layers of data for which the layer
index to `z' mapping varies over space and time.) Moreover, for
digital elevation models the `z' value is actually the value
stored in the raster. It may be worth investigating whether
this relation could be turned inside out, to allow arbitrary
value to arbitrary value mappings to be stored as 2D (or 1D)
rasters within a rasterset.
There may be demand for storing quite arbitrary arrays in the
future as well.
* Scanning radiometers & Time
Due to the curvature of the Earth's surface, a satellite scanning
radiometer such as MODIS sees certain places on Earth multiple
times within a short period of time (about 1.48 s for MODIS.)
These places appear in consecutive scans in the L2 data. The
most common practices for dealing with this effect are either to
average the values obtained for the same place, or to take the
one value that is, by some criterion, superior to the others.
However, allowing for the scans to be stored independently along
with a ``time'' value associated with each would allow one to
analyze these very short-term changes (if any.)
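
The difference between the two approaches could be sketched as
follows. The cell values are made up, and the one-row ``scans''
and the function names are simplifications of my own.

```python
from typing import Dict, List, Optional

# One row of cells; None where the scan has no data for the cell.
Scan = List[Optional[float]]

# Two consecutive scans keyed by acquisition time in seconds (made-up values).
scans: Dict[float, Scan] = {
    0.0:  [10.0, 11.0, 12.0, None],
    1.48: [None, None, 12.5, 13.0],
}


def averaged(a: Scan, b: Scan) -> Scan:
    """Common practice: merge the scans, averaging where they overlap."""
    out: Scan = []
    for x, y in zip(a, b):
        if x is None or y is None:
            out.append(x if y is None else y)
        else:
            out.append((x + y) / 2)
    return out


def change(a: Scan, b: Scan) -> Scan:
    """With separate storage: the difference where both scans have data."""
    return [None if x is None or y is None else y - x for x, y in zip(a, b)]


merged = averaged(scans[0.0], scans[1.48])
delta = change(scans[0.0], scans[1.48])
```

Once the scans are merged, the information held in delta can no
longer be recovered.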
* RDBMS as the backend
Probably the most appealing feature of the rasterset model is
its supposed flexibility. As mentioned above, the color maps
could be represented as rasters in their very own coordinate
space, and so could the ground control points (*).
With the number of separate data structures forming a raster
reduced, it could become feasible to put these structures into a
general purpose RDBMS, thus partially addressing both the disk
space and the large number of files in a directory concerns
[4].
(*) It's very common for satellite Level 2 data to specify the
latitudes and longitudes of the pixel centres as separate
rasters. These could be mapped directly to specific rasters
within the rasterset. See [5] for a related feature in GDAL.
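
As a rough sketch of how a rasterset might be laid out in an
RDBMS, consider the following, which uses SQLite from the Python
standard library. The table and column names, the BLOB encoding
of the cells and the find_rasters function are all assumptions
made for illustration; nothing here is an existing GRASS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rasterset (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE raster (
        id           INTEGER PRIMARY KEY,  -- could double as the category
        rasterset_id INTEGER NOT NULL REFERENCES rasterset(id),
        cells        BLOB NOT NULL         -- encoded cell data (placeholder)
    );
    CREATE TABLE parameter (
        raster_id INTEGER NOT NULL REFERENCES raster(id),
        name      TEXT NOT NULL,
        value     REAL NOT NULL,
        PRIMARY KEY (raster_id, name)
    );
""")

# Two placeholder rasters in one rasterset, differing in one parameter.
conn.execute("INSERT INTO rasterset (id, name) VALUES (1, 'airs-total-ozone')")
conn.executemany(
    "INSERT INTO raster (id, rasterset_id, cells) VALUES (?, 1, ?)",
    [(1, b"\x00"), (2, b"\x00")],
)
conn.executemany(
    "INSERT INTO parameter (raster_id, name, value) VALUES (?, ?, ?)",
    [(1, "qaflags_p", 1.0), (2, "qaflags_p", 0.0)],
)


def find_rasters(rasterset: str, name: str, value: float) -> list:
    """Return the ids of the rasters in a rasterset with name = value."""
    rows = conn.execute(
        """
        SELECT r.id FROM raster r
        JOIN rasterset s ON s.id = r.rasterset_id
        JOIN parameter p ON p.raster_id = r.id
        WHERE s.name = ? AND p.name = ? AND p.value = ?
        ORDER BY r.id
        """,
        (rasterset, name, value),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout the whole rasterset lives in a single database
file, and selecting rasters by parameter becomes an ordinary SQL
query.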
* Views
Names aren't a convenient way to identify rasters. For example,
I have a location full of rasters with names like:
2007-05-31-grans-std-qual-o3
2007-05-31-grans-std-toto3std
2007-05-31-grans-std-toto3std.qa
2007-05-31-grans-std-toto3stderr
...
The total number of 2D data sets for each day is over 70, most
of which come both in the ``no quality flags applied'' form
(without the `.qa' suffix) and in the ``standard quality flags
applied'' one (with it.) And the source data include even more
data sets.
In order to handle this amount of data efficiently, the system
should allow one to limit the namespace to the data sets
matching arbitrary criteria. This isn't specific to the GUI:
filtering the g.mlist(1) output with grep(1) in scripts may
become rather tedious as well.
The rasterset model seems to be a more appropriate solution.
And, as suggested in the next section, there could be a way to
name a specific raster within the rasterset. With this
functionality available from scripts, one could easily apply
arbitrary schemes for naming the data sets.
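
A minimal sketch of such a view and of one possible naming scheme
follows. The catalogue contents, the view and name_for functions,
and the naming pattern itself are hypothetical.

```python
from typing import Dict, Iterable, List

Params = Dict[str, float]

# Each entry stands for one raster, described only by its parameters.
catalogue: List[Params] = [
    {"timestamp": 1180569600, "qaflags_p": 0},
    {"timestamp": 1180569600, "qaflags_p": 1},
    {"timestamp": 1180656000, "qaflags_p": 0},
    {"timestamp": 1180656000, "qaflags_p": 1},
]


def view(rasters: Iterable[Params], **criteria: float) -> List[Params]:
    """Limit the namespace to the rasters matching all of the criteria."""
    return [
        params for params in rasters
        if all(params.get(key) == value for key, value in criteria.items())
    ]


def name_for(rasterset: str, params: Params) -> str:
    """One possible naming scheme: rasterset name, timestamp, `.qa' suffix."""
    suffix = ".qa" if params.get("qaflags_p") == 1 else ""
    return f"{rasterset}-{int(params['timestamp'])}{suffix}"


flagged = view(catalogue, qaflags_p=1)
names = [name_for("toto3std", params) for params in flagged]
```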
* User interface
With rastersets implemented, the GRASS database begins to look
much more like a relational one. Since the rasters are no
longer named individually (rather, they share the common
rasterset name and are identified by the associated values of
arbitrary parameters), to access a specific raster one would
need to issue a query. (Much like accessing a table's row with
an SQL query.)
Certainly, to expose the very exciting new features the
rasterset model could offer, the UI (both the command line and
the graphical parts) would require a major overhaul. However,
for compatibility's sake, it's reasonable to implement the
current raster access interface on top of the rasterset
facility, thus allowing the existing code (and therefore the
existing interface) to be retained.
Then, there would have to be a mapping of the compatibility
raster names to the (rasterset name, parameters) pairs, and the
corresponding utilities to manage it, both in the library API
and in the UI, like:
    GRASS> r.bind \
        raster=compat-airs-2007-05-31-total-ozone.qa \
        rasterset=airs-total-ozone \
        parameter="timestamp=2007-05-31 21:35:24 +0000" \
        parameter="qaflags_p=true"
A parameter left unspecified matches all of the rasters
together, rather than any single one of them. Thus, one won't
need to specify the individual tile indices of a tiled raster to
refer to the whole spatial extent of the rasterset. If several
rasters match the specification, but do not complement each
other spatially, an error is signalled, like:
    GRASS> r.bind \
        raster=dummy \
        rasterset=airs-total-ozone \
        parameter="qaflags_p=true"
    r.bind: several rasters match the specification
    GRASS>
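
The matching rule could be sketched as follows. Here I assume
that rasters ``complement each other spatially'' when their
extents don't overlap; the Entry type, the resolve function and
the cell-based extents are hypothetical as well.

```python
from typing import Dict, List, NamedTuple, Tuple

# Half-open box in cell units: (first row, first col, end row, end col).
Extent = Tuple[int, int, int, int]


class Entry(NamedTuple):
    params: Dict[str, float]
    extent: Extent


def overlaps(a: Extent, b: Extent) -> bool:
    """True if the two half-open boxes share at least one cell."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def resolve(entries: List[Entry], **spec: float) -> List[Entry]:
    """Return the rasters matching spec; unspecified parameters match all."""
    matches = [
        entry for entry in entries
        if all(entry.params.get(key) == value for key, value in spec.items())
    ]
    if not matches:
        raise LookupError("no raster matches the specification")
    for i, first in enumerate(matches):
        for second in matches[i + 1:]:
            if overlaps(first.extent, second.extent):
                raise LookupError("several rasters match the specification")
    return matches


# Two side-by-side tiles, each stored with and without quality flags.
entries = [
    Entry({"qaflags_p": 1, "tile_col": 0}, (0, 0, 2, 2)),
    Entry({"qaflags_p": 1, "tile_col": 1}, (0, 2, 2, 4)),
    Entry({"qaflags_p": 0, "tile_col": 0}, (0, 0, 2, 2)),
    Entry({"qaflags_p": 0, "tile_col": 1}, (0, 2, 2, 4)),
]
```

Here resolve(entries, qaflags_p=1) returns both tiles, since they
don't overlap, while resolve(entries) with no parameters at all
raises an error, since the flagged and unflagged rasters cover
the same cells.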
The importing utilities (r.in.gdal, or r.import) would need to
be changed early to allow both the rasterset name and the
identifying parameters to be specified. The other modules could
be changed as time permits.
The imported rasters may be named automatically according to an
arbitrary user-specified scheme once a ``hooks'' facility is
implemented in GRASS. (I hope to present my ideas regarding
such a facility in a separate posting.)
* Notes for the implementor
The model described above could be based on the current 2D
rasters implementation after cleaning it of the extra features
to be provided by the rasterset model itself.
Within the model, the 2D rasters facility is the lower level,
and its interface would need to be changed. For compatibility's
sake, the former interface would need to be provided by code
layered on top of the rasterset implementation.
The rasterset model is to be implemented mostly from scratch.
[1] http://grass.gdf-hannover.de/wiki/Time_series_in_GRASS
[2] http://grass.gdf-hannover.de/wiki/GRASS_7_ideas_collection
[3] http://grass.gdf-hannover.de/wiki/Replacement_raster_format
[4] http://freegis.org/cgi-bin/viewcvs.cgi/grass/gips/gip-0002.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup
[5] http://trac.osgeo.org/gdal/wiki/rfc4_geolocate