Hi,
I would like to propose a new module that would handle an expiry date for layers and
data within layers, so that when the time comes, the layer or the data gets dropped.
Two identified use cases at the moment:
- Temporary layers created by processing, which exist only to make the
processing results visible via WMS/WMTS, but which are normally not kept
around for long and are often not advertised
- Moving window for stable layers with dimension support, where the data inside
the layer needs to be purged so that data that is too old is removed
The expiry concept could be implemented in a number of ways, e.g., target expiry
date, timeout after last data usage, keep last N rows, but for this iteration we are
targeting a simple expiry date, as it’s the simplest mechanism, and with some
work some of the other ones can be reduced to it.
For layer-wide expiry, we would have an expiration date, and a flag that states
whether the data inside the layer also needs to be dropped.
This would in turn translate into a dropSchema call for vector data, a plain file
delete for simple raster layers (plus any sidecar files), and a call
to StructuredCoverage2DReader.delete(…) for mosaics and NetCDF files.
Speaking of which, it might make sense to push the delete(…) call from
the StructuredCoverage2DReader to GridCoverage2DReader…
I’ll inquire about this on gt-devel.
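
As a rough sketch of the layer-wide case, the dispatch on layer type could look like the following. Everything here is hypothetical and for illustration only: StoreKind, DataRemover and LayerExpirer are made-up names, not GeoServer or GeoTools types, and the three callbacks merely stand in for the dropSchema call, the file delete and the reader delete mentioned above.

```java
// Hypothetical layer kinds for this sketch; these are not GeoServer types.
enum StoreKind { VECTOR, SIMPLE_RASTER, STRUCTURED_COVERAGE }

// Hypothetical callbacks standing in for the real store operations.
interface DataRemover {
    void dropSchema(String layerName);        // vector data
    void deleteFiles(String layerName);       // raster file plus any sidecar files
    void deleteFromReader(String layerName);  // mosaics and NetCDF, via the reader's delete call
}

final class LayerExpirer {
    private final DataRemover remover;

    LayerExpirer(DataRemover remover) {
        this.remover = remover;
    }

    // Called once the layer's expiration date has passed.
    // dropData is the flag stating whether the underlying data goes away too.
    void expire(String layerName, StoreKind kind, boolean dropData) {
        if (dropData) {
            switch (kind) {
                case VECTOR:
                    remover.dropSchema(layerName);
                    break;
                case SIMPLE_RASTER:
                    remover.deleteFiles(layerName);
                    break;
                case STRUCTURED_COVERAGE:
                    remover.deleteFromReader(layerName);
                    break;
            }
        }
        // Removing the layer configuration from the catalog would follow here.
    }
}
```

With the flag unset, only the layer configuration would be removed and the data stays where it is.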
For layers with dimensions, a CQL filter would identify which features/granules
to delete.
We envisage two operation modes: a fixed expiry mode and a continuous one.
In fixed expiry mode one specifies a target expiry date, and a CQL filter that
identifies the features/granules that need to be dropped at that date.
In continuous expiry mode we’d have no expiry date, but a filter that embeds
the moving window within itself, something like:
dateDiff(now(), creationDate, "days") > 10
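
To make the semantics of the two modes concrete, here is a minimal in-memory sketch. It is an illustration only: Granule, GranuleFilter, ExpiryRule and MovingWindow are hypothetical names, the filter interface stands in for a CQL filter, and it assumes dateDiff counts whole elapsed days, which the proposal does not specify. The real enforcement would push the filter down to the store rather than evaluate it in memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;

// Hypothetical stand-in for a feature or granule; only the fields the filters use.
final class Granule {
    final String myAttribute;
    final Instant creationDate;

    Granule(String myAttribute, Instant creationDate) {
        this.myAttribute = myAttribute;
        this.creationDate = creationDate;
    }
}

// Stand-in for a CQL filter; "now" is passed in so that now() is evaluated at check time.
interface GranuleFilter {
    boolean matches(Granule granule, Instant now);
}

final class ExpiryRule {
    private final Instant expiryDate; // null means continuous mode
    private final GranuleFilter filter;

    private ExpiryRule(Instant expiryDate, GranuleFilter filter) {
        this.expiryDate = expiryDate;
        this.filter = Objects.requireNonNull(filter);
    }

    // Fixed mode: the filter only applies once the target date is reached.
    static ExpiryRule fixed(Instant expiryDate, GranuleFilter filter) {
        return new ExpiryRule(Objects.requireNonNull(expiryDate), filter);
    }

    // Continuous mode: no date, the filter itself carries the moving window.
    static ExpiryRule continuous(GranuleFilter filter) {
        return new ExpiryRule(null, filter);
    }

    boolean shouldDrop(Granule granule, Instant now) {
        boolean due = expiryDate == null || !now.isBefore(expiryDate);
        return due && filter.matches(granule, now);
    }
}

final class MovingWindow {
    // In-memory reading of: dateDiff(now(), creationDate, "days") > maxAgeDays
    // (assumes whole elapsed days).
    static GranuleFilter olderThanDays(long maxAgeDays) {
        return (granule, now) ->
                Duration.between(granule.creationDate, now).toDays() > maxAgeDays;
    }
}
```

For instance, `ExpiryRule.continuous(MovingWindow.olderThanDays(10))` would flag a granule created 28 days before the check, but not one created 4 days before it.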
Now, where to store this expiry information? Two options: an external dedicated
database, or embedding it inside the layer’s configuration itself, as part of the
metadata section.
We are leaning towards the second option, as it does not have the consistency
or clustering issues that we’d have to handle with the first (and the catalog
can be offloaded to a database if it becomes too big).
So implementation wise we’d have something like this in the metadata for
a simple layer expiration:
For multidimensional layers instead we’d have something like this for
group based expiration:
20150228T100000
myAttribute = "ABC"
20150228T200000
myAttribute = "EFG"
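
The compact timestamps above could be read with plain java.time, as in this small sketch. The class name ExpiryTimestamps is hypothetical, and since the values carry no time zone, treating them as local date-times is an assumption on my side.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ExpiryTimestamps {
    // Matches compact values such as 20150228T100000 (no time zone, an assumption).
    private static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMdd'T'HHmmss");

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, COMPACT);
    }

    // A rule becomes due as soon as "now" reaches its expiry timestamp.
    static boolean isDue(String expiry, LocalDateTime now) {
        return !now.isBefore(parse(expiry));
    }
}
```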
For continuous data expiration instead we’d have something like:
data dateDiff(now(), creationDate, "days") > 10
Of course dateDiff and now would be new filter functions, and we’d teach selected databases
(PostgreSQL for the moment) how to translate them down into SQL for efficient execution.
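
Just to give an idea of what the translation might produce, here is a hypothetical helper that builds a PostgreSQL predicate for the "days" case. This is not the GeoTools filter-to-SQL encoding, PostgresDateDiff is a made-up name, and it assumes the attribute maps to a timestamp column and that dateDiff counts whole days.

```java
final class PostgresDateDiff {
    // Quotes an identifier the PostgreSQL way, doubling embedded double quotes.
    static String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // SQL predicate for: dateDiff(now(), <column>, "days") > <threshold>
    // Subtracting two timestamps yields an interval; EXTRACT(DAY ...) reads its day count.
    static String olderThanDays(String column, int threshold) {
        return "EXTRACT(DAY FROM (now() - " + quoteIdentifier(column) + ")) > " + threshold;
    }
}
```

The resulting string would end up in the WHERE clause of the delete, so the database does the selection instead of GeoServer scanning the rows.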
The expiration enforcement code would then look for layers that have the expirationControl
object set, and apply the expiration rules accordingly (with a filter-based search, not an
in-memory programmatic one).
For large catalogs using JDBCConfig it would be nice to have the existence of expirationControl
be an indexed property, so that we can quickly locate layers that have a certain expiry date
set, or that have data-based expiry checks, and only grab those out of the large lot of
layers… which calls for an easy way to index a new property, maybe programmatically or
declaratively, in JDBCConfig, possibly against an already set up database… a functionality
that, as far as I know, is missing. But we’ll cross that bridge when we get there.
Cheers
Andrea
Ing. Andrea Aime
@geowolf
Technical Lead
GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549
http://www.geo-solutions.it
http://twitter.com/geosolutions_it