[Geoserver-devel] New community module: layer/data expiry

Hi,
I would like to propose a new module that would handle an expiry date for layers and
data within layers, so that when the time comes, the layer or the data gets dropped.

Two identified use cases at the moment:

  • Temporary layers created out of processing, that live for the purpose of making
    the processing results visible via WMS/WMTS, but which are normally not kept
    around in a stable way and are often not advertised
  • Moving window for stable layers with dimension support, where the data inside
    the layer needs to be purged so that data that is too old is removed

The expiry concept could be implemented in a number of ways, e.g., target expiry
date, timeout after last data usage, keep last N rows, but for this iteration we are
targeting a simple expiry date, as it’s the simplest mechanism, and with some
work some of the other ones can be reduced to it.

For layer-wide expiry, we would have an expiration date, and a flag that states
whether the data inside the layer also needs to be dropped.
This would in turn translate into a dropSchema call for vector data, a simple file
delete for simple raster layers (plus any sidecar files), and a call
to StructuredGridCoverage2DReader.delete(…) for mosaics and NetCDF files.
Speaking of which, it might make sense to push the delete(…) call from
StructuredGridCoverage2DReader to GridCoverage2DReader…
I’ll inquire about this on gt-devel.
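
To make the above concrete, here is a rough sketch of what the layer-wide drop step
could look like. The ExpiryDropper class and the overall flow are made up for
illustration; the removeSchema and delete(…) calls are the existing GeoTools APIs
the paragraph above refers to, and the catalog calls are the usual GeoServer ones.

import java.io.IOException;

import org.geoserver.catalog.Catalog;
import org.geoserver.catalog.CoverageInfo;
import org.geoserver.catalog.FeatureTypeInfo;
import org.geoserver.catalog.LayerInfo;
import org.geoserver.catalog.ResourceInfo;
import org.geotools.coverage.grid.io.StructuredGridCoverage2DReader;
import org.geotools.data.DataAccess;
import org.geotools.data.DataStore;
import org.opengis.coverage.grid.GridCoverageReader;

public class ExpiryDropper {

    // Drops the layer configuration and, if requested, the underlying data
    public void dropLayer(Catalog catalog, LayerInfo layer, boolean dropData) throws IOException {
        ResourceInfo resource = layer.getResource();
        if (dropData) {
            if (resource instanceof FeatureTypeInfo) {
                // vector data: drop the underlying table/schema
                FeatureTypeInfo ft = (FeatureTypeInfo) resource;
                DataAccess<?, ?> access = ft.getStore().getDataStore(null);
                if (access instanceof DataStore) {
                    ((DataStore) access).removeSchema(ft.getNativeName());
                }
            } else if (resource instanceof CoverageInfo) {
                CoverageInfo ci = (CoverageInfo) resource;
                GridCoverageReader reader = ci.getGridCoverageReader(null, null);
                if (reader instanceof StructuredGridCoverage2DReader) {
                    // mosaics/NetCDF: drop granules, index and data files
                    ((StructuredGridCoverage2DReader) reader).delete(true);
                }
                // simple rasters: delete the file plus any sidecars (left out of this sketch)
            }
        }
        // finally remove the configuration itself
        catalog.remove(layer);
        catalog.remove(resource);
    }
}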

For layers with dimensions, a CQL filter would identify which features/granules
to delete.
We are envisaging two operation modes, a fixed expiry mode and a continuous one.
In fixed expiry mode one specifies a target expiry date, and a CQL filter that
identifies the features/granules that need to be dropped at that date.
In continuous expiry mode we’d have no expiry date, but a filter that embeds
the moving window within itself, something like:

dateDiff(now(), creationDate, "days") > 10
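
As a rough sketch, the dateDiff function could be built on top of GeoTools'
FunctionExpressionImpl along these lines (the class name and the days-only handling
are illustrative, not part of the proposal), and then registered through the usual
GeoTools function SPI lookup:

import java.util.Date;
import java.util.concurrent.TimeUnit;

import org.geotools.filter.FunctionExpressionImpl;
import org.geotools.filter.capability.FunctionNameImpl;
import org.opengis.filter.capability.FunctionName;

public class DateDiffFunction extends FunctionExpressionImpl {

    public static final FunctionName NAME = new FunctionNameImpl("dateDiff",
            Long.class,
            FunctionNameImpl.parameter("d1", Date.class),
            FunctionNameImpl.parameter("d2", Date.class),
            FunctionNameImpl.parameter("unit", String.class));

    public DateDiffFunction() {
        super(NAME);
    }

    @Override
    public Object evaluate(Object feature) {
        Date d1 = getExpression(0).evaluate(feature, Date.class);
        Date d2 = getExpression(1).evaluate(feature, Date.class);
        String unit = getExpression(2).evaluate(feature, String.class);
        if (d1 == null || d2 == null) {
            return null;
        }
        // only "days" handled in this sketch, other units left out
        if (!"days".equalsIgnoreCase(unit)) {
            throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
        // difference d1 - d2, matching dateDiff(now(), creationDate, "days")
        long millis = d1.getTime() - d2.getTime();
        return TimeUnit.MILLISECONDS.toDays(millis);
    }
}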

Now… where to store this expiry information? Two options: an external dedicated
database, or embedding it inside the layer configuration itself, as part of the
metadata section.

We are leaning towards the second option, as it does not have the consistency
or clustering issues that we’d have to handle with the first (and the catalog
can be offloaded to a database if it becomes too big).

So, implementation-wise, we’d have something like this in the metadata for
a simple layer expiration:

<metadata>
   ...
   <expirationControl>
       <mode>layer</mode>
       <expiration>20150228T100000</expiration>
       <dropData>true</dropData>
   </expirationControl>
   ...
</metadata>

For multidimensional layers we’d instead have something like this for
group-based expiration:

   <expirationControl>
       <mode>data</mode>
       <dropGroup>
           <expiration>20150228T100000</expiration>
           <filter>myAttribute = "ABC"</filter>
       </dropGroup>
       <dropGroup>
           <expiration>20150228T200000</expiration>
           <filter>myAttribute = "EFG"</filter>
       </dropGroup>
   </expirationControl>

For continuous data expiration we’d instead have something like:

   <expirationControl>
       <mode>data</mode>
       <filter>dateDiff(now(), creationDate, "days") > 10</filter>
   </expirationControl>

Of course dateDiff and now would be new filter functions, and we’d teach selected databases
(PostgreSQL for the moment) how to translate them down into SQL for efficient execution.

The expiration enforcement code would then look for layers that have the expirationControl
object set, and apply the expiration rules accordingly (with a filter-based search, not an
in-memory programmatic one).
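
A rough sketch of that lookup, assuming the metadata map entry can be addressed
with a "metadata.expirationControl" property path (whether the various catalog
backends can evaluate such a path efficiently is exactly the open question below):

import java.io.IOException;

import org.geoserver.catalog.Catalog;
import org.geoserver.catalog.LayerInfo;
import org.geoserver.catalog.util.CloseableIterator;
import org.geotools.factory.CommonFactoryFinder;
import org.opengis.filter.Filter;
import org.opengis.filter.FilterFactory2;

public class ExpiryScanner {

    public void scan(Catalog catalog) throws IOException {
        FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
        // "expirationControl is set", expressed as a filter so the backend can do the work
        Filter hasExpiry = ff.not(ff.isNull(ff.property("metadata.expirationControl")));
        try (CloseableIterator<LayerInfo> it = catalog.list(LayerInfo.class, hasExpiry)) {
            while (it.hasNext()) {
                LayerInfo layer = it.next();
                // apply the fixed-date or continuous expiry rules configured on this layer
            }
        }
    }
}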

For large catalogs using JDBCConfig it would be nice to have the existence of expirationControl
be an indexed property, so that we can quickly locate layers that have a certain expiry date
set, or that have data-based expiry checks, grabbing only those out of the large lot of
layers… which begs for an easy way to index a new property, maybe programmatically or
declaratively, in JDBCConfig, possibly against an already set up database… a functionality
that, as far as I know, is missing. But we’ll cross that bridge when we get there.

Cheers
Andrea


==

GeoServer Professional Services from the experts! Visit
http://goo.gl/NWWaa2 for more information.

==

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it



Seems this email is longer than our usual proposal in the wiki :-)

The idea is sound and I like the direction of thinking about storage implementation and JDBCConfig from the start.

You may wish to call the module automate rather than expire so it has a chance to grow new functionality, such as a timed ingest.

On Thu, Feb 19, 2015 at 4:31 PM, Jody Garnett <jody.garnett@anonymised.com>
wrote:

Seems this email is longer than our usual proposal in the wiki :-)

Was meant to gather feedback, as community modules do not need a formal
proposal :-p

The idea is sound and I like the direction of thinking about storage
implementation and JDBCConfig from the start.

You may wish to call the module automate rather than expire so it has a
chance to grow new functionality, such as a timed ingest.

Yeah, I like the direction. If we are thinking about automation, should we
host the <expirationControl> under an <automate> umbrella base tag?
Uh, maybe not... I guess most automations would be independent of each
other (another idea could be to periodically update the bbox of certain layers).

Cheers
Andrea


-------------------------------------------------------

I second Jody's feedback, since this could open the path for attaching
automation to existing layers to perform timed ingest or event-based
ingest, such as when monitoring a filesystem.
In addition, we might want to think about leaving room for attaching
actions/automation to REST operations.
We also quickly discussed ways to reuse/build upon what the importer is
bringing in, in terms of transformations and the like, as it would be bad
to end up with duplicated functionality.

One thing to consider carefully is how to handle these events in a
clustered installation, as we don't want to end up trying to delete or
update things twice.
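
Just as a strawman for the clustering concern: one option could be a shared lock,
so that only one node runs the expiry sweep at a time, e.g. a PostgreSQL advisory
lock taken over plain JDBC. The class, the lock key and the choice of PostgreSQL
are all assumptions made for the sake of the example, not something agreed upon
in this thread.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ExpirySweepLock {

    // Returns true if this node acquired the lock and should run the expiry sweep.
    // Advisory locks are tied to the session, so the connection must stay open
    // until release() is called after the sweep.
    public static boolean tryAcquire(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery("SELECT pg_try_advisory_lock(42)")) {
            rs.next();
            return rs.getBoolean(1);
        }
    }

    // Releases the lock once the sweep is done
    public static void release(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("SELECT pg_advisory_unlock(42)");
        }
    }
}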

Regards,
Simone Giannecchini

