Hi all,
I would like to submit three ideas to the community that I'd like
to implement: one in the short term, another possibly in the
short term as well, and one for the future.
As you may know, I've been working to revitalize WPS enough that it
may become an extension in GS 2.1 (at least, that's the plan).
One of the main attractions of a WPS in GeoServer is that the WPS
is not stand-alone, but has local services and a catalog at its
disposal.
So far that means a WPS process does not have to painfully gather
data from remote servers; it can get it directly from the local
catalog. This is great, but it's one-way: the outputs still go
out in some form (GML, shapefile, JSON)
that the client has to process by itself.
I want to integrate back in the other direction by having an "import"
process that can be used at the end of a processing chain to save the
results back into the catalog, so that the result can then be rendered
by WMS and queried by WFS. This makes it possible to interact with
the GS WPS from lightweight clients without the limitation of using small
data sets (not to mention the fact that the result layer can be
a legitimate new layer to be used long term).
For vectors, the import process would take (a rough sketch follows the list):
- the feature collection to be stored
- a layer name
- the workspace (optional, we can use the default)
- the target store (optional, on trunk we have a concept of default
store). I'd say the target store must exist (and be either a DB or a
directory store)
- a style name (optional, we can use one of the built-ins)
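To make that concrete, here is a minimal sketch of what the process
signature could look like, written against the GeoTools annotated
process style; the class name, parameter names and the (empty) body
are illustrative only, not a final API:

import org.geoserver.catalog.Catalog;
import org.geotools.data.simple.SimpleFeatureCollection;
import org.geotools.process.factory.DescribeParameter;
import org.geotools.process.factory.DescribeProcess;
import org.geotools.process.factory.DescribeResult;

@DescribeProcess(title = "Import",
        description = "Stores a feature collection in the catalog as a new layer")
public class ImportProcess {

    private final Catalog catalog;

    public ImportProcess(Catalog catalog) {
        this.catalog = catalog;
    }

    @DescribeResult(name = "layerName",
            description = "The qualified name of the newly created layer")
    public String execute(
            @DescribeParameter(name = "features",
                    description = "The features to store") SimpleFeatureCollection features,
            @DescribeParameter(name = "name",
                    description = "The target layer name") String name,
            @DescribeParameter(name = "workspace", min = 0,
                    description = "Target workspace (defaults to the default one)") String workspace,
            @DescribeParameter(name = "store", min = 0,
                    description = "Target store, must exist (DB or directory)") String store,
            @DescribeParameter(name = "styleName", min = 0,
                    description = "Style to associate (defaults to a built-in)") String styleName) {
        // 1. look up the workspace and store (or fall back to the defaults)
        // 2. create the schema in the target store and write the features
        // 3. build and save the feature type and layer configuration,
        //    attaching the requested (or default) style
        throw new UnsupportedOperationException("sketch only");
    }
}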
It's evident there is some overlap with restconfig, but a processing
chain will result in a feature collection, something we cannot
throw at REST (plus we don't want the data to travel back to the client
and then again to the server).
This would be a special case, I don't intend to actually go and redo
RESTConfig as a set of WPS processes (btw, if you have ideas of how
to integrate the two without having the data go round the world I'm
all ears).
At most it could be useful to add a RemoveLayer process that would remove the layer and the underlying contents from the catalog, so
that a client can actually do the two most common things without having
to switch protocols (add a layer, remove a layer).
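Just to show how small that counterpart could be, a sketch under the
same assumptions as the Import one above (again, names and body are
illustrative):

import org.geoserver.catalog.Catalog;
import org.geoserver.catalog.LayerInfo;
import org.geotools.process.factory.DescribeParameter;
import org.geotools.process.factory.DescribeProcess;
import org.geotools.process.factory.DescribeResult;

@DescribeProcess(title = "RemoveLayer",
        description = "Removes a layer and its underlying data from the catalog")
public class RemoveLayerProcess {

    private final Catalog catalog;

    public RemoveLayerProcess(Catalog catalog) {
        this.catalog = catalog;
    }

    @DescribeResult(name = "removed",
            description = "True if the layer was found and removed")
    public boolean execute(
            @DescribeParameter(name = "name",
                    description = "Qualified name of the layer to remove") String name) {
        LayerInfo layer = catalog.getLayerByName(name);
        if (layer == null) {
            return false;
        }
        // remove the layer and its resource; dropping the backing data
        // would need store-specific handling
        catalog.remove(layer);
        catalog.remove(layer.getResource());
        return true;
    }
}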
Oh, these processes would actually run only if an admin-level user is
invoking them (yeah, it would be nice to have more granular administration
rights, but that's a can of worms I don't intend to open in my spare time).
So ok, this would be step one, and something I'd definitely like to do
this week.
For step two, let's consider that not all processed layers are meant to
live for a long time. You do some processing, _look_ at the results,
decide that some of the filtering, buffering distances, or stuff like
that, is not ok, and want to redo the process with different params and
look at the results again.
Of course this can be done by using the above Import process, but in
reality you probably don't want to:
a) have the layer be visible to the world
b) have to manage its lifecycle, since the layer is meant to be a
throwaway anyway
So it would be nice to somehow mark a layer as temporary and as private.
Temporary means the layer (and the data backing it) would vanish into
thin air some time after it was last used; private could mean either:
- it would not be advertised in the caps (but anyone knowing its full
name could access it)
- it would be protected using security so that only a certain user can
access it
I would go for the first, since the second implies working on the
granular security can of worms.
Also, adding handling of temp layers sounds relatively easy to
implement: a little touch in the capabilities transformers, a scheduled
activity that periodically checks when the layer was last accessed, and
it's done. Perfect for spare-time coding (whilst more complex
solutions can still get done using funding, when and if there is some).
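For instance, the expiry side could be little more than a plain JDK
scheduled task; the metadata keys and the expiry interval below are
invented for the sake of illustration:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.geoserver.catalog.Catalog;
import org.geoserver.catalog.LayerInfo;

public class TempLayerCleaner implements Runnable {

    // hypothetical metadata keys marking a layer as temporary
    static final String TEMPORARY = "wps.temporary";
    static final String LAST_USED = "wps.lastUsed"; // epoch millis

    // how long an unused temp layer survives (illustrative value)
    static final long EXPIRY_MS = TimeUnit.HOURS.toMillis(1);

    private final Catalog catalog;

    public TempLayerCleaner(Catalog catalog) {
        this.catalog = catalog;
    }

    public void run() {
        long now = System.currentTimeMillis();
        for (LayerInfo layer : catalog.getLayers()) {
            Boolean temp = layer.getMetadata().get(TEMPORARY, Boolean.class);
            Long lastUsed = layer.getMetadata().get(LAST_USED, Long.class);
            if (Boolean.TRUE.equals(temp) && lastUsed != null
                    && now - lastUsed > EXPIRY_MS) {
                // drop the layer and its resource (and, in the real
                // thing, the data backing it)
                catalog.remove(layer);
                catalog.remove(layer.getResource());
            }
        }
    }

    public static void schedule(Catalog catalog) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new TempLayerCleaner(catalog),
                10, 10, TimeUnit.MINUTES);
    }
}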
Step three is daydreaming. But let me dream for once. Say I have a
process that generates a layer. It does so in a way that the layer is
cached, but dynamic: it is computed, but the process used to compute it
is saved, and it's run every time the input data changes (well, maybe
driven by a certain polling interval so that a storm of changes does
not result in a storm of processing routines running).
Actually this should not be _so_ hard to implement. Add to the layer
definition four new entries in the metadata section (sketched right
after the list):
- the full process definition (as xml)
- the last reprocessing date
- the recompute interval
- the last date an input was changed
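In catalog terms these could just be entries in the layer metadata map;
the key names below are invented for illustration:

import java.util.Date;

import org.geoserver.catalog.LayerInfo;
import org.geoserver.catalog.MetadataMap;

// hypothetical helper marking a layer as "dynamic"
class DynamicLayerMetadata {
    static final String PROCESS_DEFINITION = "dynamic.processDefinition"; // as XML
    static final String LAST_PROCESSED = "dynamic.lastProcessed";
    static final String RECOMPUTE_INTERVAL = "dynamic.recomputeInterval"; // millis
    static final String LAST_INPUT_CHANGE = "dynamic.lastInputChange";

    static void markDynamic(LayerInfo layer, String processXml,
            long recomputeIntervalMillis) {
        MetadataMap md = layer.getMetadata();
        md.put(PROCESS_DEFINITION, processXml);
        md.put(LAST_PROCESSED, new Date());
        md.put(RECOMPUTE_INTERVAL, recomputeIntervalMillis);
        md.put(LAST_INPUT_CHANGE, new Date());
    }
}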
Then add a scheduled job (rough sketch after this list) that:
- knows about all the dynamic layers
- knows about the sources and has a transaction listener on them
- runs the processes again when the output is stale, and uses
transactions to change the data under the layer's feet in a single shot
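A back-of-the-envelope version of that job, reusing the invented
metadata keys from the sketch above; the actual process execution and
the transaction listener wiring are hand-waved, as they depend on the
store:

import java.util.Date;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.geoserver.catalog.Catalog;
import org.geoserver.catalog.LayerInfo;
import org.geoserver.catalog.MetadataMap;

public class DynamicLayerUpdater implements Runnable {

    private final Catalog catalog;

    public DynamicLayerUpdater(Catalog catalog) {
        this.catalog = catalog;
    }

    public void run() {
        long now = System.currentTimeMillis();
        for (LayerInfo layer : catalog.getLayers()) {
            MetadataMap md = layer.getMetadata();
            String process = md.get(DynamicLayerMetadata.PROCESS_DEFINITION, String.class);
            if (process == null) {
                continue; // not a dynamic layer
            }
            Date lastProcessed = md.get(DynamicLayerMetadata.LAST_PROCESSED, Date.class);
            Date lastChange = md.get(DynamicLayerMetadata.LAST_INPUT_CHANGE, Date.class);
            Long interval = md.get(DynamicLayerMetadata.RECOMPUTE_INTERVAL, Long.class);
            boolean stale = lastProcessed != null && lastChange != null
                    && interval != null
                    && lastChange.after(lastProcessed)
                    && now - lastProcessed.getTime() > interval;
            if (stale) {
                recompute(layer, process);
                md.put(DynamicLayerMetadata.LAST_PROCESSED, new Date());
                catalog.save(layer);
            }
        }
    }

    private void recompute(LayerInfo layer, String processXml) {
        // run the saved process again and swap the layer contents
        // within a single transaction (omitted on purpose)
    }

    public static void schedule(Catalog catalog) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
                new DynamicLayerUpdater(catalog), 1, 1, TimeUnit.MINUTES);
    }
}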
How does this sound? I think temporary/private layers are more important
than this (my dream is to one day have a GeoServer-based in-browser
client that can behave similarly to a desktop GIS).
On the other hand, it seems the latter is doable without API changes,
which makes it low-hanging fruit.
Opinions and comments... very welcome!
Cheers
Andrea