[Geoserver-devel] Feature Versioning in/for Geoserver (plus much much more)

I'm quite interested in getting Geoserver doing FeatureVersioning so
that when we (TOPP) start doing
Geo-Wiki/GeoCollaborator/Public-Participation GIS stuff that can track
and rollback changes, and "view" the dataset at a particular
point-in-time.

Personally, I think this (in itself) will be extremely powerful.

The design is a bit tricky - but I think the actual versioning code will
not be too difficult.

There are a few meta-design issues that need to be resolved first. I've
indirectly talked to a few people about this, but I, unfortunately,
haven't had time to put more than cursory thought into it.

In the email, below, I argue for a system (probably built using geotools
& XSLT) that can "extend" the functionality of already existing
services. All these services could be directly added to Geoserver or
Geotools, but by building them in this system I feel it's (1) easier and
(2) allows more cross-project collaboration.

I feel the Java GIS stuff is totally separated from the non-Java GIS
stuff, and this makes us "marginalized". Tyler recently summarized the
major open-source mapping projects and didn't mention a single Java
based one. Since the system I describe below can be used to give
extra functionality to ALL systems that implement the OGC services
(ie. mapserver and - shock - ARCIMS) I believe we'll have some common
ground to share.

I'm hoping that we'll be able to entice people who want to continue
using Mapserver (or whatever they're currently using) to know & use the
java project, and also so that people using the non-java projects could
start contributing to the java projects. Also, I feel, we'll be able
to have a "full" package of services built on top of any subset of
implementations.

I guess this is sounding more like the open-SDI/Geo-Web application
framework, but I want to stress that the system I'm describing here is
supposed to be quite simple - adding new services or complex processes
would be better implemented in the 2.0 (easy-to-add-new-services)
geoserver.

I'm hoping to get a little feedback and see what people think about this
type of thing. Feel free to call me crazy, or better yet, reply with
an even crazier idea.

dave

-----------------------------------------------------

I've been thinking a little bit about versioning. There's 3 places you
can put it:

1. in a datastore ("underneath geoserver")
2. inside geoserver (like the way the current validation slots into the
Transaction java code)

3. on top of geoserver - this is a bit weird, but it makes good sense.
Basically, have a component that sits outside geoserver that takes in
WMS/WFS requests, reforms them (e.g. with XSLT), and sends them back down
to the actual WFS/WMS server. You can implement a large number of
actual services using this method. The main advantage is you can make
it fairly simple, and it can be used to extend any WFS (not just
geoserver) into a value-added WFS.
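To make the "on top" idea concrete, here's a minimal sketch of the kind of pure request-rewriting step such a component would apply before forwarding a request to the real WFS. All the names here (the class, the layer names) are hypothetical, not part of any existing API:

```java
// Sketch of the "on top of geoserver" idea: a pure request-rewriting step
// that a tiny proxy could apply before forwarding to the backing WFS.
// Hypothetical example: expose "roads" to clients while the real server
// actually serves a layer named "topp:roads_v2".
public class RequestRewriter {

    // Rewrite an incoming typeName to the backing server's internal layer name.
    public static String rewrite(String wfsRequest) {
        return wfsRequest.replace("typeName=\"roads\"",
                                  "typeName=\"topp:roads_v2\"");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<GetFeature typeName=\"roads\"/>"));
    }
}
```

In a real proxy this string rewrite would be an XSLT transform (or a parse-and-rebuild), but the shape is the same: requests and responses in, requests and responses out.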

Here's some thoughts I wrote a while ago on it (GeoCollaborator = this
"on top of geoserver" component):

There's a core GeoCollaborator service. This is where most of the
work will actually go. The idea behind the service is very simple -
it takes in an OGC request, modifies it, and then spits it back out to
other OGC servers that actually handle the request.

  That means that outside - either above it or below it - is only
ever exposed to OGC requests and responses. This is key because you
get to extend all the power of those servers and people are free to
pick and choose what product (i.e. geoserver, mapserver, deegree, or a
commercial product) that will actually handle the work. Plus, you can
use any of the client applications (like udig and mapbuilder) that
"speak" OGC. It provides lots of power without actually complicating
anything.

  Some of the services it could offer would be:

  "logging" (who's viewing what and where) - any incoming read
request gets transformed into a WFS write (storing the layer, user, time
and location) AND the read request.

  "validation" - i.e. an insert request for a road is transformed into
an
"is this road in the ocean?" request before the actual insert is
performed.

  "fixing" - ie. for roads, if two roads cross and are not
topologically noded, then pre-node them before inserting them.

  "versioning" - for rollback, point-in-time, and the like.

  "extended functionality" -- for example, Gabriel wanted to add an
"indirect image response" to the Geoserver WMS to make it easier to use
SLD-POST in a browser. Basically, you send a SLD-POST request to the
WMS and it returns a little XML fragment that says "go HTTP-GET your
image here". This is MUCH easier to actually use in a browser with
<IMG> tags (which do not allow HTTP-POST).

You would send an SLD-POST to the Geocollaborator and it will echo the
request to the underlying WMS (or, if it doesn't support SLD-POST,
convert it to an SLD-GET request) and capture the image response (or,
for dealing with dynamic data, remember how to make the WMS request).
It then sends back the small XML response saying how to HTTP-GET the
image.

This could then be used to "wrap" any WMS instead of just adding the
functionality to Geoserver.

  "layer creation"

  "user creation & security"

  "virtual datastores" -- these can be implemented by transforming the
results of a WFS request.

  ...and so on...
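As a rough sketch of how one of these might look in code - the "logging" transform, say - the component would turn one read request into a log insert plus the untouched original request. Everything here (the audit_log layer, the method names) is invented for illustration:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the "logging" service: an incoming read request becomes
// a pair (audit insert, original request). All names are hypothetical.
public class LoggingTransform {

    public static List<String> transform(String readRequest, String user) {
        List<String> out = new ArrayList<>();
        // First, a WFS Insert recording who asked for what and when...
        out.add("<Insert layer=\"audit_log\" user=\"" + user + "\" time=\""
                + Instant.now() + "\" request=\"GetFeature\"/>");
        // ...then the untouched read request, passed through to the real WFS.
        out.add(readRequest);
        return out;
    }
}
```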

  All of these would be fairly simple, and the main core will just be
a simple XSLT transformer with a nice plugin architecture. Most people
will use it "as is", the more technically daring could mix-and-match
pre-made plugins (from other projects), and hard-core geo-hackers could
make their own plugins.

  The core is focused on doing what it does really well. For more
'complex' things, you'd have to go to something like geoserver to
actually make the service.

I haven't really put much thought into this; what's your first reaction?
Personally, I like the "under" or "on top of" position.

The "under" has the advantage of probably being easiest, and then all
Geotools-based programs could use it.

The "on top of" has the advantage of extending all OGC WFS/WMS services
instead of just Geoserver/Geotools. It also makes a great place to put
the "rollback", "look for changes", and "get point-in-time view" UI. It's
also a nice place to allow people to add simple plugins to extend
functionality (although the "new" easy-to-plugin-to-geoserver might be
a better place for complex things).

dave

----------------------------------------------------------
This mail sent through IMP: https://webmail.limegroup.com/

(attachments)

geowiki.gif

No question that this should be middleware (you referred to it as on top), both for the positioning of the project but also as a practical consequence of the problem.
You will need to add some low-level support to geotools (fixing the Feature.getVersion() for example)

I will note that ArcSDE serves as the best example of middleware in this domain. They mostly use ArcSDE to smooth over the differences between databases, tracking additional information such as versions/locks/constraints etc...

I like the idea and would ask you to focus on a small tractable problem such as FeatureVersioning for a proof of concept. Make sure you have the architecture correct so others can play. Then do a port of the validation service (specification of constraints) and keep on building.

Rather than defining your additional information in one go, make sure that you have a flexible system and start small.

Jody

Ok, first off, 'right on' - I think this can be incredibly powerful, and
I like the thought of it hopefully getting more crossover with the C
projects. And I like that we're testing out pluggability, hopefully to
incorporate the lessons learned here into geoserver 2.0.

For this response I think I'm going to go with 'reply with an even
crazier idea', and yes, you can call me crazy too, but just throwing it
out there to stir the juices - haven't thought it through all the way.

What about doing it under _and_ over: implement at the datastore level,
but have geocollaborator expose it all using WMS and WFS datastores,
completely sitting in its own external project. But since it's all
done with datastores you could, for example, just plug in a postgis
datastore instead of a WFS datastore. It would plug in to
more than just any OGC compliant service; it would also plug on top of
any (transactional) datastore supported by GeoTools.

The main place this is coming from is my fear of XSLT. This may be an
irrational fear, but I guess I worry that XSLT is in many ways another
whole programming language, and a very verbose and not easy to
understand one at that. Or perhaps I'm not clear exactly what will be
done in geotools and what will be done in xslt, where that line will be
drawn.

As an alternative couldn't you just use the GeoServer code to read WFS
requests, put them into a Query object, and then write java code to do
the appropriate actions with that Query? Reformulate them into new
Queries to be sent to the base WFSes. The reformulation is done in
Java, which is more widely known than XSLT.
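A hedged sketch of this Query-reformulation idea, using a toy Query class rather than the real GeoTools one, might look like the following. Here the "appropriate action" is a versioning rewrite that restricts every query to live revisions; the revision_expired field is invented for illustration:

```java
// Sketch of reformulating queries in plain Java (Chris's alternative to XSLT).
// The Query class below is a stand-in, NOT the real GeoTools
// org.geotools.data.Query; the filter is CQL-ish text for readability.
public class QueryReformulator {

    static class Query {
        String typeName;
        String filter;
        Query(String typeName, String filter) {
            this.typeName = typeName;
            this.filter = filter;
        }
    }

    // A versioning reformulation: restrict the query to live revisions by
    // AND-ing in an extra predicate before it is sent to the base WFS.
    public static Query addVersionFilter(Query q) {
        String extra = "revision_expired IS NULL";
        String combined = (q.filter == null || q.filter.isEmpty())
                ? extra
                : "(" + q.filter + ") AND " + extra;
        return new Query(q.typeName, combined);
    }
}
```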

The upside to this is that if we're just making Queries, then they can
be directed at Datastores - GeoCollaborator code would just always use
a WFS datastore. The downside is that WFS datastore seems to be in a
state of transition with the new xdo code, and we're still not sure how
maintainable it is. Of course the upside of that is it will force us
to do a lot more bug fixing on it, and end up with a really high
quality WFS datastore/parser.

We also could consider doing both - if we're going to encourage more
value added plug-ins it could actually be nice to give people the
option to do xslt or Java - some of the C types may be more comfortable
with xslt (though some of the mapserver crowd seem to really not like
xml). Make something like the logging plug-in available as both, for
contributors to use as an example of either xslt or pure java style.

I don't think slotting into GeoServer is a good idea; the validation
code should either move up or down the stack.

The over/under approach also has the advantage of letting
GeoCollaborator functionality stand on its own, while also being easier
to integrate as a GeoServer 2.0 plugin, so that admins could just do one
install.

Chris


Hi Chris, a quick few points:

What about doing it under _and_ over, implement at the datastore level,
but have geocollaborator expose it all using WMS and WFS datastores,

A: Under and Over is a good idea
A: DataStore is never going to be as general purpose as XSLT, we need to use the tools given to us. We can use XDO to go from XML to normal Objects, and that is great when you are playing to the strengths of objects: network builders, graph systems, spatial reasoning. It is bad when you are trying to morph between schemas. It is just not one of the strengths of object-oriented programming ...

The main place this is coming from is my fear of XSLT. This may be an
irrational fear, but I guess I worry that XSLT is in many ways another
whole programming language, and a very verbose and not easy to
understand one at that. Or perhaps I'm not clear exactly what will be
done in geotools and what will be done in xslt, where that line will be
drawn.

A good, sensible fear. XSLT is another programming language, and it is really hard to debug. It is, however, a functional language, and as such a good fit for morphing between schemas.

As an alternative couldn't you just use the GeoServer code to read WFS
requests, put them into a Query object, and then write java code to do
the appropriate actions with that Query? Reformulate them into new
Queries to be sent to the base WFSes. The reformulation is done in
Java, which is more widely known than XSLT.

Most J2EE apps use the right tool for the job, making use of XSLT in the middle of their Java program. Basically we could do it, and we could do it for WFS Query.

But what about WMS Query? And Catalog Query? We already get to the stage where we have visitors that need to traverse Filters and produce new ones. People may know Java, but once we start breaking out visitor patterns and asking people to think about nodes in a Filter tree and whether they are mutable or not, understanding quickly breaks down.

A more scalable solution would be to attack the XML to manipulate the filter elements and then run the resulting XML into the parser. Now that can be hidden behind an API, but XSLT is probably the right tool for the job.
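For what it's worth, driving XSLT from inside Java is cheap with the standard JAXP API (javax.xml.transform), so the "hidden behind an API" part is a few lines. A toy sketch - the stylesheet here just prefixes a typeName attribute, standing in for a real filter rewrite:

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Running XSLT from inside a Java program with the standard JAXP API.
// The stylesheet is an identity transform plus one override that rewrites
// the typeName attribute - a stand-in for a real request rewrite.
public class XsltRewrite {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='@typeName'>"
      + "<xsl:attribute name='typeName'>topp:<xsl:value-of select='.'/></xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```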

I also get the sense we are swimming upstream here. Look at the fun we have with XPath and our object systems. It has been a couple of years and we are only just now looking into using JXPath to bridge the gap between XML specifications and our object-oriented implementations of them.

state of transition with the new xdo code, and we're still not sure how
maintainable it is. Of course the upside of that is it will force us
to do a lot more bug fixing on it, and end up with a really high
quality WFS datastore/parser.

Yes, the parser design rocks out :-)

We also could consider doing both - if we're going to encourage more
value added plug-ins it could actually be nice to give people the
option to do xslt or Java - some of the C types may be more comfortable
with xslt (though some of the mapserver crowd seem to really not like
xml).

I am not sure we should worry that much; at the Where conference there was a deep understanding of XML. My understanding is that AJAX forces one to play with these issues, so I expect the knowledge base is only going to improve.

I don't think slotting into GeoServer is a good idea, the validation
code should either move up or down the stack.

Validation has to move down; it is needed between datastores.

The over/under approach also has the advantage of being able to have
GeoCollaborator functionality stand on its own, but also to be easier
integrated as a GeoServer 2.0 plugin, so that admins could just do one
install.

I will point out that the deegree architecture is really "beside": they chain their different modules together. At a code level they are all on equal footing as web services. The chain of Catalog to WFS to WMS is similar to the "over" idea.

I am not sure you will notice a difference between over and beside if your components are distinct web services. My understanding is the commercial implementations are set up in this manner as well; they just recognize when they are talking to one of their "peer" services and are able to optimize...

Jody

I sent this mail yesterday to Dave but forgot to CC the list...
sorry! I'd also like to discuss Chris's reply, it's full of interesting
things... later on! :o)

Hi Dave & all,

On 15 Jul 2005 at 12:18, dblasby@anonymised.com wrote:

I've quite interested in getting Geoserver doing FeatureVersioning
so that when we (TOPP) start doing Geo-Wiki/GeoCollaborator/Public-
Participation GIS stuff that can track and rollback changes, and
"view" the dataset at a particular point-in-time.

Paolo Rizzi and I have the same interest in versioning; we have been
thinking about this and we'll go on thinking, but soon, very soon,
we'll have to start implementing, I think by September. It would be
great to work together and share our efforts and experiences, I
think.

Our ideas about versioning:

- make it transparent to the user (hide the versioning metadata &
processes)
- be independent (at the interface level) from the actual storage
system (postgres, oracle and so on)
- be capable of exploiting a built-in versioning system from the
storage system, if present (I think that Oracle has got one)
- be compatible with GML3 specs.
- ...?

We envisaged a possible difference between versioning at feature
level and at feature set level. - Still a bit of confusion about this
issue ;o)

Feature level versioning is suitable for "events", i.e. features in
which the lifespan is explicitly defined in the data model, so it's
"natural" for the user to always ask for features within a particular
time span. Each feature is somehow independent from the others. The
release process of such data is continuous; features are added as
soon as they are available to the GIS operators.

Featureset level versioning deals with interrelated features, like
road networks or building polygon coverages, where it's quite common
that editing a feature will result in modifying the neighbouring
features. Usually we're interested in the whole road network version,
rather than in the version of a single arc, also because the road
network may be used as the foundation for route systems, car
circulation schemes, pollution modeling graphs and so on... The
release process of such information is discontinuous (maybe one
release per month).

I think that we could use nearly the same code for getting both
working. Feature set versioning should work like CVS; I've already
figured out how, and will post this in a wiki page soon.

Anyway, making versioning work means:

- when inserting/updating/deleting features, (possibly hidden)
metadata fields in the feature should be automatically filled in by
the versioning system;
- features are never deleted;
- when querying data, filters about version should be always
implicitly added;

We were thinking about implementing these features at the DataStore
level, so using a VersioningDataStore which delegates data storing to
an underlying DataStore (postgis, oracle), and handles queries and
updates filling in versioning metadata and transforming FeatureTypes
and Features from the underlying DS removing the hidden fields.
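A very rough sketch of what such a VersioningDataStore could do, using a toy in-memory store instead of the real GeoTools DataStore API. All of the hidden field names (_created, _expired) and the revision scheme are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of a VersioningDataStore: wraps a plain store, fills in
// hidden version metadata on writes, never physically deletes, and
// implicitly filters reads by revision. The backing List stands in for
// the underlying DataStore (postgis, oracle, ...).
public class VersioningStore {

    static class Row { Map<String, Object> attrs = new HashMap<>(); }

    private final List<Row> backing = new ArrayList<>();
    private long revision = 0;

    public void insert(Map<String, Object> attrs) {
        Row r = new Row();
        r.attrs.putAll(attrs);
        r.attrs.put("_created", ++revision); // hidden metadata filled in
        r.attrs.put("_expired", null);
        backing.add(r);
    }

    // "delete" only expires the row; nothing is ever physically removed.
    public void delete(String key, Object value) {
        ++revision;
        for (Row r : backing)
            if (value.equals(r.attrs.get(key)) && r.attrs.get("_expired") == null)
                r.attrs.put("_expired", revision);
    }

    // Reads implicitly filter to rows live at the requested revision,
    // giving rollback and point-in-time views for free.
    public int countAt(long rev) {
        int n = 0;
        for (Row r : backing) {
            long created = (Long) r.attrs.get("_created");
            Object expired = r.attrs.get("_expired");
            if (created <= rev && (expired == null || (Long) expired > rev)) n++;
        }
        return n;
    }
}
```

The real thing would of course speak Features and FeatureTypes and strip the hidden fields before handing results up, but the three rules above (fill metadata, never delete, filter implicitly) are the whole trick.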

This would also fit fine in our calculation framework which lives in
the transaction handling code in geoserver (calculation objects are
basically Validation objects capable of editing features) and which
we are planning to move outside GeoServer so it can be used with a
GeoTools enabled application. Paolo Rizzi, the mad inventor, can say
more about this. ;o)

Your idea of using a "proxy" WFS service which handles versioning on
the top of another service would be implemented in our case using a
WFS datastore as storage for the VersioningDataStore.

I like the idea of XSLT transformation because it seems to be quite
straightforward, and also it's portable to non-geotools systems. The
problem is that (I presume) it won't allow for exploiting underlying
versioning models which might be available in the RDBMSs. (Maybe this
issue is not so important, but I'm quite interested in trying to
exploit high-level functions, like versioning and topology
maintenance, when available in the underlying RDBMS.)

I think we should discuss this more, of course I couldn't explain
everything. We'll start writing a Wiki page about our versioning
model, you could do the same, maybe on the same page(s)? We could
structure the wiki page with different child pages, like:
requirements, existing standards, existing low-level mechanisms (i.e.
oracle), proposed models, case studies and examples... what do you
think?

We can start working on this maybe not immediately, but during next
week I can put some ideas on Wiki pages.

Cheers

Sig