[Geoserver-devel] Versioning feedback

The versioning stuff looks pretty good, definitely in line with what I've been thinking. A couple questions and thoughts...

Would there be a way (perhaps an alternate implementation) that would be able to operate without modifying the Data table? Like if someone wanted to use versioning but didn't want their main table modified? (Just curious, it's ok if the answer is 'no' or 'really hard'.)

For the revision attribute on GetFeature, could we just use the WFS 'featureVersion' construct? That would entail us not putting in an extra read-only attribute.

From the spec:

'The optional featureVersion attribute on the <Query> element is included in order to accommodate systems that support feature versioning. A value of ALL indicates that all versions of a feature should be fetched. Otherwise, an integer, n, can be specified to return the nth version of a feature. The version numbers start at 1, which is the oldest version. If a version value larger than the largest version number is specified, then the latest version is returned. The default action shall be for the query to return the latest version. Systems that do not support versioning can ignore the parameter and return the only version that they have. It should be noted, that if the value of the featureVersion parameter is set to ALL, the resulting XML document will contain duplicate feature identifiers and thus cannot be validated.'

I believe that our svn style is compatible with this. So if we did this you don't need a filter to specify the version number, you just do it as an attribute in your Query. But filters are still compatible with this. And if people request 'all', then we return all versions for that GetFeature request.
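A minimal sketch of what such a request could look like, built with Python's standard library. The type name `topp:roads`, the helper name `build_get_feature`, and the choice of WFS 1.0.0 are assumptions for illustration only; the one thing taken from the spec text above is the `featureVersion` attribute on `<Query>`.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"


def build_get_feature(type_name, feature_version=None):
    """Build a GetFeature request; feature_version is 'ALL', an integer, or None.

    Hypothetical helper: leaving feature_version out means the server
    returns the latest version, as the spec text describes.
    """
    ET.register_namespace("wfs", WFS_NS)
    root = ET.Element(f"{{{WFS_NS}}}GetFeature",
                      {"service": "WFS", "version": "1.0.0"})
    query = ET.SubElement(root, f"{{{WFS_NS}}}Query", {"typeName": type_name})
    if feature_version is not None:
        # The version goes on the Query element itself, no filter required.
        query.set("featureVersion", str(feature_version))
    return ET.tostring(root, encoding="unicode")


request_xml = build_get_feature("topp:roads", feature_version=1080)
print(request_xml)
```

A filter can still be added as a child of the `<Query>` element, so the two mechanisms do not exclude each other.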

GetLog looks good. It should just be a GetFeature call. In the future we could expose a new 'GetLog' operation that turns common filter operations (filters against dates to/from, specific users, etc.) into more top-level parameters. We can easily create a new output format for the WFS, indeed probably can just re-use some of the code from the WMS (would be cool if we could get those two outputs as the same, so our GML WFS output format could be used directly by the WMS).

The revision stuff additions in the Transaction should probably be change requests to the WFS spec eventually. One thing to note is that the wiki way is more to just allow the overwrite and to store both in the history. So it's ok if we do that. But I do agree we should encourage revision checks, making sure clients are up to date.
But all your additions to transaction are sensible.

GetDiff - might be nice if one could specify dates in addition to revision numbers? Something more for the future...
Could the results of a GetDiff possibly be a WFS-Transaction? A transaction is any number of inserts, updates, and deletes. Isn't that more or less what we'd be reporting? The updates that have taken place? Not sure if this makes sense, just a thought, and it'd make rollbacks, done as GetDiff + Transaction, quite easy. Though I do think a rollback operation with a RollbackTypeElement could still make sense, as a shortcut to doing the whole thing.
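To make the "rollback as GetDiff + Transaction" idea concrete, here is a rough sketch of inverting a list of diff operations. The dictionary layout (`type`, `fid`, `values`, `old`) is invented for illustration and is not a WFS structure. Note the assumption that the diff carries the old values: a plain WFS Delete or Update does not, so a diff expressed as a bare transaction could not be reversed this way.

```python
def invert(ops):
    """Return the operations that undo `ops`, in reverse order.

    Hypothetical operation format:
      insert: {"type", "fid", "values"}
      update: {"type", "fid", "values", "old"}
      delete: {"type", "fid", "old"}
    """
    inverse = []
    for op in reversed(ops):
        if op["type"] == "insert":
            # Undo an insert by deleting; keep the values so it stays reversible.
            inverse.append({"type": "delete", "fid": op["fid"], "old": op["values"]})
        elif op["type"] == "delete":
            # Undo a delete by re-inserting the values it removed.
            inverse.append({"type": "insert", "fid": op["fid"], "values": op["old"]})
        elif op["type"] == "update":
            # Undo an update by swapping old and new values.
            inverse.append({"type": "update", "fid": op["fid"],
                            "values": op["old"], "old": op["values"]})
        else:
            raise ValueError(f"unknown operation type: {op['type']}")
    return inverse


diff = [
    {"type": "insert", "fid": "roads.4", "values": {"name": "New Ln"}},
    {"type": "update", "fid": "roads.2",
     "values": {"name": "Oak Avenue"}, "old": {"name": "Oak Ave"}},
    {"type": "delete", "fid": "roads.3", "old": {"name": "Elm Rd"}},
]
rollback = invert(diff)
```

Inverting twice gives back the original list, which is a cheap sanity check on the format.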

The rollback looks good, I like the idea of implementing it as a new element type.

best regards,

Chris

--
Chris Holmes
The Open Planning Project
http://topp.openplans.org

Chris Holmes wrote:

The versioning stuff looks pretty good, definitely in line with what I've been thinking. A couple questions and thoughts...

Would there be a way (perhaps an alternate implementation) that would be able to operate without modifying the Data table? Like if someone wanted to use versioning but didn't want their main table modified? (Just curious, it's ok if the answer is 'no' or 'really hard'.)

It's not impossible, it would just require a more complex implementation,
because you would need to use UNION and some joins in order to gather
from the main table only the records that haven't been changed/deleted during versioned edits. So it would be at the same time more complex
and slower.
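A small sketch of what that could look like, using an in-memory SQLite database. The schema is hypothetical: `roads` stands in for the untouched base table and `roads_changes` is an invented side table that records every versioned edit. The anti-join is written with NOT EXISTS here; a LEFT JOIN would do the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT);  -- base table, never modified
CREATE TABLE roads_changes (                             -- hypothetical side table
    id INTEGER, name TEXT, revision INTEGER, deleted INTEGER DEFAULT 0);
INSERT INTO roads VALUES (1, 'Main St'), (2, 'Oak Ave'), (3, 'Elm Rd');
INSERT INTO roads_changes VALUES (2, 'Oak Avenue', 5, 0);  -- update of feature 2
INSERT INTO roads_changes VALUES (3, NULL, 6, 1);          -- deletion of feature 3
INSERT INTO roads_changes VALUES (4, 'New Ln', 7, 0);      -- newly inserted feature
""")

# Latest state = base rows never touched by a versioned edit,
# plus the most recent non-deleted change for every edited feature.
LATEST_SQL = """
SELECT b.id, b.name FROM roads b
 WHERE NOT EXISTS (SELECT 1 FROM roads_changes c WHERE c.id = b.id)
UNION ALL
SELECT c.id, c.name FROM roads_changes c
 WHERE c.deleted = 0
   AND c.revision = (SELECT MAX(c2.revision) FROM roads_changes c2
                      WHERE c2.id = c.id)
ORDER BY 1
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```

Every read pays for the anti-join and the per-feature MAX lookup, which is the extra complexity and slowdown mentioned above.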

Why would one keep the base table unmodified? If it's big, the versioned
datastore could "import" it using a set of ALTER TABLE statements, and set up
a view for the original and last revision for other clients to hit
in a read-only way.

For the revision attribute on GetFeature, could we just use the WFS 'featureVersion' construct? That would entail us not putting in an extra read-only attribute.

From the spec:

'The optional featureVersion attribute on the <Query> element is included in order to accommodate systems that support feature versioning. A value of ALL indicates that all versions of a feature should be fetched.

This is simply nonsense to me... why would I want to get all versions
of one feature?

Otherwise, an integer, n, can be specified to return the nth version of a feature. The version numbers start at 1, which is the oldest version. If a version value larger than the largest version number is specified, then the latest version is returned. The default action shall be for the query to return the latest version. Systems that do not support versioning can ignore the parameter and return the only version that they have. It should be noted, that if the value of the featureVersion parameter is set to ALL, the resulting XML document will contain duplicate feature identifiers and thus cannot be validated.'

I believe that our svn style is compatible with this. So if we did this you don't need a filter to specify the version number, you just do it as an attribute in your Query. But filters are still compatible with this. And if people request 'all', then we return all versions for that GetFeature request.

I had two problems with this:
* it seemed to me that the version number should change independently for each feature and be progressive (as in CVS)... but in fact, a global revision
number could be used like this anyway: we can always say we have version 1080 of a feature even if it's still equal to the original.
Yet, returning 1080 copies of the same feature would be... hum... crazy... what would we do in this case, return the first and last (even if equal), and every intermediate that happens to be different?
* having revision as an attribute is needed to build a checkout, no?
   Well, we have to provide that information, so we would need to extend
   the GML response to include it somewhere anyway...

GetLog looks good. It should just be a GetFeature call. In the future we could expose a new 'GetLog' operation that turns common filter operations (filters against dates to/from, specific users, etc.) into more top-level parameters. We can easily create a new output format for the WFS, indeed probably can just re-use some of the code from the WMS (would be cool if we could get those two outputs as the same, so our GML WFS output format could be used directly by the WMS).

Ok

The revision stuff additions in the Transaction should probably be change requests to the WFS spec eventually. One thing to note is that the wiki way is more to just allow the overwrite and to store both in the history. So it's ok if we do that. But I do agree we should encourage revision checks, making sure clients are up to date.
But all your additions to transaction are sensible.

GetDiff - might be nice if one could specify dates in addition to revision numbers? Something more for the future...

We could in fact... as an extension, since we're already playing with
our own new call.

Could the results of a GetDiff possibly be a WFS-Transaction? A transaction is any number of insert, updates, and deletes. Isn't that more or less what we'd be reporting? The updates that have taken place?

Hum, it could be, but clients usually generate it, they do not parse
it. I guess a client would want an easy way to show what changed on a map, so probably having the original and the changed feature would be
nicer (though not very compact, that's why I was thinking about returning just the attributes that did change).

Cheers
Andrea

Andrea Aime wrote:

Chris Holmes wrote:

The versioning stuff looks pretty good, definitely in line with what I've been thinking. A couple questions and thoughts...

Would there be a way (perhaps an alternate implementation) that would be able to operate without modifying the Data table? Like if someone wanted to use versioning but didn't want their main table modified? (Just curious, it's ok if the answer is 'no' or 'really hard'.)

It's not impossible, it would just require a more complex implementation,
because you would need to use UNION and some joins in order to gather
from the main table only the records that haven't been changed/deleted during versioned edits. So it would be at the same time more complex
and slower.

Why would one keep the base table unmodified? If it's big, the versioned
datastore could "import" it using a set of ALTER TABLE statements, and set up
a view for the original and last revision for other clients to hit
in a read-only way.

Ok, sounds good. I just ask since I figure some people in the future will ask.

For the revision attribute on GetFeature, could we just use the WFS 'featureVersion' construct? That would entail us not putting in an extra read-only attribute.

From the spec:

'The optional featureVersion attribute on the <Query> element is included in order to accommodate systems that support feature versioning. A value of ALL indicates that all versions of a feature should be fetched.

This is simply nonsense to me... why would I want to get all versions
of one feature?

I think it's basically a crude 'diff'?

Otherwise, an integer, n, can be specified to return the nth version of a feature. The version numbers start at 1, which is the oldest version. If a version value larger than the largest version number is specified, then the latest version is returned. The default action shall be for the query to return the latest version. Systems that do not support versioning can ignore the parameter and return the only version that they have. It should be noted, that if the value of the featureVersion parameter is set to ALL, the resulting XML document will contain duplicate feature identifiers and thus cannot be validated.'

I believe that our svn style is compatible with this. So if we did this you don't need a filter to specify the version number, you just do it as an attribute in your Query. But filters are still compatible with this. And if people request 'all', then we return all versions for that GetFeature request.

I had two problems with this:
* it seemed to me that the version number should change independently for each feature and be progressive (as in CVS)... but in fact, a global revision
number could be used like this anyway: we can always say we have version 1080 of a feature even if it's still equal to the original.
Yet, returning 1080 copies of the same feature would be... hum... crazy... what would we do in this case, return the first and last (even if equal), and every intermediate that happens to be different?

Yeah, that was my thought, just return the ones that had actually changed. It's definitely bending the spec a bit, but the spec's not very well thought out...
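One way to read that, sketched under the assumption that the store only records a feature at the global revisions where it actually changed. The history layout and the function names are hypothetical. A numeric featureVersion then resolves to the newest recorded state at or before that revision, and ALL returns only the recorded states.

```python
from bisect import bisect_right


def feature_at(history, n):
    """Return the state of a feature as of global revision n.

    history: list of (revision, state) pairs sorted by revision, with one
    entry per revision at which the feature really changed (assumed layout).
    Returns None if the feature did not exist yet at revision n.
    """
    revisions = [rev for rev, _ in history]
    idx = bisect_right(revisions, n)  # number of recorded changes at or before n
    return history[idx - 1][1] if idx else None


def all_versions(history):
    """featureVersion=ALL: only the states that differ, not one copy per revision."""
    return [state for _, state in history]


history = [(3, {"name": "Oak Ave"}), (1080, {"name": "Oak Avenue"})]
print(feature_at(history, 500))    # state recorded at revision 3
print(feature_at(history, 5000))   # larger than the largest revision: latest state
print(len(all_versions(history)))  # two versions, not 1080
```

Asking for a revision larger than the largest one falls through to the latest state, which matches the spec wording quoted above.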

* having revision as an attribute is needed to build a checkout, no?
  Well, we have to provide that information, so we would need to extend
  the GML response to include it somewhere anyways...

Hmmm... That's a good point. But extending the GML response would allow us to return it more efficiently than a version number on every feature... which I think would have to be returned if it's an attribute?

But yeah, changing the GML response is kind of nasty, since it could mess up clients - maybe only change response if they make a call to a GetFeatureAtRevision operation? Or if they include a revision in the normal GetFeature...

GetLog looks good. It should just be a GetFeature call. In the future we could expose a new 'GetLog' operation that turns common filter operations (filters against dates to/from, specific users, etc.) into more top-level parameters. We can easily create a new output format for the WFS, indeed probably can just re-use some of the code from the WMS (would be cool if we could get those two outputs as the same, so our GML WFS output format could be used directly by the WMS).

Ok

The revision stuff additions in the Transaction should probably be change requests to the WFS spec eventually. One thing to note is that the wiki way is more to just allow the overwrite and to store both in the history. So it's ok if we do that. But I do agree we should encourage revision checks, making sure clients are up to date.
But all your additions to transaction are sensible.

GetDiff - might be nice if one could specify dates in addition to revision numbers? Something more for the future...

We could in fact... as an extension, since we're already playing with
our own new call.

Ok.

Could the results of a GetDiff possibly be a WFS-Transaction? A transaction is any number of insert, updates, and deletes. Isn't that more or less what we'd be reporting? The updates that have taken place?

Hum, it could be, but clients usually generate it, they do not parse
it. I guess a client would want an easy way to show what changed on a map, so probably having the original and the changed feature would be
nicer (though not very compact, that's why I was thinking about returning just the attributes that did change).

Hrm, that's a good point, I do like the side effect of showing a diff on a map. I do also like the idea of a 'diff file', that someone could email an admin and they could apply it themselves. And that a client could suck down a whole WFS and be able to just get the diff and use that to update itself... So it's an interesting set of concerns - needs to be human readable, needs to be machine readable to display on a map, should be compact...

Chris

Cheers
Andrea


--
Chris Holmes
The Open Planning Project
http://topp.openplans.org

Chris Holmes wrote:

Andrea Aime wrote:

Chris Holmes wrote:

...

Why would one keep the base table unmodified? If it's big, the versioned
datastore could "import" it using a set of ALTER TABLE statements, and set up
a view for the original and last revision for other clients to hit
in a read-only way.

Ok, sounds good. I just ask since I figure some people in the future will ask.

Another approach could be the OpenStreetMap one... have the last revision stored in one table, and the full history of changes in another.
That would make last revision extraction real fast, but has a couple of downsides:
* people may think they can update the main table directly, whilst this
   would break versioning big time if the history table is not updated
   as well;
* adding branches would break this design, since having a "last" table for
   each branch would be very costly space-wise. An alternative would be
   to query the last trunk, go back in time to the moment the branch
   was created, and then forward using only changes performed on the
   branch (so branch information could be stored only in the history
   table)

Hum, I'm going to check how far we can go with this design, it has
the merit of having last data accessible in an "unversioned" manner.
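A rough sketch of the two-table layout, again with SQLite and invented table names (`roads` for the last revision, `roads_history` for the full history). It is not how OpenStreetMap implements it; the point is only that every write has to go through a helper that touches both tables in one transaction, which is exactly what a direct update of the main table would bypass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT);                -- last revision only
CREATE TABLE roads_history (id INTEGER, name TEXT, revision INTEGER);  -- every change
""")


def versioned_write(conn, fid, name, revision):
    """Hypothetical helper: update the 'last' table and append to the history."""
    with conn:  # both statements commit together, or both roll back
        conn.execute("INSERT OR REPLACE INTO roads (id, name) VALUES (?, ?)",
                     (fid, name))
        conn.execute(
            "INSERT INTO roads_history (id, name, revision) VALUES (?, ?, ?)",
            (fid, name, revision))


versioned_write(conn, 1, "Oak Ave", 1)
versioned_write(conn, 1, "Oak Avenue", 2)

# Unversioned clients read the plain table; no joins or revision filters needed.
latest = conn.execute("SELECT name FROM roads WHERE id = 1").fetchone()[0]
history = conn.execute(
    "SELECT revision, name FROM roads_history WHERE id = 1 ORDER BY revision"
).fetchall()
```

Reading the last revision is a plain SELECT, while older revisions and any future branch data live only in the history table.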

* having revision as an attribute is needed to build a checkout, no?
  Well, we have to provide that information, so we would need to extend
  the GML response to include it somewhere anyways...

Hmmm... That's a good point. But extending the GML response would allow us to return it more efficiently than a version number on every feature... which I think would have to be returned if it's an attribute?

But yeah, changing the GML response is kind of nasty, since it could mess up clients - maybe only change response if they make a call to a GetFeatureAtRevision operation? Or if they include a revision in the normal GetFeature...

Having revision in the feature type has a nice side effect on WMS too,
in that the user can query a specific revision with standard filters
(the same goes for branch id once we add it). I guess these outweigh
the need for extra information.

Oh, it occurs to me that WFS clients can simply ask for a list of properties to be retrieved and omit revision if they do not want to pay the extra transport cost, no?
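For illustration, a request that lists the wanted properties and leaves the revision attribute out could be built like this. The attribute names (`name`, `the_geom`, `revision`), the type name, and the helper are made up, and the sketch assumes WFS 1.0.0, where the property list is given as `ogc:PropertyName` children of `<Query>`.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
OGC_NS = "http://www.opengis.net/ogc"


def build_query_with_properties(type_name, properties):
    """Build a GetFeature request that asks only for the listed properties."""
    ET.register_namespace("wfs", WFS_NS)
    ET.register_namespace("ogc", OGC_NS)
    root = ET.Element(f"{{{WFS_NS}}}GetFeature",
                      {"service": "WFS", "version": "1.0.0"})
    query = ET.SubElement(root, f"{{{WFS_NS}}}Query", {"typeName": type_name})
    for name in properties:
        ET.SubElement(query, f"{{{OGC_NS}}}PropertyName").text = name
    return ET.tostring(root, encoding="unicode")


# Hypothetical feature type with a 'revision' attribute the client skips.
all_attributes = ["name", "the_geom", "revision"]
wanted = [a for a in all_attributes if a != "revision"]
slim_request = build_query_with_properties("topp:roads", wanted)
print(slim_request)
```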

Could the results of a GetDiff possibly be a WFS-Transaction? A transaction is any number of insert, updates, and deletes. Isn't that more or less what we'd be reporting? The updates that have taken place?

Hum, it could be, but clients usually generate it, they do not parse
it. I guess a client would want an easy way to show what changed on a map, so probably having the original and the changed feature would be
nicer (though not very compact, that's why I was thinking about returning just the attributes that did change).

Hrm, that's a good point, I do like the side effect of showing a diff on a map. I do also like the idea of a 'diff file', that someone could email an admin and they could apply it themselves. And that a client could suck down a whole WFS and be able to just get the diff and use that to update itself... So it's an interesting set of concerns - needs to be human readable, needs to be machine readable to display on a map, should be compact...

These can be addressed using different output formats:
* display: return two sets of full features for the two revisions in the
   diff (but only those that have changed, are new or deleted). This
   could even be done with GetFeature, if we come up with something that
   looks like svn cat -r m:n -d m where m is the revision I want to
   retrieve, but limiting myself to features changed between m and n.
   Two calls, two layers to be displayed in different colors with some
   transparency and you're done.
* diff: do as I said in the document, GML like but return only the
   attributes that did change, and have some marker for deleted or
   newly inserted features.
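The compact "diff" flavour could be computed along these lines. This is only a sketch over two in-memory snapshots; the snapshot layout and the marker names (`inserted`, `deleted`, `updated`) are assumptions, and the actual output would be GML-like rather than Python dictionaries.

```python
def diff_revisions(old, new):
    """Compare two snapshots (dict of fid -> attribute dict) of a feature type.

    Returns one entry per changed feature: full attributes for inserted
    features, no attributes for deleted ones, and only the attributes that
    differ for updated ones. Unchanged features are left out.
    """
    changes = []
    for fid in sorted(old.keys() | new.keys()):
        if fid not in old:
            changes.append({"fid": fid, "marker": "inserted",
                            "attributes": new[fid]})
        elif fid not in new:
            changes.append({"fid": fid, "marker": "deleted"})
        else:
            changed = {k: v for k, v in new[fid].items()
                       if old[fid].get(k) != v}
            if changed:
                changes.append({"fid": fid, "marker": "updated",
                                "attributes": changed})
    return changes


rev_m = {"roads.1": {"name": "Main St", "lanes": 2},
         "roads.2": {"name": "Oak Ave", "lanes": 2},
         "roads.3": {"name": "Elm Rd", "lanes": 1}}
rev_n = {"roads.1": {"name": "Main St", "lanes": 2},
         "roads.2": {"name": "Oak Avenue", "lanes": 2},
         "roads.4": {"name": "New Ln", "lanes": 1}}
changes = diff_revisions(rev_m, rev_n)
```

The "display" flavour would instead return the full features for the same set of changed identifiers, once per revision.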

I hope GML is still human readable enough... (well, you just have to
severely limit the kind of human that will read that stuff by
filtering on a degree in computer science...)

Cheers
Andrea

On 11/28/06, Andrea Aime <aaime@anonymised.com> wrote:

Chris Holmes wrote:
>
> Andrea Aime wrote:
>> Chris Holmes wrote:
...

> Hrm, that's a good point, I do like the side effect of showing a diff on
> a map. I do also like the idea of a 'diff file', that someone could
> email an admin and they could apply it themselves. And that a client
> could suck down a whole WFS and be able to just get the diff and use
> that to update itself... So it's an interesting set of concerns - needs
> to be human readable, needs to be machine readable to display on a map,
> should be compact...

These can be addressed using different output formats:
* display: return two sets of full features for the two revisions in the
   diff (but only those that have changed, are new or deleted). This
   could even be done with GetFeature, if we come up with something that
   looks like svn cat -r m:n -d m where m is the revision I want to
   retrieve, but limiting myself to features changed between m and n.
   Two calls, two layers to be displayed in different colors with some
   transparency and you're done.

You can do something like
http://vision.edina.ac.uk/cgi-bin/wms-vision?version=1.1.0&request=getMap&layers=newpop%2Csmall_1920%2Cmed_1904%2Cgft2o&styles=&SRS=EPSG:27700&Format=image/png&width=450&height=450&bgcolor=cfd6e5&bbox=420475.2,424914.7,441400.8,445840.3&exception=application/vnd.ogc.se_inimage&gunit=10108809
which is how I solved a similar problem for Vision of Britain
(http://www.visionofbritain.org.uk) when we had to show boundary
changes over time.

Ian
--

Ian Turton
http://www.geotools.org
http://pennspace.blogspot.com/