[Geoserver-devel] Some updates from the versioning front

Hi all,
some not so quick updates on the versioning front, as well as
a request for feedback on the following.

So far the early prototype has getfeature, getlog, getdiff
and rollback working, so it's ready to be tested by a wider
audience. If you're interested, follow instructions here:
http://docs.codehaus.org/display/GEOS/Trying+out+the+early+WFS-V+prototype

Yet, quite a bit of work still needs to be done on the response
side of the equation, as well as some thinking.

GetFeature is able to properly answer queries that do use featureVersion, allowing revision specific extraction (point in
time extraction is still not there, but easy to add).
The issue here is that we can have multiple queries, all hitting
the same features, but at different revisions, resulting in duplicated
features in the output. Well, not really duplicated, but same feature
at different revision without the ability to tell which version is which.
Here my suggestion is to create a new VersionedFeature element, in the same substitution group as gml:Feature, but sporting a featureVersion attribute. We can activate generation of those features only if the version is different from the latest one. Code wise, it means we need a Feature subclass that has this attribute (it may be a straight subclass, or a wrapper). There is also the question of where we use it: shall the versioned data store always answer with these special features, or do we handle this inside the wfs Transaction code? The answer is not trivial,
since when you make queries on the datastore you specify the revision you're asking for, so you know what you get.
On the other hand, knowing the version you get is a by-product of the
svn-like behaviour, that is, all features you get are at the revision
you asked for. If you do the same with a cvs-like approach, with a point in time query, you would get features at different versions.
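
To make the wrapper option concrete, here is a minimal Python sketch (not the prototype's code). The class and function names are made up for illustration; the only idea taken from the proposal is that features read at a revision other than the latest one get tagged with a featureVersion, so two copies of the same fid can be told apart.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Feature:
    fid: str
    attributes: Dict[str, Any]


@dataclass
class VersionedFeature:
    """Hypothetical wrapper: a feature plus the revision it was read at."""
    feature: Feature
    feature_version: int

    @property
    def fid(self) -> str:
        return self.feature.fid


def tag(features, revision, latest):
    """Wrap features only when they were read at a non-latest revision."""
    if revision == latest:
        return list(features)
    return [VersionedFeature(f, revision) for f in features]


# Two queries hitting the same feature at different revisions.
at_rev_3 = [Feature("roads.1", {"type": "street"})]
at_rev_5 = [Feature("roads.1", {"type": "freeway"})]
result = tag(at_rev_3, revision=3, latest=5) + tag(at_rev_5, revision=5, latest=5)
```

Both entries in `result` share the fid `roads.1`, but only the older one carries a `feature_version`, which is what lets a client tell them apart.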

Rollback is just a new element in the transaction call, and it's ok response wise, that is, provides a correct record of features
that have been updated, deleted and inserted as a result of the rollback. The fids are preserved, that is, if a deletion is rolled back, the re-created feature will have the same fid as before deletion.
The issue here is that at the current stage rollback does not support user filtering, and works only if the fromVersion is the current one.
Having a fromVersion different than the latest one effectively
turns rollback into a merge operation, and raises the question of
what to do with deleted features: what if the deleted feature has already been restored? The diff tells us to create it, but it's already there. Moreover, having the ability to specify from and to version
means that people could ask to merge a forward diff.
So, in the end, it seems that if we go that way, rollback can be
thrown away, and replaced by a merge instead. In fact, that's what
we're using with subversion, that is, we only do merges, some of which
can be thought of as rollbacks. Merge could in fact be asked to be fid
preserving with a new attribute; if the fid is already there, no big issue, we just have to fail the transaction.
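
The fid-preserving merge and its failure case can be sketched in a few lines of Python. This is an illustration only, assuming a state is a plain dict from fid to attributes and a diff is a dict of inserts, updates and deletes; `TransactionFailed` and `merge` are hypothetical names, not part of the prototype.

```python
class TransactionFailed(Exception):
    """Raised when a merge cannot be applied (hypothetical error type)."""


def merge(current, diff):
    """Apply a diff to a state (fid -> attributes), preserving fids.

    An insert whose fid is already present fails the whole transaction.
    """
    for fid in diff["insert"]:
        if fid in current:
            raise TransactionFailed("feature %s already exists" % fid)
    result = {fid: dict(attrs) for fid, attrs in current.items()
              if fid not in diff["delete"]}
    for fid, attrs in list(diff["insert"].items()) + list(diff["update"].items()):
        result[fid] = dict(attrs)
    return result


# Revision 2 deleted roads.1, so undoing revision 2 means re-inserting it.
undo_rev_2 = {"insert": {"roads.1": {"type": "street"}}, "update": {}, "delete": []}

# Rolling back from the current revision: the feature returns with its old fid.
state_rev_2 = {"roads.2": {"type": "path"}}
restored = merge(state_rev_2, undo_rev_2)

# Merging the same diff later, after someone already restored roads.1, fails.
state_rev_4 = {"roads.1": {"type": "street"}, "roads.2": {"type": "path"}}
try:
    merge(state_rev_4, undo_rev_2)
    conflict = False
except TransactionFailed:
    conflict = True
```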

GetDiff and GetLog are good, in that the response is correct and complete afaik, though it's not really human readable.
GetDiff provides a proper wfs:Transaction as a response, it's the transaction you should run to go from version x to version y (and works both forwards and backwards, that is, both if x < y and if x > y).
GetLog provides a GML2 response.
Both could use a more human oriented answer, that I plan to add
when all calls have at least one fully working response.
If you have suggestions on this, I'm all ears (and eyes :-))
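
As an illustration of what "the transaction you should run to go from version x to version y" means, here is a rough Python sketch. It assumes each revision can be viewed as a dict from fid to attributes; `get_diff` and `apply_diff` are hypothetical helpers, not the GetDiff implementation.

```python
def get_diff(from_state, to_state):
    """Return the inserts, updates and deletes that turn from_state into to_state."""
    inserts = {fid: to_state[fid] for fid in to_state if fid not in from_state}
    updates = {fid: to_state[fid] for fid in to_state
               if fid in from_state and from_state[fid] != to_state[fid]}
    deletes = sorted(fid for fid in from_state if fid not in to_state)
    return {"insert": inserts, "update": updates, "delete": deletes}


def apply_diff(state, diff):
    """Run the diff as a transaction against a copy of the state."""
    result = {fid: dict(attrs) for fid, attrs in state.items()
              if fid not in diff["delete"]}
    for fid, attrs in diff["insert"].items():
        result[fid] = dict(attrs)
    for fid, attrs in diff["update"].items():
        result[fid] = dict(attrs)
    return result


rev_x = {"roads.1": {"type": "street"}, "roads.2": {"type": "street"}}
rev_y = {"roads.1": {"type": "freeway"}, "roads.3": {"type": "path"}}

forward = get_diff(rev_x, rev_y)    # x < y
backward = get_diff(rev_y, rev_x)   # x > y
```

Applying `forward` to `rev_x` gives `rev_y`, and applying `backward` to `rev_y` gives `rev_x` again, which is the symmetry described above.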

Finally, versioning would not be complete if the WMS side of the equation is not filled as well. I plan to add a VERSION attribute to the GetMap request, that would be a list of featureVersion specs.
This would allow for visual diffs and the like: just request the same
layer twice, at different revisions, and optionally use a filter
to gather exactly the features you want to look at.
This also raises the question of how to do the same with GetMap post requests, we'll have to extend the GetMap syntax there to allow for
revision specification... at the moment, each layer can have a filter,
but that won't include the revision. This stinks a little, because we would have to extend the SLD portion of the GetMap request :-(

Once all of this is fixed, we'll have to think about authentication and the client side of the equation.

Well, suggestions, comments, feedback in general would be very welcome.
Cheers
Andrea

Andrea Aime wrote:

Hi all,
some not so quick updates on the versioning front, as well as
a request for feedback on the following.

So far the early prototype has getfeature, getlog, getdiff
and rollback working, so it's ready to be tested by a wider
audience. If you're interested, follow instructions here:
http://docs.codehaus.org/display/GEOS/Trying+out+the+early+WFS-V+prototype

Excellent.

Yet, quite a bit of work still needs to be done on the response
side of the equation, as well as some thinking.

GetFeature is able to properly answer queries that do use featureVersion, allowing revision specific extraction (point in
time extraction is still not there, but easy to add).
The issue here is that we can have multiple queries, all hitting
the same features, but at different revisions, resulting in duplicated
features in the output. Well, not really duplicated, but same feature
at different revision without the ability to tell which version is which.
Here my suggestion is to create a new VersionedFeature element, in the same substitution group as gml:Feature, but sporting a featureVersion attribute. We can activate generation of those features only if the version is different from the latest one. Code wise, it means we need a Feature subclass that has this attribute (it may be a straight subclass, or a wrapper). There is also the question of where we use it: shall the versioned data store always answer with these special features, or do we handle this inside the wfs Transaction code? The answer is not trivial,
since when you make queries on the datastore you specify the revision you're asking for, so you know what you get.
On the other hand, knowing the version you get is a by-product of the
svn-like behaviour, that is, all features you get are at the revision
you asked for. If you do the same with a cvs-like approach, with a point in time query, you would get features at different versions.

In my view if what we're going for eventually is a version control system for geospatial data then we should probably always return a featureVersion. I would have a versioned datastore always answer with these features.

But I think we should punt on this one, until we get closer to a spec stage. The current spec says 'It should be noted, that if the value of the featureVersion parameter is set to ALL, the resulting XML document will contain duplicate feature identifiers and thus cannot be validated.' So basically if people put in lots of queries on the same data set at different revisions they're going to get duplicates, and we can live with that for now.

Rollback is just a new element in the transaction call, and it's ok response wise, that is, provides a correct record of features
that have been updated, deleted and inserted as a result of the rollback. The fids are preserved, that is, if a deletion is rolled back, the re-created feature will have the same fid as before deletion.
The issue here is that at the current stage rollback does not support user filtering, and works only if the fromVersion is the current one.
Having a fromVersion different than the latest one effectively
turns rollback into a merge operation, and raises the question of
what to do with deleted features: what if the deleted feature has already been restored? The diff tells us to create it, but it's already there. Moreover, having the ability to specify from and to version
means that people could ask to merge a forward diff.
So, in the end, it seems that if we go that way, rollback can be
thrown away, and replaced by a merge instead. In fact, that's what
we're using with subversion, that is, we only do merges, some of which
can be thought of as rollbacks. Merge could in fact be asked to be fid
preserving with a new attribute; if the fid is already there, no big issue, we just have to fail the transaction.

I think doing it as merge could make a lot of sense in the future. For now let's just say that one can only rollback from the current version. For now we're trying to replicate the wiki paradigm, which afaik does not have an ability to do a rollback at a specific point in time, only at the latest.

As for no user filtering, is there none at all? I'd like to be able to specify at least an FID, and roll that back. I don't want to have to roll back all features if someone messed one up 30 commits ago. The other big use case, I think, is to be able to roll back based on a user, to take back everything that a vandal may have done.

Past that, more complex user filtering would be nice, to be able to roll back a user in a certain bbox, or all roads that got wrongly marked as 'freeway' or some such. If doing that isn't a ton more work than FID + user rollbacks then I'd say go for it, but those two should be sufficient for now.

GetDiff and GetLog are good, in that the response is correct and complete afaik, though it's not really human readable.
GetDiff provides a proper wfs:Transaction as a response, it's the transaction you should run to go from version x to version y (and works both forwards and backwards, that is, both if x < y and if x > y).
GetLog provides a GML2 response.

Great.

Both could use a more human oriented answer, that I plan to add
when all calls have at least one fully working response.
If you have suggestions on this, I'm all ears (and eyes :-))

This may be a good place to use some templating. I see this as similar to a GetFeatureInfo response, in that it's often just displayed right in the client, in whatever form the server puts it out. Therefore we should allow control on the server side over how to format it. So maybe make a basic text one and a basic html one, and then see what others come up with?
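
A rough sketch of the "basic text one and basic html one" idea, using Python's standard `string.Template`. The log entry fields (revision, author, date, message) and the sample entry are assumptions made for the example, not the actual GetLog output.

```python
from html import escape
from string import Template

# Server-side templates an administrator could edit (illustrative only).
TEXT = Template("r$revision | $author | $date\n  $message")
HTML = Template("<li><b>r$revision</b> by $author on $date: $message</li>")


def render(entries, template, escape_values=False):
    """Format each log entry with the given template, one entry per line."""
    lines = []
    for entry in entries:
        values = {key: escape(str(value)) if escape_values else str(value)
                  for key, value in entry.items()}
        lines.append(template.substitute(values))
    return "\n".join(lines)


entries = [{"revision": 5, "author": "joe", "date": "2006-11-02",
            "message": "Fixed <road> type"}]

as_text = render(entries, TEXT)
as_html = render(entries, HTML, escape_values=True)
```

Keeping the templates as data rather than code is what would let the format be changed on the server side without touching the response encoder.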

Finally, versioning would not be complete if the WMS side of the equation is not filled as well. I plan to add a VERSION attribute to the GetMap request, that would be a list of featureVersion specs.
This would allow for visual diffs and the like: just request the same
layer twice, at different revisions, and optionally use a filter
to gather exactly the features you want to look at.

This should probably hook into the WMS 'Time' parameter as well; on versioned feature sets it could be a shortcut into that?

This also raises the question of how to do the same with GetMap post requests, we'll have to extend the GetMap syntax there to allow for
revision specification... at the moment, each layer can have a filter,
but that won't include the revision. This stinks a little, because we would have to extend the SLD portion of the GetMap request :-(

I think we should just leave GetMap post alone until someone specifically requests it. GetMap post isn't a real OGC specification, I think it's just a discussion paper in version 0.3.

Once all of this is fixed, we'll have to think about authentication and the client side of the equation.

Sounds great.

Chris

Well, suggestions, comments, feedback in general would be very welcome.
Cheers
Andrea

--
Chris Holmes
The Open Planning Project
http://topp.openplans.org

Chris Holmes wrote:

Andrea Aime wrote:

But I think we should punt on this one, until we get closer to a spec stage. The current spec says 'It should be noted, that if the value of the featureVersion parameter is set to ALL, the resulting XML document will contain duplicate feature identifiers and thus cannot be validated.' So basically if people put in lots of queries on the same data set at different revisions they're going to get duplicates, and we can live with that for now.

Ok.

The issue here is that at the current stage rollback does not support user filtering, and works only if the fromVersion is the current one.
Having a fromVersion different than the latest one effectively
turns rollback into a merge operation, and raises the question of
what to do with deleted features: what if the deleted feature has already been restored? The diff tells us to create it, but it's already there. Moreover, having the ability to specify from and to version
means that people could ask to merge a forward diff.
So, in the end, it seems that if we go that way, rollback can be
thrown away, and replaced by a merge instead. In fact, that's what
we're using with subversion, that is, we only do merges, some of which
can be thought of as rollbacks. Merge could in fact be asked to be fid
preserving with a new attribute; if the fid is already there, no big issue, we just have to fail the transaction.

I think doing it as merge could make a lot of sense in the future. For now let's just say that one can only rollback from the current version. For now we're trying to replicate the wiki paradigm, which afaik does not have an ability to do a rollback at a specific point in time, only at the latest.

As for no user filtering, is there none at all? I'd like to be able to specify at least an FID, and roll that back. I don't want to have to roll back all features if someone messed one up 30 commits ago.

I did not explain myself properly. By user filtering, I mean "rollback
all commits done by user 'joe' in bbox(x1, y1, x2, y2)". I mean using
the user id as a filter component.

The other big use case, I think, is to be able to roll back based on a user, to take back everything that a vandal may have done.

Yeah, this one we don't support. I can add support for this though,
it's a matter of telling which features were modified by the vandal,
and doing a fid filter on those. But it won't be exactly what you asked for:
if other users touched the same features in between the current revision
and the one that is the target of the rollback, all those changes will
be lost too. To perform the rollback, I compare the feature state
at the current revision with the one at the rollback-to revision,
and restore the rollback-to version.
Filtering is there only to determine which features I should consider,
not which changes.
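
A small Python sketch of why this loses other users' changes. The log layout, the feature states and the `rollback` helper are all invented for the example; the only behaviour it mirrors is "select features by filter, then restore their rollback-to state".

```python
def rollback(current, target, fids):
    """Restore the selected fids to their state at the rollback-to revision."""
    result = {fid: dict(attrs) for fid, attrs in current.items()}
    for fid in fids:
        if fid in target:
            result[fid] = dict(target[fid])   # restore the old state wholesale
        else:
            result.pop(fid, None)             # feature did not exist back then
    return result


# Hypothetical change log: (revision, user, fid, changed attributes).
log = [
    (2, "vandal", "roads.1", {"type": "freeway"}),
    (3, "joe", "roads.1", {"name": "Main St"}),
]

target = {"roads.1": {"type": "street", "name": ""}}            # revision 1
current = {"roads.1": {"type": "freeway", "name": "Main St"}}   # revision 3

# The user filter only selects which features to consider...
touched = {fid for _, user, fid, _ in log if user == "vandal"}

# ...and the rollback then restores their whole revision 1 state.
restored = rollback(current, target, touched)
```

Here `restored["roads.1"]` ends up with an empty name: joe's edit at revision 3 is gone along with the vandal's, because the filter picked the feature, not the individual change.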

Past that, more complex user filtering would be nice, to be able to roll back a user in a certain bbox, or all roads that got wrongly marked as 'freeway' or some such. If doing that isn't a ton more work than FID + user rollbacks then I'd say go for it, but those two should be sufficient for now.

You can do the most complex filters all right, just not filtering on
the user id. I'm going to add the latter soon.

This may be a good place to use some templating. I see this as similar to a GetFeatureInfo response, in that it's often just displayed right in the client, in whatever form the server puts it out. Therefore we should allow control on the server side over how to format it. So maybe make a basic text one and a basic html one, and then see what others come up with?

Sure, that was my plan in fact.

Finally, versioning would not be complete if the WMS side of the equation is not filled as well. I plan to add a VERSION attribute to the GetMap request, that would be a list of featureVersion specs.
This would allow for visual diffs and the like: just request the same
layer twice, at different revisions, and optionally use a filter
to gather exactly the features you want to look at.

This should probably hook into the WMS 'Time' parameter as well; on versioned feature sets it could be a shortcut into that?

Ah, sure, good idea.

Cheers
Andrea