Jody Garnett wrote:
Hi Andrea,
...
You are clear on your scope (and yes, everyone is hoping for more, but I respect your decision to start small).
Datastore Design:
Data table:
- I was not going to assume the revisionCreated/revisionExpired columns; instead I used a single "revision" column that only becomes non-zero when the row is replaced (so revision == 0 always represents the "live" data). Having two columns is not bad; does having both help you ask for data in a specific range, or could we get by with just a single column?
I could get by with a single column, but at a high performance price.
Let's assume that instead of 0 I use maxint, since it simplifies queries (if I used 0, the query below would need a special case to handle features for which the subquery finds no minimum...).
In order to extract what was live at version x I would have
to query for the record with the lowest revision number greater than x,
something like:
select *
from data d1
where d1.revision = (
select min(revision)
from data d2
where d2.id = d1.id
and d2.revision > x
)
which is a lot slower than the single table access I propose.
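To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-column layout answers "what was live at revision x" with a single range predicate, while the single-column layout needs the correlated subquery above. The table names data1/data2, the toy rows, and the boundary convention (a row is live for revisionCreated <= x < revisionExpired) are assumptions for illustration; the sketch only shows that both queries return the same rows, it says nothing about speed.

```python
import sqlite3

MAXINT = 2**31 - 1  # sentinel for "still live", instead of 0 as discussed above


def build(conn):
    # Two-column layout: a row is live for revisionCreated <= x < revisionExpired
    conn.execute("create table data2 (id text, name text,"
                 " revisionCreated integer, revisionExpired integer)")
    # Single-column layout: "revision" is the revision at which the row was replaced
    conn.execute("create table data1 (id text, name text, revision integer)")
    # Toy history; every feature exists from revision 1 onwards
    history = [
        ("fred", "Fred", 1, 5),
        ("fred", "Fred F.", 5, 9),
        ("fred", "Fred Flintstone", 9, MAXINT),
        ("wilma", "Wilma", 1, MAXINT),
    ]
    conn.executemany("insert into data2 values (?, ?, ?, ?)", history)
    conn.executemany("insert into data1 values (?, ?, ?)",
                     [(fid, name, expired) for fid, name, _, expired in history])


def live_at_two_columns(conn, x):
    # A plain range predicate on a single table access
    rows = conn.execute("select id, name from data2"
                        " where revisionCreated <= ? and revisionExpired > ?", (x, x))
    return sorted(rows.fetchall())


def live_at_single_column(conn, x):
    # The correlated subquery from the mail: per id, the lowest revision greater than x
    rows = conn.execute("select id, name from data1 d1"
                        " where d1.revision = (select min(revision) from data1 d2"
                        " where d2.id = d1.id and d2.revision > ?)", (x,))
    return sorted(rows.fetchall())


conn = sqlite3.connect(":memory:")
build(conn)
print(live_at_two_columns(conn, 5))  # [('fred', 'Fred F.'), ('wilma', 'Wilma')]
```

The toy rows deliberately all exist from revision 1, since the single-column sketch stores no creation revision.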
I'm doing performance tests now, to see how much performance I give
up by using my schema, especially when extracting the last revision, which
is the most common operation anyway.
I'll keep you posted on the results.
GetFeature
- so out of the box it returns the latest. I am a bit concerned that making the revision columns available messes up the original schema (this simply will not work when the schema is provided by a third-party authority, for example). Although this is not your use case (I recognize that), I am going to work through how it can be done:
Use getFeature with a vendor specific parameter describing the revision range.
The result of which is a GML document where the revision is either:
- part of the feature identifier
<Feature fid="people.fred.432456">..</Feature><Feature fid="people.wilma.432455">...</Feature>
- a separate attribute on each feature (i.e. an XML attribute, not an element)
<Feature fid="people.fred" revision="432456">..</Feature><Feature fid="people.wilma" revision="432455">...</Feature> (preferred approach?)
By making the concept of revision available as an attribute, the normal DescribeFeatureType operation can provide the correct description as part of the schema, i.e. the exact revision range that will work.
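The two encodings can be sketched with Python's xml.etree.ElementTree. This is a simplified, namespace-free illustration, not valid GML: the FeatureCollection/Feature element names and the function names are hypothetical, and only the fid and revision values come from the examples above.

```python
import xml.etree.ElementTree as ET


def fid_encoding(features):
    # Option 1: the revision is folded into the feature identifier
    root = ET.Element("FeatureCollection")
    for fid, rev in features:
        ET.SubElement(root, "Feature", fid=f"{fid}.{rev}")
    return ET.tostring(root, encoding="unicode")


def attribute_encoding(features):
    # Option 2: the identifier is untouched, the revision is a separate XML attribute
    root = ET.Element("FeatureCollection")
    for fid, rev in features:
        ET.SubElement(root, "Feature", fid=fid, revision=str(rev))
    return ET.tostring(root, encoding="unicode")


features = [("people.fred", 432456), ("people.wilma", 432455)]
print(fid_encoding(features))
print(attribute_encoding(features))
```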
Hmm... I hear ya, yet there are downsides:
* I would no longer be able to query the feature type for a specific revision using a plain GetFeature. This could be done in an extra GetFeatureVersioning method instead (something we are thinking about anyway, to expand what we can ask of a version-based system), but
it forces an API change in the versioned datastore as well (since what I'm
looking for can no longer be expressed with the gt2 filters, unless we expand the
filter and expression sets to cope... hmm, what would you do in this
regard?).
* the first approach would make it hard to build a checkout: how do
you know how to parse the revision out of the identifier?
* the second approach would require a different GML producer, and
a place in DefaultFeature to store the revision (there's
none at the moment).
So, I'm really wondering, if the schema is mandated by an external authority, could we avoid messing with versioning and use the complex
data store instead to get the same result?
GetLog
- making it available as a normal feature is fine, collections support can be done if you need it.
Indeed, it's true... I just don't have any idea of how complex that would be and which GML producer we would need to use...
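Exposing the log "as a normal feature" essentially means each log entry is a flat record with simple attributes. A minimal sketch with sqlite3, assuming a hypothetical changesets table (revision, author, date, message) and a hypothetical get_log helper that supports a revision range and an optional maxFeatures-style limit:

```python
import sqlite3


def create_changesets(conn):
    # Hypothetical log table: one row per committed revision
    conn.execute("create table changesets (revision integer primary key,"
                 " author text, date text, message text)")


def get_log(conn, from_rev, to_rev, max_features=None):
    # Each entry comes back as a flat dict, i.e. a simple feature without geometry
    sql = ("select revision, author, date, message from changesets"
           " where revision between ? and ? order by revision")
    params = [from_rev, to_rev]
    if max_features is not None:
        sql += " limit ?"
        params.append(max_features)
    cols = ("revision", "author", "date", "message")
    return [dict(zip(cols, row)) for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
create_changesets(conn)
conn.executemany("insert into changesets values (?, ?, ?, ?)", [
    (1, "andrea", "2000-01-01", "initial import"),   # toy data
    (2, "jody", "2000-01-02", "fix road names"),
    (3, "andrea", "2000-01-03", "delete duplicates"),
])
print(get_log(conn, 2, 3))
```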
Transaction:
- throwing errors out of Transaction is cool; consider any conflict to be the same as a locking conflict (i.e. the modification has been made by someone else, so that feature is "locked")
- leave the revision columns out of DescribeFeatureType so that you do not have to worry about users supplying the details...
See above, I would like to avoid that.
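Treating a conflict like a lock failure can be sketched as an optimistic check. This assumes the two-column revisionCreated/revisionExpired layout with maxint marking live rows, and that the client sends back the revisionCreated of the row it edited (how the client learns that revision is exactly the schema question above). ConflictError and update_feature are hypothetical names.

```python
import sqlite3

MAXINT = 2**31 - 1  # marks the live row


class ConflictError(Exception):
    """Hypothetical error, reported to the client like a lock conflict."""


def update_feature(conn, fid, new_name, base_revision, new_revision):
    # Expire the live row only if it is still the one the client started from
    cur = conn.execute("update data set revisionExpired = ?"
                       " where id = ? and revisionExpired = ? and revisionCreated = ?",
                       (new_revision, fid, MAXINT, base_revision))
    if cur.rowcount != 1:
        conn.rollback()
        raise ConflictError(f"{fid} was modified by someone else after revision {base_revision}")
    # Insert the new live version of the feature
    conn.execute("insert into data values (?, ?, ?, ?)", (fid, new_name, new_revision, MAXINT))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("create table data (id text, name text,"
             " revisionCreated integer, revisionExpired integer)")
conn.execute("insert into data values ('fred', 'Fred', 1, ?)", (MAXINT,))
conn.commit()
update_feature(conn, "fred", "Fred F.", base_revision=1, new_revision=2)  # succeeds
```

A second client still holding revision 1 would now get a ConflictError instead of overwriting the change.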
The Transaction "handle" is where your changelog message comes from. No additional attribute is needed on the Transaction element.
I had not thought about it, but this would be a way to bend the specification... The annotated WFS 1.1 schema says:
The handle attribute allows a client application
to assign a client-generated request identifier
to a WFS request. The handle is included to
facilitate error reporting. A WFS may report the
handle in an exception report to identify the
offending request or action. If the handle is not
present, then the WFS may employ other means to
localize the error (e.g. line numbers).
Forcing handle to be used as a commit message would be wrong in my opinion...
GetDiff - GetFeature option
Good summary; allowing each attribute to be optional is tough. One way to consider this is the WFS 1.1 idea of different output formats: define a "diff" format that produces an XML document such as you describe; you can always include the modified XSD information inline as part of the document (since in this case the amount of data is small, and the modifications to the original XSD are known).
Yeah, indeed that's what I was thinking (but failed to communicate).
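The reason every attribute becomes optional in such a "diff" format is that an update only carries the attributes that changed. A minimal sketch, assuming each revision is available as a hypothetical {fid: {attribute: value}} snapshot; attributes are compared only on the new side, since the schema itself does not change between revisions.

```python
def diff_revisions(old, new):
    """Compare two {fid: {attribute: value}} snapshots of one feature type."""
    diff = []
    for fid in sorted(old.keys() | new.keys()):
        if fid not in old:
            diff.append(("insert", fid, new[fid]))
        elif fid not in new:
            diff.append(("delete", fid, {}))
        else:
            # Only changed attributes are kept, hence every attribute must be optional
            changed = {k: v for k, v in new[fid].items() if old[fid].get(k) != v}
            if changed:
                diff.append(("update", fid, changed))
    return diff


old = {"people.fred": {"name": "Fred", "age": 30},
       "people.wilma": {"name": "Wilma", "age": 29},
       "people.dino": {"name": "Dino", "age": 2}}
new = {"people.fred": {"name": "Fred", "age": 31},
       "people.wilma": {"name": "Wilma", "age": 29},
       "people.barney": {"name": "Barney", "age": 32}}
for change in diff_revisions(old, new):
    print(change)
```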
GetDiff - GetTransaction Request option
It would be *nice* if the result was *not* a GetFeature extension but instead the exact Transaction request document required to make the change; no messing around or inventing new XML schema is required here. This is consistent with the Galdos cascading WFS-T approach and would be a *great* benefit for keeping servers in sync.
(Please consider this idea.)
Rollback
- don't do it, use GetDiff
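A sketch of what "the diff as a Transaction request" could look like, using xml.etree.ElementTree and the WFS 1.1 Transaction/Update/Property/Delete structure with an ogc:FeatureId filter. Rollback would then mean computing the diff from the current revision back to the target one and posting it as an ordinary Transaction. The input format and the helper names are assumptions; inserts are left out because they need the feature type's own GML encoding, and a real server would expect a namespace-qualified typeName rather than the bare "people" used here.

```python
import xml.etree.ElementTree as ET

WFS = "http://www.opengis.net/wfs"
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("wfs", WFS)
ET.register_namespace("ogc", OGC)


def fid_filter(parent, fid):
    # <ogc:Filter><ogc:FeatureId fid="..."/></ogc:Filter>
    flt = ET.SubElement(parent, f"{{{OGC}}}Filter")
    ET.SubElement(flt, f"{{{OGC}}}FeatureId", fid=fid)


def diff_to_transaction(type_name, changes):
    """changes: list of ("update", fid, {attr: value}) or ("delete", fid, None)."""
    tx = ET.Element(f"{{{WFS}}}Transaction", service="WFS", version="1.1.0")
    for op, fid, attrs in changes:
        if op == "update":
            upd = ET.SubElement(tx, f"{{{WFS}}}Update", typeName=type_name)
            for name, value in attrs.items():
                prop = ET.SubElement(upd, f"{{{WFS}}}Property")
                ET.SubElement(prop, f"{{{WFS}}}Name").text = name
                ET.SubElement(prop, f"{{{WFS}}}Value").text = str(value)
            fid_filter(upd, fid)
        elif op == "delete":
            fid_filter(ET.SubElement(tx, f"{{{WFS}}}Delete", typeName=type_name), fid)
        else:
            raise ValueError(f"unsupported operation in this sketch: {op}")
    return ET.tostring(tx, encoding="unicode")


# Rolling back: restore fred's old age and remove a feature created after the target revision
print(diff_to_transaction("people", [("update", "people.fred", {"age": 30}),
                                     ("delete", "people.barney", None)]))
```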
No comment on the SQL - as I am out of time. Except that after we prototype it would be smart to roll this stuff into the database side; although not as Paul suggests straight into PostGIS (smarter to do it as part of DataStore initialization, so the java code "owns" the SQL).
I have the same objection as with Paul's... this would turn into a
maintenance nightmare. I really want this to be database independent,
so that I don't have to fix bugs three times in three different database languages (that's why I'm using really plain queries too: they are
standard SQL supported by every database I know, except maybe old
versions of MySQL that did not support subqueries).
When a customer chooses a db, you don't have any way to make him change
his mind. Look at GeoServer: people do complain about the Oracle datastore, but this does not make them switch to PostGIS; they simply
cannot. It's easier for them to drop GeoServer than Oracle.
So, if someone likes the idea and wants to do a PostGIS-integrated thing
that can be reused from PHP, Python and whatnot clients,
then he should step up, do it, and maintain it too, because once you do
that, implementations do drift apart and each starts doing its own thing
(it's just a matter of time).
Oh, having Postgres-integrated versioning would be great, btw; I would not be happy myself if PostGIS had been a middleware extension written in Python.
Cheers
Andrea