Andrea Aime wrote:
Justin Deoliveira wrote:
Hi all,
For 2.0-alpha2 I have been running cite tests like for normal releases, however I used this as an opportunity to resurrect some stuff I was working on a while back.
To step back a moment, my motivation was to be able to run cite tests against multiple backends easily, targeting mainly db back ends, postgis, h2, mysql, etc...
So the first thing I did was come up with a small utility program to load an arbitrary datastore with the various configurations of cite data. Currently I have only implemented wfs 1.0 and wfs 1.1. The loader can be found underneath the rest of the cite utilities I have worked on in the past:
http://svn.codehaus.org/geoserver/trunk/src/community/cite/loader/
The nice thing about this is that it really exercises what goes *into the datastore* strictly through the geotools api, rather than hacking the backend with sql scripts and making things appear ok coming *out of the datastore*.
Nice, I love this. DataStore.createSchema and data loading have never
been tested much. I expect those to have some issues (e.g., creating
the wrong column types, not setting the expected length restrictions
and the like).
So with the loader script I loaded up an H2 database for the wfs 1.0 cite tests and did a run. And... it uncovered quite a few of the hacks that the current postgis setup uses to be "cite compliant". To name the major ones:
1) the boundedBy attribute only gets encoded because it is an attribute in the underlying tables, which obviously would never get updated if the actual geometry changed.
Mumble mumble... isn't this related to the fact we don't have feature
bounding on by default anymore? If you enable it the gml:boundedBy element will be generated, see for example here:
http://geo.openplans.org:8080/geoserver/wfs?service=WFS&version=1.0.0&request=GetFeature&typeName=topp:states
Perhaps... the boundedBy attribute may be unnecessary in the database and this is just there for legacy reasons.
Hmm... but if you have it inside the table, won't you get it inside
another namespace (the one of the feature)?
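To make the concern in 1) concrete, here is a minimal sketch (in Python, not GeoServer code) of why a stored boundedBy column goes stale while bounds derived from the geometry do not. The `Feature` class and `compute_bounds` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


@dataclass
class Feature:
    """Hypothetical stand-in for a feature row: vertices plus a stored bounds column."""
    fid: str
    coords: List[Coord]
    stored_bounds: Bounds  # what a boundedBy column in the table would hold


def compute_bounds(coords: List[Coord]) -> Bounds:
    """Derive the envelope from the geometry itself."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


f = Feature("f1", [(0.0, 0.0), (2.0, 1.0)], (0.0, 0.0, 2.0, 1.0))
f.coords.append((5.0, 3.0))  # geometry edited, e.g. by a WFS update

print("stored :", f.stored_bounds)            # still the old envelope
print("derived:", compute_bounds(f.coords))   # follows the geometry
```

The stored value only matches the geometry until the first edit, which is the reason to generate gml:boundedBy at encoding time rather than read it from a table column.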
2) The reference to "built-in" gml geometry property types in GML2 is toggled via a flag to the gml2 encoder, rather than mapping feature attributes to the actual application schema and gml schemas like we do for GML3.
Can you explain why the GML3 way is any better? Afaik GML3 is not doing
real "app schema" anyways?
Well it is in the fact that the encoder respects the schema. And this allows for very basic mapping facilities. Namely being able to specify the namespace attached to the element and being able to specify the encoded type of an element, etc... The old transformer can't do that.
The flag is a brute force approach that would work only in the general case.
* Will it work in the case you have two geometries and one needs to reference a gml property type, and the other an app schema property type -> no, both are either gml, or both are either app schema
* Will it work if the property is named something other than the well known property types, pointPropertyType, lineStringPropertyType, etc... -> no
And there are other issues. I finally figured out why wfs 1.0 describe feature types never pass with the new engine. It is because the schema generated by wfs 1.0 does not match the schemas built into the cite tests. I am not sure how this ever passed on the old engine but I had to dig into the XSLT pit of the new engine and indeed found the schemas that it uses to validate.
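The two cases above can be sketched as follows. This is a toy Python model, not the actual encoder: `encode_with_flag`, `encode_with_mapping`, the `app` prefix, and the `surveyLine` attribute are all hypothetical, and the only thing it shows is that a single flag cannot express a mixed feature type while a per-attribute mapping can.

```python
from typing import Dict, List


def encode_with_flag(attrs: List[str], use_gml_refs: bool) -> Dict[str, str]:
    """One flag for the whole feature type: every geometry attribute follows it."""
    prefix = "gml" if use_gml_refs else "app"
    return {name: f"{prefix}:{name}" for name in attrs}


def encode_with_mapping(attrs: List[str], mapping: Dict[str, str],
                        default_prefix: str = "app") -> Dict[str, str]:
    """Per-attribute mapping: each attribute can name its own qualified element."""
    return {name: mapping.get(name, f"{default_prefix}:{name}") for name in attrs}


# One attribute that should reference a built-in gml property,
# one that only exists in the application schema.
attrs = ["pointProperty", "surveyLine"]

print(encode_with_flag(attrs, True))    # both gml, so surveyLine is wrong
print(encode_with_flag(attrs, False))   # both app, so pointProperty is wrong
print(encode_with_mapping(attrs, {"pointProperty": "gml:pointProperty"}))
```

Only the mapping variant produces `gml:pointProperty` next to `app:surveyLine`, which is the kind of control a schema-aware encoder gives you.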
Didn't we use featureTypes/typeName/schema.xml in order to
match the expected feature type?
For one type... and after I examined it with respect to the schemas used by the cite tests they still did not match up, so I admit I am unsure how this worked.
Here is a summary of what I did codewise:
1) Added a new GML2 output format (GML2OutputFormat2) which uses the gtxml encoder like the GML3 one does. (I can see Andrea's eyes rolling from here). I did try to make it work with the existing encoder but could not since it does not at all respect the application schema being encoded against.
My eyes will be rolling only as long as the new encoder is not up
to snuff speed wise, the day we can recommend it for production I'll
be happy with it.
Having a separate gml2 encoder using the new architecture could be a good intermediate step to get it some exposure in the meantime, but
I'm not sure we want to run cite tests with it: we'd end up running
cite tests with one encoder but then suggesting that people use the
other one in production.
Well that is kind of a moot argument since many of the code paths we run during cite are run only during cite and not in production. Although I admit using a full blown different encoder is drastic it does not seem any different. I mean without it we are not cite compliant. Just like without all the other cite hacks we are not cite compliant.
2) Added proper schema.xsd files for each feature type so that geoserver actually creates the feature types properly for the wfs 1.0 cite data
Seems more workaround than we had before? If you look here:
http://svn.codehaus.org/geoserver/branches/1.7.x/data/citewfs-1.0/featureTypes/
you'll notice that only one of the feature types is using a schema
override:
http://svn.codehaus.org/geoserver/branches/1.7.x/data/citewfs-1.0/featureTypes/cdf_Other/
Not quite, all the types have special schema requirements that we were not really supporting. Search for dataFeatures.xsd and geometryFeatures.xsd under the cite/tests directory to see what I mean.
3) Refactored the wfs 1.1 schema encoder a bit to work with both 1.1 and 1.0.
4) Some other random changes here and there... mostly bug fixes to work against the new configuration.
And it works!! I was able to run wfs 1.0 cite tests on both H2 and MySQL (NG). It should work for pretty much all NG datastores but I have not gotten around to trying them all. Although I know there is something special with Oracle (go figure) that prevents us from passing cite. Same goes for DB2 I believe.
For Oracle we cannot pass the cite tests for a couple of reasons:
- table name and geometry name are forced to be uppercase by Oracle
Spatial, cite tests want different case
- there is no boolean data type
Hmmm... there may be some issue related to number handling as well.
In any case, it seems we need a mapping datastore in the middle in
order to cite test with Oracle. (or a purely renaming one if we
establish some convention on what a "boolean" is in Oracle).
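A purely renaming layer of the kind described above could look roughly like this. It is a Python sketch under loud assumptions: the column names, the `rename_row` helper, and the 'Y'/'N' boolean convention are all invented for illustration and are not an existing GeoTools datastore or an agreed convention.

```python
from typing import Any, Dict, Set

# Hypothetical convention: booleans are stored as the characters 'Y' / 'N'.
BOOL_TRUE = "Y"


def rename_row(row: Dict[str, Any], names: Dict[str, str],
               boolean_columns: Set[str]) -> Dict[str, Any]:
    """Map upper-case backend columns to the names the tests expect and
    convert the agreed boolean encoding to a real boolean."""
    out: Dict[str, Any] = {}
    for column, value in row.items():
        if column in boolean_columns:
            value = (value == BOOL_TRUE)
        out[names.get(column, column)] = value
    return out


# Column names below are made up; the backend forces upper case.
backend_row = {"NAME": "alpha", "ACTIVE": "Y"}
names = {"NAME": "name", "ACTIVE": "active"}

print(rename_row(backend_row, names, {"ACTIVE"}))
```

Anything beyond renaming and the boolean convention (number handling, for instance) would need a fuller mapping layer than this.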
Similar situation for wfs 1.1 tests.
I have committed all my changes that are "non-disruptive" and wanted to bounce my ideas off for the important changes.
1) GML2OutputFormat2: I realize that there are performance issues with the gtxml encoder (which btw I am working on, but that is a discussion for another thread). So I am not proposing a replacement. What I am proposing is that the GML2OutputFormat2 be engaged when strict cite compliance is set.
I would prefer to see the production choice be used for cite testing as well. Can you point me at what issues there are with the old gml2
encoder? I've had good success fixing it in the past.
What about an environment variable telling the encoder which one to use?
This way anyone can use GML2OutputFormat2 if they want to.
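A minimal sketch of that idea, in Python for brevity (in Java the lookup would go through System.getenv or a system property). The variable name `GML2_ENCODER` and its values are assumptions made up for this example; only the two class names come from the thread.

```python
import os
from typing import Mapping, Optional


def choose_gml2_encoder(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the name of the GML2 output format to engage.

    GML2_ENCODER is a hypothetical variable; the production encoder
    stays the default unless it is explicitly set to 'gtxml'.
    """
    env = os.environ if env is None else env
    value = env.get("GML2_ENCODER", "classic").strip().lower()
    return "GML2OutputFormat2" if value == "gtxml" else "GML2OutputFormat"


print(choose_gml2_encoder({}))                         # production default
print(choose_gml2_encoder({"GML2_ENCODER": "gtxml"}))  # opt in to the new encoder
```

Defaulting to the existing encoder keeps production behaviour unchanged, and a cite run opts in explicitly.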
Ha, try to run wfs cite tests with a regular database setup and have fun. It took me a couple weeks of spare time to figure out all the issues and fix them cleanly so good luck.
The alternative is to not change anything and keep the old postgis db around with the old encoder and pass the tests for that special case. In which case calling ourselves cite compliant would be a stretch.
The whole point for me in this exercise was not to test our WFS protocol, we have already done that, it is to test our backend datastores against the variety of cases that the cite tests throw out.
Anyways, I am curious if other people think the value add here is worth the hit in performance. My opinion is I have never seen GML as a format built for speed, it is way too verbose, it requires the loading of an external document to describe itself, etc... I am also curious to know if anyone has actually chosen server software based solely on how fast it spits out GML.
2) XmlSchemaEncoder: I am proposing replacing the old 1.0 schema encoder with the new one. The old one has no notion of schema overrides, and quite brutishly builds up a big string buffer and then spits out the XML.
Yes, works for me.
Cheers
Andrea
--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.