[Geoserver-devel] improving wfs performance

Hi all,

Warning: this is a long email, and parts of it get quite involved in the implementation details of the GeoServer WFS. But nonetheless, here I go.

Recently I have been putting effort into improving WFS performance. Unfortunately the changes I have been making are quite substantial and won't be suitable for a stable branch. If you are curious about checking it out I have a branch called "appschema_cache" in my geoserver git repository:

http://github.com/jdeolive/geoserver/

As the branch name hints, the work started out as a way to simply cache application schemas as they are built. My original plan was to use a cache to work around the slow memory leak for DescribeFeatureType requests:

http://jira.codehaus.org/browse/GEOS-3534

To back up for a moment: when I refer to "application schemas" I do not mean appschema/complex features. I mean the schema that is built from the GeoServer feature types in the catalog. Such schemas are built when:

* responding to a DescribeFeatureType request
* encoding GML3 output (since it uses the encoder, which is "schema assisted")

Now, the approach we take to building the schemas is the following. We take the core WFS schema and then modify it, adding new types and element declarations for all the feature types in the GeoServer catalog. As you can imagine, this is problematic for a number of reasons:

a) It modifies the WFS schema (which is big) every time

b) It scans the entire catalog every time (which is expensive and hinders security)

c) It is seriously not thread safe

(c) requires a bit more explanation. The hacking of the WFS schema to add types and elements goes on in the WFSConfiguration class, which by design is not meant to be thread safe. However, to avoid rebuilding the schema for every single request, the configuration object is cached as a singleton. That singleton still modifies its internal state on each request, in a way that is not thread safe.

So, that said, how do we go about fixing it? The approach is, instead of modifying the WFS schema, to leave it be and simply import it. This has the benefit of allowing the WFS schema to be built once and cached for its lifetime.

And if you think about it, this makes sense when talking about application schemas. When developing an application schema you do so for your own target namespace and import WFS. You don't copy WFS and modify it by adding your types.
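To make the "import, don't modify" idea concrete, here is a minimal sketch in plain Java. It is not GeoServer or GeoTools code: the classes BaseSchema and ApplicationSchema, the two type names, and the example.org namespace are all made up for illustration. The point is only that the base schema is built once and referenced, never copied or mutated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model. None of these classes are GeoServer API.
class SchemaImportSketch {

    // Counts how often the (expensive) base schema is constructed.
    static int baseBuilds = 0;

    // Stand-in for the large core WFS schema: read-only once built.
    static final class BaseSchema {
        final String namespace;
        final List<String> types;

        BaseSchema(String namespace, List<String> types) {
            baseBuilds++;
            this.namespace = namespace;
            this.types = Collections.unmodifiableList(new ArrayList<>(types));
        }
    }

    // Built once, when this class is initialized, and only read afterwards.
    static final BaseSchema WFS = new BaseSchema(
            "http://www.opengis.net/wfs",
            Arrays.asList("FeatureCollectionType", "GetFeatureType")); // placeholder type list

    // Stand-in for an application schema: it has its own target namespace and
    // its own types, and it only holds a reference to the schema it imports.
    static final class ApplicationSchema {
        final String targetNamespace;
        final List<BaseSchema> imports = new ArrayList<>();
        final List<String> types = new ArrayList<>();

        ApplicationSchema(String targetNamespace) {
            this.targetNamespace = targetNamespace;
            this.imports.add(WFS); // a reference, not a copy
        }
    }

    public static void main(String[] args) {
        ApplicationSchema first = new ApplicationSchema("http://example.org/app");
        first.types.add("roadsType");
        ApplicationSchema second = new ApplicationSchema("http://example.org/app");
        second.types.add("riversType");

        System.out.println("base schema builds: " + baseBuilds);
        System.out.println("shared import: " + (first.imports.get(0) == second.imports.get(0)));
        System.out.println("base schema types: " + WFS.types);
    }
}
```

Because the base schema is never written to after construction, adding types to one application schema cannot leak into another or into the shared WFS schema.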

That is the first part. The second part is to only build a schema object for the feature types that are being requested. This alleviates the problem of having to scan the entire catalog when responding to a request.
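A rough sketch of this second part, again in plain Java with hypothetical names (Catalog, lookup and buildSchema are not GeoServer API, and the "declarations" are just strings): the builder touches only the catalog entries named in the request instead of iterating over everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model. None of these classes are GeoServer API.
class RequestedTypesSketch {

    // Stand-in for the catalog: feature type name -> attribute names.
    static final class Catalog {
        private final Map<String, List<String>> featureTypes = new LinkedHashMap<>();
        int lookups = 0; // how many catalog entries have been touched

        void add(String name, List<String> attributes) {
            featureTypes.put(name, attributes);
        }

        List<String> lookup(String name) {
            lookups++;
            return featureTypes.get(name);
        }
    }

    // Builds declarations for the requested feature types only; the rest of
    // the catalog is never visited.
    static List<String> buildSchema(Catalog catalog, List<String> requested) {
        List<String> declarations = new ArrayList<>();
        for (String name : requested) {
            List<String> attributes = catalog.lookup(name);
            if (attributes == null) {
                throw new IllegalArgumentException("Unknown feature type: " + name);
            }
            declarations.add("element " + name + " with attributes " + attributes);
        }
        return declarations;
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog();
        catalog.add("app:roads", Arrays.asList("geom", "name"));
        catalog.add("app:rivers", Arrays.asList("geom", "length"));
        catalog.add("app:towns", Arrays.asList("geom", "population"));

        List<String> schema = buildSchema(catalog, Arrays.asList("app:rivers"));
        System.out.println(schema);
        System.out.println("catalog entries touched: " + catalog.lookups);
    }
}
```

The cost of a request then scales with the number of requested types rather than with the size of the catalog.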

The third part is to move the building of the schema (from GeoServer feature types) out of a Configuration class and into an XSD class. This fixes the concurrency problem since the XSD gets built once, while the Configuration is instantiated multiple times, which is the way the parser and encoder are designed to work.
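The split described above can be sketched as follows. This is plain Java under stated assumptions, not the actual XSD/Configuration classes: SketchXsd and SketchConfiguration are hypothetical names, and a single shared schema instance stands in for whatever caching the real code does. The schema holder is built once (class initialization guarantees this even under concurrent access), while each request gets its own cheap, mutable configuration object.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model. None of these classes are GeoServer API.
class XsdConfigurationSketch {

    // Counts how many times the expensive schema build actually runs.
    static final AtomicInteger schemaBuilds = new AtomicInteger();

    // Stand-in for an XSD class: built once, read-only afterwards.
    static final class SketchXsd {
        // Initialization-on-demand holder: the JVM runs this initializer
        // exactly once, even if many threads call getInstance() at once.
        private static final class Holder {
            static final SketchXsd INSTANCE = new SketchXsd();
        }

        final String schema;

        private SketchXsd() {
            schemaBuilds.incrementAndGet();
            this.schema = "schema built from feature types";
        }

        static SketchXsd getInstance() {
            return Holder.INSTANCE;
        }
    }

    // Stand-in for a Configuration: created per request and never shared, so
    // its mutable state needs no synchronization.
    static final class SketchConfiguration {
        final SketchXsd xsd = SketchXsd.getInstance();
        int bindings = 0;

        void registerBinding() {
            bindings++;
        }
    }

    // Simulates concurrent requests and returns how many distinct XSD
    // instances the per-request configurations ended up with.
    static int runRequests(int count) {
        Set<SketchXsd> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[count];
        for (int i = 0; i < count; i++) {
            workers[i] = new Thread(() -> {
                SketchConfiguration configuration = new SketchConfiguration();
                configuration.registerBinding();
                seen.add(configuration.xsd);
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int distinct = runRequests(8);
        System.out.println("distinct XSD instances: " + distinct);
        System.out.println("schema builds: " + schemaBuilds.get());
    }
}
```

The design choice is to keep all per-request mutation in the object that is cheap to create, and to keep the expensive object free of mutation after construction.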

So, all that said, I have implemented the above changes and have indeed seen improvements, both in speed and in the elimination of the slow memory leak. I have just started doing official benchmarking, so I will soon have some comparative numbers for current trunk vs the appschema_cache branch.

Also of note is that this work lays the groundwork to finally bring home the optimized GML encoder (which I experimented with about a year ago) cleanly. This means GML3 output and schema-assisted GML2 output for simple features will perform nearly as well as (within 5% of) the old transformer-based GML2 encoder.

So all that is great, right? Well, there is a problem, and it comes up with the appschema (as in complex features) extension. As far as I can tell, the feature chaining functionality implemented in appschema relies on the fact that every time a schema is built for encoding it includes every feature type in the GeoServer catalog. I am looking into ways to fix this but will have to recruit the help of Ben and the experts. And since this email is long enough, I will do so in a different thread :)

Thanks for reading, if you did in fact get this far :)

-Justin

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

Hi all,

Warning: this is a long email, and parts of it get quite involved in the implementation details of the GeoServer WFS. But nonetheless, here I go.

Recently I have been putting effort into improving WFS performance. Unfortunately the changes I have been making are quite substantial and won't be suitable for a stable branch. If you are curious about checking it out I have a branch called "appschema_cache" in my geoserver git repository:

http://github.com/jdeolive/geoserver/

I cannot comment intelligently on the community schema issues, but
yay! for killing the memory leak and improving the GML encoding
speed.

Looking forward to seeing that land on trunk.

cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.