Hi,
as you are probably aware, our gt-xsd encoding performance is not exactly
stellar, and lags quite a bit behind the old FeatureTranslator approach.
A few years ago Justin tried an approach to speed up simple feature encoding,
as that provides a predictable structure that we can leverage to avoid doing
lookups of bindings, schema walks, namespace setups, and so on.
That prototype got buried in the sands of time for a while, until a couple of weeks ago,
when Justin managed to unearth it and send it to me. I dusted it off, refreshed it
for GML 3.2, curves, and tuple support, sped it up a bit more, and here we are.
The approach is based on a custom EncoderDelegate for simple feature
collections that, as said, leverages their flat and uniform structure to get a significant
performance boost:
https://github.com/geotools/geotools/pull/825
The approach uses a hierarchy of classes and custom little encoders to handle the different
GML version needs.
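To make the idea concrete, here is a toy sketch of why a flat, uniform structure helps. It is not the code in the pull request, and none of the names below (FlatRowEncoder, writerFor, and so on) come from GeoTools; they are made up for illustration. The point is only that the per-attribute decision is taken once, when the schema is known, so the per-feature work becomes a plain loop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Toy illustration only: none of these names come from GeoTools.
class FlatRowEncoder {
    private final String rowElement;
    private final List<String> names;
    private final List<BiConsumer<Object, StringBuilder>> writers = new ArrayList<>();

    // Resolve one writer per attribute a single time, when the schema is known.
    FlatRowEncoder(String rowElement, List<String> names, List<Class<?>> types) {
        this.rowElement = rowElement;
        this.names = names;
        for (Class<?> type : types) {
            writers.add(writerFor(type));
        }
    }

    // Stand-in for the per-type lookup a generic encoder would repeat for every value.
    private static BiConsumer<Object, StringBuilder> writerFor(Class<?> type) {
        if (Number.class.isAssignableFrom(type)) {
            return (value, out) -> out.append(value);
        }
        return (value, out) -> out.append(escape(String.valueOf(value)));
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Per-row work is a plain loop over the precomputed writers.
    void encode(Object[] row, StringBuilder out) {
        out.append('<').append(rowElement).append('>');
        for (int i = 0; i < row.length; i++) {
            if (row[i] == null) {
                continue; // skip missing values
            }
            out.append('<').append(names.get(i)).append('>');
            writers.get(i).accept(row[i], out);
            out.append("</").append(names.get(i)).append('>');
        }
        out.append("</").append(rowElement).append('>');
    }

    public static void main(String[] args) {
        FlatRowEncoder encoder = new FlatRowEncoder(
                "state",
                List.of("name", "persons"),
                List.<Class<?>>of(String.class, Integer.class));
        StringBuilder out = new StringBuilder();
        encoder.encode(new Object[] {"Illinois", 11430602}, out);
        System.out.println(out);
    }
}
```

The real encoders also have to deal with namespaces, geometries and the differences between GML versions, which this sketch ignores entirely.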
Feature-wise, enabling it does not make us lose functionality; in particular, it has been tested
with:
- Full GeoTools build
- Full GeoServer build
- WFS 1.1 cite tests
- A few extra tests here and there
The code covers plain encoding, srsDimension flags, featureMembers/featureMember,
curves, and joined tuples.
The new encoders are enabled by a Configuration property that GeoServer sets on the
configurations inside the GML output formats. It is enabled by default, with
a system variable to turn it off, just in case:
https://github.com/geoserver/geoserver/pull/1020
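For readers unfamiliar with the "on by default, off via a system variable" pattern, a minimal sketch follows. The property name used here is hypothetical and chosen only for illustration; the actual name is defined in the pull request above:

```java
// Sketch only: the property name below is made up for illustration.
class EncoderToggle {
    static final String PROPERTY = "example.gml.optimizedEncoding";

    // Enabled when the property is unset; any value other than "true"
    // (case-insensitive) turns the optimized path off.
    static boolean optimizedEncodingEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        System.out.println("optimized encoding enabled: " + optimizedEncodingEnabled());
    }
}
```

With such a toggle, starting the JVM with -Dexample.gml.optimizedEncoding=false would fall back to the standard encoding path.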
Now, a couple of quick benchmarks for your reference.
The first one is against the usual “states” layer, with large-ish geometries, some attributes,
but a pretty small data set.
Reference WFS 1.0/FeatureTranslator output:
ab -n 800 -c 8 "http://localhost:8080/geoserver/topp/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=topp:states"
Requests per second: 326.82 [#/sec] (mean)
Using the optimized encoders:
ab -n 800 -c 8 "http://localhost:8080/geoserver/topp/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=topp:states"
Requests per second: 291.43 [#/sec] (mean)
ab -n 800 -c 8 "http://localhost:8080/geoserver/topp/ows?service=WFS&version=2.0&request=GetFeature&typeName=topp:states"
Requests per second: 251.35 [#/sec] (mean)
Using standard gt-xsd:
ab -n 800 -c 8 "http://localhost:8081/geoserver/topp/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=topp:states"
Requests per second: 111.84 [#/sec] (mean)
ab -n 800 -c 8 "http://localhost:8081/geoserver/topp/ows?service=WFS&version=2.0&request=GetFeature&typeName=topp:states"
Requests per second: 92.93 [#/sec] (mean)
As you can see, we get a roughly 2.6 to 2.7 times speedup compared to the current gt-xsd approach. Not bad, I’d say.
A second dataset I’ve checked is the Natural Earth quickstart pack, over 7000 point features, with a lot of attributes.
Here are the results:
Reference WFS 1.0/FeatureTranslator output:
ab -n 200 -c 8 "http://localhost:8080/geoserver/nurc/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 7.60 [#/sec] (mean)
Using the optimized encoders:
ab -n 200 -c 8 "http://localhost:8080/geoserver/nurc/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 6.29 [#/sec] (mean)
ab -n 200 -c 8 "http://localhost:8080/geoserver/nurc/ows?service=WFS&version=2.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 5.96 [#/sec] (mean)
Using standard gt-xsd:
ab -n 200 -c 8 "http://localhost:8081/geoserver/nurc/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 0.83 [#/sec] (mean)
ab -n 200 -c 8 "http://localhost:8081/geoserver/nurc/ows?service=WFS&version=2.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 0.87 [#/sec] (mean)
So, a much more interesting speedup for this one, around 7 times.
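For anyone who wants to check the arithmetic, the speedups can be recomputed directly from the "Requests per second" figures reported above. The small helper class below is just a convenience for that calculation (its name is made up); the numbers are copied from the ab output:

```java
import java.util.Locale;

// Recomputes the speedups from the requests-per-second figures quoted above.
class Speedup {
    // Ratio of optimized throughput to baseline throughput.
    static double ratio(double optimized, double baseline) {
        return optimized / baseline;
    }

    static void report(String label, double optimized, double baseline) {
        System.out.println(String.format(Locale.ROOT, "%s: %.2fx", label, ratio(optimized, baseline)));
    }

    public static void main(String[] args) {
        // topp:states, optimized encoders vs standard gt-xsd
        report("states, WFS 1.1", 291.43, 111.84);
        report("states, WFS 2.0", 251.35, 92.93);
        // nurc:ne_10m_populated_places, optimized encoders vs standard gt-xsd
        report("populated places, WFS 1.1", 6.29, 0.83);
        report("populated places, WFS 2.0", 5.96, 0.87);
    }
}
```

These are single quick runs on one machine, so the ratios should be read as rough indications rather than precise measurements.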
In both cases the speed of the optimized GML3 encoders comes close to the GML2 one, but it’s still not a match… and I don’t exactly know where the
extra time is spent (the profiling results left me wondering).
If you look at the code, you can see we also have GML2 optimized output for gt-xsd, but with the above results, I’m not too eager to propose
dropping FeatureTranslator yet.
I also don’t exactly know why the GML3.2 output is slightly slower than the GML3 one.
Ah, commit-wise, I don’t foresee a backport to the stable series, although the changes are mostly additions, so we could
think of doing a backport, leaving it disabled by default, but allowing interested people to enable it by flipping a system variable.
Anyway… feedback welcome.
Cheers
Andrea
PS: some might be wondering, couldn’t we speed up the GML encoder in general instead?
The answer is yes, but we would not have gotten anywhere near the FeatureTranslator performance.
I’ve tried a bit anyway, and after quite a bit of effort, got a measly 30% speedup and a lot of broken tests…
One cannot beat “not doing things” with a simple “do some things somewhat faster”.
Ing. Andrea Aime
@geowolf
Technical Lead
GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549
http://www.geo-solutions.it
http://twitter.com/geosolutions_it