[Geoserver-devel] A pull request to speed up GML encoding speed for simple features

Hi,
as you are probably aware, our gt-xsd encoding performance is not exactly
stellar, and lags quite a bit behind the old FeatureTranslator approach.

A few years ago Justin tried an approach to speed up simple feature encoding,
as simple features provide a predictable structure that we can leverage to avoid
binding lookups, schema walks, namespace setups, and so on.
That prototype got buried in the sands of time for a while, until a couple of weeks ago
Justin managed to unearth it and send it to me. I dusted it off, refreshed it
to support GML 3.2, curves and tuples, sped it up a bit more, and here we are :-)

The approach is based on a custom EncoderDelegate for simple feature
collections that, as said, leverages their flat and uniform structure to get a significant
performance boost:
https://github.com/geotools/geotools/pull/825

The approach uses a hierarchy of classes and custom little encoders to handle the different
GML version needs.
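
To illustrate why a flat, uniform structure helps, here is a minimal sketch in Python (not the actual GeoTools code, which is Java and lives in the pull request above). The `BindingRegistry`, `encode_generic` and `encode_flat` names are hypothetical and only model the idea: the generic path resolves an encoder for every attribute of every feature, while the fast path resolves them once per schema and reuses them.

```python
from xml.sax.saxutils import escape


class BindingRegistry:
    """Hypothetical stand-in for a per-type encoder lookup; counts how often it is hit."""

    def __init__(self):
        self.lookups = 0

    def lookup(self, name, value_type):
        self.lookups += 1
        # Every type is encoded the same way here; a real registry would differ per type.
        return lambda value: f"<{name}>{escape(str(value))}</{name}>"


def encode_generic(features, registry):
    """Baseline: resolve an encoder for every attribute of every feature."""
    parts = []
    for feature in features:
        parts.append("<feature>")
        for name, value in feature.items():
            parts.append(registry.lookup(name, type(value))(value))
        parts.append("</feature>")
    return "".join(parts)


def encode_flat(schema, features, registry):
    """Fast path: the schema is flat and shared, so resolve encoders once up front."""
    encoders = [(name, registry.lookup(name, value_type)) for name, value_type in schema]
    parts = []
    for feature in features:
        parts.append("<feature>")
        for name, encode in encoders:
            parts.append(encode(feature[name]))
        parts.append("</feature>")
    return "".join(parts)


schema = [("name", str), ("population", int)]
features = [{"name": "A & B", "population": 10}, {"name": "C", "population": 20}]

generic_registry, flat_registry = BindingRegistry(), BindingRegistry()
generic_xml = encode_generic(features, generic_registry)
flat_xml = encode_flat(schema, features, flat_registry)

# Same output, but the number of lookups no longer grows with the feature count.
print(generic_xml == flat_xml, generic_registry.lookups, flat_registry.lookups)
```

In this toy model the lookup count drops from (features x attributes) to just (attributes); the real encoders save more than that, since they also skip schema walks and namespace setup.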

Feature-wise, enabling it does not make us lose functionality. In particular, it has been tested
with:

  • Full GeoTools build
  • Full GeoServer build
  • WFS 1.1 cite tests
  • A few extra tests here and there

The code covers plain encoding, srsDimension flags, featureMembers/featureMember,
curves, and joined tuples.

The new encoders are enabled by a Configuration property that GeoServer sets on the
configurations inside the GML output formats. It enables them by default, but adds
a system variable to turn them off, just in case:
https://github.com/geoserver/geoserver/pull/1020

Now, some benchmarks. I have done a couple of quick benchmarks for your reference.
The first one is against the usual “states” layer, with large-ish geometries, some attributes,
but a pretty small data set.

Reference WFS 1.0/FeatureTranslator output:

ab -n 800 -c 8 "http://localhost:8080/geoserver/topp/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=topp:states"
Requests per second: 326.82 [#/sec] (mean)

Using the optimized encoders:

ab -n 800 -c 8 "http://localhost:8080/geoserver/topp/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=topp:states"
Requests per second: 291.43 [#/sec] (mean)

ab -n 800 -c 8 "http://localhost:8080/geoserver/topp/ows?service=WFS&version=2.0&request=GetFeature&typeName=topp:states"
Requests per second: 251.35 [#/sec] (mean)

Using standard gt-xsd:

ab -n 800 -c 8 "http://localhost:8081/geoserver/topp/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=topp:states"
Requests per second: 111.84 [#/sec] (mean)

ab -n 800 -c 8 "http://localhost:8081/geoserver/topp/ows?service=WFS&version=2.0&request=GetFeature&typeName=topp:states"
Requests per second: 92.93 [#/sec] (mean)

As you can see, we get an almost 3 times speedup compared to the current gt-xsd approach (about 2.6x for WFS 1.1 and 2.7x for WFS 2.0). Not bad I’d say.

A second dataset I’ve checked is the Natural Earth quickstart pack, over 7000 point features, with a lot of attributes.
Here are the results:

Reference WFS 1.0/FeatureTranslator output:

ab -n 200 -c 8 "http://localhost:8080/geoserver/nurc/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 7.60 [#/sec] (mean)

Using the optimized encoders:

ab -n 200 -c 8 "http://localhost:8080/geoserver/nurc/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 6.29 [#/sec] (mean)

ab -n 200 -c 8 "http://localhost:8080/geoserver/nurc/ows?service=WFS&version=2.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 5.96 [#/sec] (mean)

Using standard gt-xsd:

ab -n 200 -c 8 "http://localhost:8081/geoserver/nurc/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 0.83 [#/sec] (mean)

ab -n 200 -c 8 "http://localhost:8081/geoserver/nurc/ows?service=WFS&version=2.0&request=GetFeature&typeName=nurc:ne_10m_populated_places"
Requests per second: 0.87 [#/sec] (mean)

So, a much more interesting speedup for this one, around 7 times (about 7.6x for WFS 1.1 and 6.9x for WFS 2.0).
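
For anyone who wants to redo the arithmetic, the ratios can be recomputed from the "Requests per second" figures quoted above. This is just a small Python sketch over those numbers; the dictionary layout and the `summarize` helper are made up for illustration and are not part of either pull request.

```python
# Requests per second copied from the ab runs quoted in this message.
results = {
    "states": {
        "FeatureTranslator": 326.82,
        "optimized 1.1": 291.43,
        "optimized 2.0": 251.35,
        "gt-xsd 1.1": 111.84,
        "gt-xsd 2.0": 92.93,
    },
    "ne_10m_populated_places": {
        "FeatureTranslator": 7.60,
        "optimized 1.1": 6.29,
        "optimized 2.0": 5.96,
        "gt-xsd 1.1": 0.83,
        "gt-xsd 2.0": 0.87,
    },
}


def summarize(results):
    """Return (dataset, WFS version, speedup over gt-xsd, share of FeatureTranslator)."""
    rows = []
    for dataset, rps in results.items():
        for version in ("1.1", "2.0"):
            optimized = rps[f"optimized {version}"]
            speedup = optimized / rps[f"gt-xsd {version}"]
            share = optimized / rps["FeatureTranslator"]
            rows.append((dataset, version, speedup, share))
    return rows


for dataset, version, speedup, share in summarize(results):
    print(f"{dataset} WFS {version}: {speedup:.1f}x over gt-xsd, "
          f"{share:.0%} of FeatureTranslator throughput")
```

The second column of the output shows how far the optimized encoders still are from the WFS 1.0/FeatureTranslator reference on the same data.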

In both cases the speed of the optimized GML3 encoders comes close to the GML2 one, but it’s still not a match… and I don’t exactly know where the
extra time is spent (the profiling results left me wondering).
If you look at the code, you can see we also have GML2 optimized output for gt-xsd, but with the above results, I’m not too eager to propose
dropping FeatureTranslator yet.

I also don’t exactly know why the GML3.2 output is slightly slower than the GML3 one.

Ah, commit-wise, I don’t foresee a backport to the stable series. That said, the changes are mostly additions, so we could
consider a backport that leaves it disabled by default but lets interested people enable it by flipping a system variable.

Anyways… feedback welcomed

Cheers
Andrea

PS: some might be wondering, couldn’t we speed up the GML encoder in general instead?
The answer is yes, but we would not have got anywhere near the FeatureTranslator performance.
I’ve tried a bit anyway, and after quite a bit of effort, got a measly 30% speedup and a lot of broken tests…

One cannot beat “not doing things” with a simple “do some things somewhat faster”.


==

GeoServer Professional Services from the experts! Visit
http://goo.gl/NWWaa2 for more information.

==

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

NOTICE PURSUANT TO D.Lgs. 196/2003

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.


Dived into looking at the pull request … one question for the email list:

The code uses PicoContainer - isn’t that rather dead by this time? Is PicoContainer still earning its keep here?

The project moved to GitHub (https://github.com/picocontainer/picocontainer), although a lot is still on Codehaus.


On 21 April 2015 at 10:32, Andrea Aime <andrea.aime@anonymised.com> wrote:

[...]

Jody Garnett

On Tue, Apr 21, 2015 at 11:00 PM, Jody Garnett <jody.garnett@anonymised.com>
wrote:

> Dived into looking at the pull request ... one question for the email list:
>
> The code uses PicoContainer - isn't that rather dead by this time -
> is pico container still earning its keep here?

The whole binding architecture is based on constructor injection, and afaik
Pico provides that with minimal
dependencies and is geared towards programmatic setup.

I don't have sponsorship to rip out Pico and replace it with something else
but, assuming one did, what would you suggest as a replacement?
Spring, gut feeling at least, would seem heavier.
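
For readers unfamiliar with the term, constructor injection with programmatic setup can be sketched roughly as follows. This is a toy Python model, not PicoContainer's API and not the gt-xsd code: `MiniContainer`, `NamespaceContext` and `PointBinding` are hypothetical names, and the container simply builds an object by resolving each annotated constructor parameter from the classes registered with it.

```python
import inspect


class MiniContainer:
    """Toy container: builds objects by matching constructor annotations to registered classes."""

    def __init__(self):
        self._registered = set()

    def register(self, cls):
        # Programmatic setup: components are added in code, not in a config file.
        self._registered.add(cls)

    def get(self, cls):
        if cls not in self._registered:
            raise KeyError(f"{cls!r} is not registered")
        # Resolve every constructor parameter recursively, then call the constructor.
        params = inspect.signature(cls).parameters.values()
        return cls(*(self.get(param.annotation) for param in params))


class NamespaceContext:
    """Hypothetical dependency shared by bindings."""

    def __init__(self):
        self.prefixes = {"gml": "http://www.opengis.net/gml"}


class PointBinding:
    """Hypothetical binding that declares what it needs in its constructor."""

    def __init__(self, namespaces: NamespaceContext):
        self.namespaces = namespaces


container = MiniContainer()
container.register(NamespaceContext)
container.register(PointBinding)

binding = container.get(PointBinding)
print(type(binding.namespaces).__name__)
```

The point of the pattern is that a binding never looks up its collaborators itself; whatever it lists in its constructor is supplied by the container. A real container would also handle instance caching and lifecycle, which this sketch leaves out.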

Cheers
Andrea

-------------------------------------------------------

No problem, just trying to check if the project is alive.
