[Geoserver-users] WFS 1.0, WFS 1.1 and WFS 2.0 performance issue

Hello List,

we've discovered a performance difference between WFS 1.0, WFS 1.1 and WFS 2.0 bbox queries.
Thanks in advance for your hints.

Kind regards,
Juergen Weichand

##################
Config:
##################
PostGIS DB which contains ~10.000.000 Multi-Polygons (Parcels)
BBOX Query which returns 379.023 (simple) Features
GeoServer 2.2.4 (also 2.3 b1)

##################
Queries:
##################
WFS 1.0
http://geoserv.weichand.de/tmp/GetFeature_10_BBOX.xml

WFS 1.1
http://geoserv.weichand.de/tmp/GetFeature_11_BBOX.xml

WFS 2.0
http://geoserv.weichand.de/tmp/GetFeature_20_BBOX.xml

##################
Result:
##################
(time to 1st response --> without downloading the stream)

WFS 1.0
------> HTTP-Status: 200
------> Content-Type: text/xml; subtype=gml/2.1.2
------> Time to 1st response: 662 ms

WFS 1.1
------> HTTP-Status: 200
------> Content-Type: text/xml; subtype=gml/3.1.1
------> Time to 1st response: 45337 ms

WFS 2.0
------> HTTP-Status: 200
------> Content-Type: text/xml; subtype=gml/3.2
------> Time to 1st response: 106447 ms
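For anyone who wants to repeat the "time to 1st response" measurement from a script, here is a minimal Python sketch that uses only the standard library. It POSTs a GetFeature XML body and records two times. The first is when the status line and headers have been read, which is roughly what curl reports as time_starttransfer. The second is when the whole body has been drained. The helper name time_post and its parameters are illustrative assumptions. They are not part of GeoServer or of the original measurements.

```python
import http.client
import time
from typing import Tuple
from urllib.parse import urlsplit


def time_post(url: str, body: bytes, content_type: str = "text/xml") -> Tuple[float, float]:
    """Hypothetical helper: POST `body` to `url` and return
    (seconds until the response headers arrive, seconds until the body is fully read)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=600)
    try:
        start = time.perf_counter()
        conn.request("POST", path, body=body, headers={"Content-Type": content_type})
        resp = conn.getresponse()  # blocks until the status line and headers are read
        first_response = time.perf_counter() - start
        while resp.read(64 * 1024):  # drain the (possibly huge) GML stream in chunks
            pass
        total = time.perf_counter() - start
        return first_response, total
    finally:
        conn.close()
```

A typical call would be `time_post("http://localhost:8080/geoserver/wfs", open("GetFeature_10_BBOX.xml", "rb").read())`, where the URL and file name are assumptions that match the setup described in this thread.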

Logs:
http://geoserv.weichand.de/tmp/log.txt

On Fri, Feb 15, 2013 at 11:03 AM, Jürgen Weichand <juergen.weichand@anonymised.com> wrote:

Hello List,

we’ve discovered a performance difference between WFS 1.0, WFS 1.1 and
WFS 2.0 bbox queries.

A performance difference of a good 2x is expected, since the GML2 encoder is
way more efficient than the GML 3.x ones, but what you’re observing is not.
Did you just measure the first request? That one has to do complex setups
which are then cached; subsequent ones should be significantly
faster.

Cheers
Andrea

==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


Hi Andrea,

it’s a simple features setup. Also, the timings below do not come from the first request.

Example:

n > 1

Request n + 0 => WFS 1.0 < 1s
Request n + 1 => WFS 1.0 < 1s

Request n + 2 => WFS 1.1 40 - 45 s
Request n + 3 => WFS 1.1 40 - 45 s

Request n + 4 => WFS 2.0 95 - 105 s
Request n + 5 => WFS 2.0 95 - 105 s

Request n + 6 => WFS 1.0 < 1s

Best regards
Juergen

On 15.02.2013 14:27, Andrea Aime wrote:

A performance difference of a good 2x is expected, since the GML2 encoder is
way more efficient than the GML 3.x ones, but what you’re observing is not.
Did you just measure the first request? That one has to do complex setups
which are then cached; subsequent ones should be significantly
faster.

More detailed log file (with sql-statements):
http://geoserv.weichand.de/tmp/sqllog.txt

Runtime of the count-statement(s) on huge tables/views (~10.000.000 features)?
Needed for paging?

On Fri, Feb 15, 2013 at 3:49 PM, Jürgen Weichand <juergen.weichand@anonymised.com> wrote:

More detailed log file (with sql-statements):
http://geoserv.weichand.de/tmp/sqllog.txt

Runtime of the count-statement(s) on huge tables/views (~10.000.000 features)?
Needed for paging?

Yes, it’s needed for paging.
However, the attribute is optional, so we could avoid returning it, and thus computing it;
we just lack a configuration option to do so (it would be similar to the feature bounding one,
which I guess you have already disabled, since I don’t see a bounds query in your SQL logs).
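To illustrate where the count comes in, here is a hypothetical Python sketch of the paging bookkeeping for a WFS 2.0 response. It assumes the spec semantics discussed in this thread: numberMatched may be "unknown", while numberReturned is the number of features actually written in the page. With a full count, the attributes follow from simple arithmetic. Without one, a common alternative is to fetch count + 1 rows and use the extra row only to decide whether a next page exists. That alternative is a swapped-in technique, not GeoServer's implementation. All names here are assumptions.

```python
from typing import List, NamedTuple, Sequence, Tuple, TypeVar

T = TypeVar("T")


class PageInfo(NamedTuple):
    number_matched: str   # an integer as text, or "unknown"
    number_returned: int  # features actually written in this response
    has_next: bool        # whether a "next" page link makes sense


def page_with_count(total: int, start_index: int, count: int) -> PageInfo:
    """Paging attributes when a COUNT over the whole filter has been run."""
    number_returned = min(count, max(0, total - start_index))
    has_next = start_index + number_returned < total
    return PageInfo(str(total), number_returned, has_next)


def page_without_count(rows: Sequence[T], count: int) -> Tuple[PageInfo, List[T]]:
    """Paging attributes without a COUNT.

    `rows` is assumed to come from a query limited to count + 1 rows
    (hypothetical); the extra row only signals that a next page exists.
    """
    page = list(rows[:count])
    return PageInfo("unknown", len(page), len(rows) > count), page
```

The trade-off shown by the second function is that the page has to be materialized before numberReturned is known, so the header cannot be written until those rows have been fetched.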

I’ve tried to get a local setup to reproduce some of your results.
To that end, I’ve downloaded the buildings layers for a good portion of Germany
from Geofabrik (http://download.geofabrik.de/openstreetmap/europe/germany/)
and loaded them into PostGIS, getting a buildings layer with around 8 million features.

These layers are in EPSG:4326, so I reprojected your bounding box to EPSG:4326, getting
11.386153034926789,47.913610526723836
11.852600617836472,48.30077117627526
(lon/lat order). Applied to the data, this bbox returns 170020 features, fewer
than yours but in the same order of magnitude: the size of the returned GML
varies with the version, but it’s around 100MB.

I ran the XML requests you provided, adapted to my type name and
coordinate system, using curl with a -w template that makes it print the
connection time, the time at which the first byte of the response arrives, and finally the
total time:

curl -s -o /dev/null -d @GetFeature_10_BBOX.xml -XPOST -H 'Content-type: text/xml' -w "%{time_connect}:%{time_starttransfer}:%{time_total}" "http://localhost:8080/geoserver/wfs"
0,001:0,086:4,411

curl -s -o /dev/null -d @GetFeature_11_BBOX.xml -XPOST -H 'Content-type: text/xml' -w "%{time_connect}:%{time_starttransfer}:%{time_total}" "http://localhost:8080/geoserver/wfs"
0,000:0,211:45,496

curl -s -o /dev/null -d @GetFeature_20_BBOX.xml -XPOST -H 'Content-type: text/xml' -w "%{time_connect}:%{time_starttransfer}:%{time_total}" "http://localhost:8080/geoserver/wfs"
0,001:0,002:59,891

While the time to return the full GML file varies a lot (the GML 3 encoders are quite inefficient),
the time to get the first result is negligible in all three cases.
And this is with the default settings: I haven’t even disabled the “feature bounding” option,
so the above also includes the time it takes to run the select for the bounds of the returned
feature collection.
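The three colon-separated values printed by the curl -w template can be split into two parts. The first is the time the server spends before sending the first byte, which covers SQL such as counts and bounds plus encoder setup. The second is the time spent streaming the body. Here is a small Python sketch that does that split. It accepts the decimal commas seen in the outputs in this thread, which come from the client's locale. The class and function names are illustrative assumptions.

```python
from typing import NamedTuple


class CurlTiming(NamedTuple):
    connect: float         # %{time_connect}
    start_transfer: float  # %{time_starttransfer}
    total: float           # %{time_total}

    @property
    def server_wait(self) -> float:
        """Approximate time between connecting and receiving the first byte."""
        return self.start_transfer - self.connect

    @property
    def streaming(self) -> float:
        """Time spent transferring the body after the first byte."""
        return self.total - self.start_transfer


def parse_curl_timing(line: str) -> CurlTiming:
    """Parse 'connect:starttransfer:total', accepting ',' or '.' as decimal separator."""
    fields = line.strip().split(":")
    if len(fields) != 3:
        raise ValueError(f"expected 3 colon-separated values, got {line!r}")
    connect, start, total = (float(f.replace(",", ".")) for f in fields)
    return CurlTiming(connect, start, total)
```

For the WFS 1.0 line above, `parse_curl_timing("0,001:0,086:4,411").streaming` is about 4.325 s, which means almost all of the elapsed time is spent encoding and transferring GML, not waiting for the first byte.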

Ah, I ran the tests with the GeoServer 2.3.x series (the 2.2.x series is coming to an end,
probably next week, next month at the latest).

Cheers
Andrea


On Sun, Feb 17, 2013 at 6:03 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:

Hmm, there was a copy/paste error in my previous mail: the curl commands all included %{time_starttransfer}, as can be inferred
from the fact that each output line contains three values.

Cheers
Andrea


Hi Andrea,

thank you for your detailed response.

curl -s -o /dev/null -d @GetFeature_10_BBOX.xml -XPOST -H 'Content-type: text/xml' -w "%{time_connect}:%{time_starttransfer}:%{time_total}" "http://some-server.de:8180/geoserver/wfs"
0,010:0,369:119,321

curl -s -o /dev/null -d @GetFeature_11_BBOX.xml -XPOST -H 'Content-type: text/xml' -w "%{time_connect}:%{time_starttransfer}:%{time_total}" "http://some-server.de:8180/geoserver/wfs"
0,009:28,597:246,712

curl -s -o /dev/null -d @GetFeature_20_BBOX.xml -XPOST -H 'Content-type: text/xml' -w "%{time_connect}:%{time_starttransfer}:%{time_total}" "http://some-server.de:8180/geoserver/wfs"
0,020:101,138:333,484

The count statements take a long time to complete. I’ve already run VACUUM and rebuilt the index.

Please see below
numberMatched can be ‘unknown’, so if the actual number of matched features is smaller than the ‘count’ parameter, or the ‘count’ parameter is not set, what should the value of ‘numberReturned’ be? Shouldn’t it be possible to set this attribute to ‘unknown’ as well?
https://portal.opengeospatial.org/files?artifact_id=43925

Best regards
Juergen

On Tue, Feb 19, 2013 at 5:19 PM, Jürgen Weichand <juergen.weichand@anonymised.com> wrote:

Please see below
numberMatched can be ‘unknown’, so if the actual number of matched features is smaller than the ‘count’ parameter, or the ‘count’ parameter is not set, what should the value of ‘numberReturned’ be? Shouldn’t it be possible to set this attribute to ‘unknown’ as well?
https://portal.opengeospatial.org/files?artifact_id=43925

We don’t have a way to configure GeoServer that way, but if someone wants to add a configuration option to
disable feature counts, just like the existing option to disable the bounds generation, I’m sure it
would be welcomed by the GeoServer developers.

Cheers
Andrea
