[Geoserver-devel] Performance versus conformance in WFS 2.0 paging

I have encountered a decision point while fixing a bug in WFS 2.0 paging:
https://jira.codehaus.org/browse/GEOS-5085

WFS 2.0 paging is implemented by specifying startindex and count (like maxFeatures in 1.1.0) in a GetFeature request. Our implementation uses the presence of startindex to detect whether paging is in use; to ensure consistency across pages, results must be sorted when paging is in use. However, this has one undesirable implication: our use of startindex is at odds with the WFS 2.0 spec, which specifies that startindex defaults to zero.

I see two options:

Option 1: Performance
- the presence of startindex triggers sorting for paging consistency
- the absence of startindex means that responses can be unsorted for greater performance
- startindex=0 and the absence of startindex are treated differently
- clients that omit startindex for their first page of paged results will get inconsistent pages (we are *assuming* that all paging clients set startindex=0 for their first page, despite this being explicitly the default in the spec)
- we will have a surprising nudge-nudge-wink-wink interpretation of the WFS 2.0 spec that differs from the tabulated default value of startindex

Option 2: Conformance
- startindex=0 has exactly the same effect as startindex not being specified
- all WFS 2.0 responses will be sorted, at the cost of performance
- we are conformant with the default values specified in the WFS 2.0 spec
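
To make the difference between the two options concrete, here is a minimal Python sketch. It is not GeoServer code: the function name `should_sort`, the mode strings and the plain-dict request are hypothetical and chosen only for illustration.

```python
def should_sort(params, mode):
    """Decide whether a GetFeature response should be sorted.

    params: dict of request parameters as sent by the client (a hypothetical
            stand-in for the real request object).
    mode:   "performance" (Option 1) or "conformance" (Option 2).
    """
    if mode == "performance":
        # Option 1: only the *presence* of startindex triggers sorting,
        # so an omitted startindex and startindex=0 behave differently.
        return "startindex" in params
    if mode == "conformance":
        # Option 2: startindex defaults to 0, so every response is sorted.
        return True
    raise ValueError(f"unknown mode: {mode!r}")


first_page_implicit = {"count": 1000}                   # startindex omitted
first_page_explicit = {"count": 1000, "startindex": 0}  # startindex given

for mode in ("performance", "conformance"):
    print(mode,
          should_sort(first_page_implicit, mode),
          should_sort(first_page_explicit, mode))
```

Under Option 1 the two first-page requests are treated differently even though the spec says they mean the same thing; under Option 2 they are indistinguishable.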

So, in a nutshell, should all WFS 2.0 responses be sorted?

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre

Ciao Ben, my 2 cents,
IMHO standards are beautiful as long as they are useful (i.e. they make life simple, or even just simpler). If by supporting a standard strictly we become unnecessarily slow, then we are less useful to users, and therefore the standard is failing.

This does not mean that we should ignore rules mandated by the standards. My usual suggestion in these kinds of cases is to put a flag somewhere in the config to switch between strict and non-strict adherence, and go for non-strict by default.

Regards,
Simone Giannecchini

Ing. Simone Giannecchini
GeoSolutions S.A.S.
Founder

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 333 8128928

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/simonegiannecchini
http://twitter.com/simogeo


Thanks, Simone. I agree with you in principle, but there are two issues that concern me:
(1) cluttering up the configuration with options that make it more complicated
(2) adding surprises for clients, who in any case cannot know how a GeoServer instance is configured.

Kind regards,
Ben.

On Wed, May 16, 2012 at 6:16 AM, Ben Caradoc-Davies
<Ben.Caradoc-Davies@anonymised.com> wrote:

So, in a nutshell, should all WFS 2.0 responses be sorted?

In a nutshell, no, we need to grow a way to tell if paging was being asked for or not.
Ideally detect an explicit startIndex=0, less ideally a way to at least detect if maxFeatures was explicitly provided (if we don't have that either, it's definitely not paging).

Cheers
Andrea

--
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

On 16/05/12 14:07, Andrea Aime wrote:

In a nutshell, no, we need to grow a way to tell if paging was being
asked for or not.

Can't be done. The problem is you don't know if paging is being asked for until you get the request for the next page.

First request: "Hello, please give me 1000 features" (count=1000, no startindex)

(Client sees that numberMatched>1000)

Second request: "Hello, please give me the next 1000 features" (count=1000, startindex=1000)

How can GeoServer know if the client will ever make the second request? The client may just go away. But this is also a perfectly legitimate paging use-case, where the client asks for two pages.

If GeoServer gives the client 1000 unsorted features, it will not be able to respond to the second request and maintain consistency because unsorted responses are not stable.
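
The two-request scenario above can be simulated with a small Python sketch. Everything in it is an assumption made for illustration: the in-memory `FEATURE_IDS` list stands in for a data store, and the seeded shuffle imitates a backend whose unsorted row order differs from one request to the next.

```python
import random

# Hypothetical store of 3000 features, identified by zero-padded ids.
FEATURE_IDS = [f"f{i:04d}" for i in range(3000)]


def query(count, startindex=0, sort=False, seed=None):
    """Return one page of feature ids from the hypothetical store.

    When sort is False the store may hand rows back in any order; the
    shuffle (driven by `seed`) simulates that order changing per request.
    """
    rows = list(FEATURE_IDS)
    if sort:
        rows.sort()
    else:
        random.Random(seed).shuffle(rows)
    return rows[startindex:startindex + count]


# Unsorted: each request sees its own ordering, so pages can overlap
# (and, equally, some features can be missed altogether).
page1 = query(count=1000, seed=1)
page2 = query(count=1000, startindex=1000, seed=2)
overlap_unsorted = set(page1) & set(page2)

# Sorted: the ordering is identical for every request, so pages are disjoint.
page1_sorted = query(count=1000, sort=True)
page2_sorted = query(count=1000, startindex=1000, sort=True)
overlap_sorted = set(page1_sorted) & set(page2_sorted)

print(len(overlap_unsorted), len(overlap_sorted))
```

The first request carries no startindex in either case, which is exactly why the server cannot tell at that point whether a second page will ever be requested.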

Kind regards,

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre

I also found this stability requirement and a rather leading hint from p41 of the WFS 2.0.0 spec (OGC 09-025r1 and ISO/DIS 19142).

******

7.9.2.5.4.4 Sort processing

A web feature service that receives an ad hoc query expression without a sorting clause, shall generate a response document in which features are presented in whatever order the server chooses. However, to comply with this International Standard, servers shall ensure that whatever order is presented when an ad hoc query, not containing a sort clause, is first executed is preserved across subsequent executions of the same ad hoc query expression on the same set of features.

EXAMPLE A server may choose to sort the features by their gml:id if the client has not specified a specific sorting clause. Subsequent invocations of the same query expression on the same set of data should result in a response document that presents the features in the same order.

******
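
One way to meet this requirement, following the spec's own example, is to fall back to ordering by feature identifier whenever a request carries no sort clause. The Python sketch below is only an illustration: features are modelled as plain dicts with a unique `id` key, and the `order_features` helper is hypothetical rather than anything in GeoServer.

```python
def order_features(features, sort_by=None):
    """Return features in a repeatable order.

    features: iterable of dicts, each with a unique "id" key (assumption).
    sort_by:  optional attribute name requested by the client.
    """
    if sort_by is None:
        # No sort clause: fall back to the identifier so that repeated
        # executions of the same query yield the same order.
        return sorted(features, key=lambda f: f["id"])
    # Client-supplied sort; the id acts as a tie-breaker so that features
    # with equal sort values still come back in a fixed order.
    return sorted(features, key=lambda f: (f[sort_by], f["id"]))


# The same data as the store might return it on two different executions.
raw_order_a = [{"id": "b", "name": "x"}, {"id": "a", "name": "x"}]
raw_order_b = list(reversed(raw_order_a))

print(order_features(raw_order_a) == order_features(raw_order_b))
print(order_features(raw_order_a, "name") == order_features(raw_order_b, "name"))
```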

Kind regards,
Ben.

On Wed, May 16, 2012 at 8:14 AM, Ben Caradoc-Davies
<Ben.Caradoc-Davies@anonymised.com> wrote:

On 16/05/12 14:07, Andrea Aime wrote:

In a nutshell, no, we need to grow a way to tell if paging was being
asked for or not.

Can't be done. The problem is you don't know if paging is being asked
for until you get the request for the next page.

Ben, it's a 7 lines mail, how could you miss the second sentence?

"less ideally a way to at least detect if maxFeatures was explicitly
provided (if we don't have that either,
it's definitely not paging)"

--> you can know if client asked for paging

Cheers
Andrea

On 16/05/12 14:50, Andrea Aime wrote:

Ben, it's a 7 lines mail, how could you miss the second sentence?
"less ideally a way to at least detect if maxFeatures was explicitly
provided (if we don't have that either,
it's definitely not paging)"
--> you can know if client asked for paging

Andrea, I didn't miss what you wrote, it just didn't make sense to me. That isn't paging; this is usually just some poor client exploring a service and not wanting to get 1.3 million features when it can only handle 200.

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre

On Wed, May 16, 2012 at 8:56 AM, Ben Caradoc-Davies
<Ben.Caradoc-Davies@anonymised.com> wrote:

Andrea, I didn't miss what you wrote, it just didn't make sense to me. That
isn't paging; this is usually just some poor client exploring a service and
not wanting to get 1.3 million features when it can only handle 200.

Meh, seemed a reasonable compromise to me (I believe it would have got us
both performance and conformance in the respective typical use cases, fully paged
access and non-paged access, aka download mode) but never mind.

In that case I'd prefer option 1 to be the default behavior, and
option 2 to be either not there or at least something that
has to be explicitly enabled.

Cheers
Andrea

Hey guys,

My take is more or less the same as Andrea's. I think we should do the following:

  1. By default with no startIndex we don’t sort
  2. By default with startIndex=0 we do sort
  3. If cite compliance is turned on we adhere strictly to the spec

Ben, I share your disdain for hiding behaviour behind the cite flag, but it is consistent with how many things work today.

Also, I think it should be possible to determine whether the client actually asked for paging from the underlying EMF request object, since it allows for “isSet()” behaviour that tells you whether a property was set or not.
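
The three rules and the “isSet()” idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the GeoServer or EMF API: the `GetFeatureRequest` class, its `is_set_start_index` method and the `cite_compliance` argument are hypothetical names, and rule 3 is read here as "every response is sorted".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GetFeatureRequest:
    """Hypothetical stand-in for the parsed request object."""
    count: Optional[int] = None
    start_index: Optional[int] = None  # None means "not set by the client"

    def is_set_start_index(self) -> bool:
        # Distinguishes an omitted startIndex from an explicit startIndex=0.
        return self.start_index is not None

    def effective_start_index(self) -> int:
        # The spec default applies when the client did not send a value.
        return 0 if self.start_index is None else self.start_index


def needs_sort(request: GetFeatureRequest, cite_compliance: bool) -> bool:
    if cite_compliance:
        # Rule 3 (as interpreted here): an omitted startIndex is treated
        # exactly like startIndex=0, so every response is sorted.
        return True
    # Rules 1 and 2: sort only when the client explicitly sent startIndex.
    return request.is_set_start_index()


no_start = GetFeatureRequest(count=1000)
explicit_zero = GetFeatureRequest(count=1000, start_index=0)

print(needs_sort(no_start, cite_compliance=False))
print(needs_sort(explicit_zero, cite_compliance=False))
print(needs_sort(no_start, cite_compliance=True))
```

Both requests resolve to the same effective start index of 0; only the "was it set" information separates them, which is what the default (non-cite) behaviour relies on.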

$0.02

Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Thanks, Justin, I am happy to tick all these boxes. Updated patches in testing now.

We don't have WFS 2.0 on stable but we do have paging for WFS 1.1. I will include patches that backport a subset of these changes to stable. The main difference is that WFS 1.1 behaves as in (1) and (2) below but (3) is not applicable.

Stable will throw when anyone attempts to use startindex to sort a non-sortable data store. That is probably a good thing.

Kind regards,
Ben.

On 16/05/12 22:39, Justin Deoliveira wrote:

My take is more or less the same as Andrea, I think we should do the following:
1. By default with no startIndex we don't sort
2. By default with startIndex=0 we do sort
3. If cite compliance is turned on we adhere strictly to the spec
Ben I share your disdain for hiding behaviour behind the cite flag but it is consistent with how many things do work today.

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre

Correction: a non-sortable ContentFeatureSource. Other data stores should work fine.

On 17/05/12 14:26, Ben Caradoc-Davies wrote:

Stable will throw when anyone attempts to use startindex to sort a
non-sortable data store.

--
Ben Caradoc-Davies <Ben.Caradoc-Davies@anonymised.com>
Software Engineer
CSIRO Earth Science and Resource Engineering
Australian Resources Research Centre