[Geoserver-devel] feature paging

Hi all,

The idea of paging has recently popped up on the list. Also, for the geo search work it is necessary to be able to page through features when requesting KML. So I thought I would start a thread on paging :).

The API for a client paging through features seems pretty straightforward. We already have the "maxFeatures" parameter; adding "startIndex" would pretty much do it. How the server processes it is the interesting part.
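A minimal sketch of the client side of that API, assuming startIndex is zero-based (the thread does not settle this). The `PageRequest` class and its method names are hypothetical and only illustrate how a page number maps onto the two parameters.

```java
/** Hypothetical helper: turns a page number into the two request parameters discussed above. */
final class PageRequest {
    final int startIndex;
    final int maxFeatures;

    PageRequest(int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber >= 0 and pageSize > 0 required");
        }
        // Assumption: startIndex is zero-based, so page 0 starts at feature 0.
        this.startIndex = pageNumber * pageSize;
        this.maxFeatures = pageSize;
    }

    /** Query-string fragment a client could append to a request. */
    String toQueryString() {
        return "startIndex=" + startIndex + "&maxFeatures=" + maxFeatures;
    }

    public static void main(String[] args) {
        // Third page of 20 features: prints startIndex=40&maxFeatures=20
        System.out.println(new PageRequest(2, 20).toQueryString());
    }
}
```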

I guess the whole idea of paging relies on the ability to assume ordering in the underlying dataset or index you are using to access features. Jody tried to pull this off before with FeatureList... but I'm not sure that was such a successful endeavor.

It seems a less invasive approach might be to pass some query hints to the underlying datastore. I am interested in hearing people's thoughts on this one.

As a little experiment for KML, I created a feature source wrapper called PagingFeatureSource. Hooking up the startIndex and maxFeatures parameters was straightforward, and it worked... for KML. It does not work for rendered output, since the renderer does not actually use the featureSource provided by the map context; it grabs the query and the datastore and gets its own. Or at least the shapefile renderer does; I'm not sure if the streaming renderer behaves the same way. Which is another plus for query hints.
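To make the wrapper idea concrete, here is a small sketch that does not use any GeoTools types. `PagingIterator` is a hypothetical stand-in for what a wrapper like PagingFeatureSource could do internally: skip startIndex items from the wrapped iterator, then hand out at most maxFeatures. It is an illustration under those assumptions, not the code from the experiment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical wrapper: skips startIndex items, then yields at most maxFeatures. */
final class PagingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private int remaining;

    PagingIterator(Iterator<T> delegate, int startIndex, int maxFeatures) {
        if (startIndex < 0 || maxFeatures < 0) {
            throw new IllegalArgumentException("startIndex and maxFeatures must be >= 0");
        }
        this.delegate = delegate;
        this.remaining = maxFeatures;
        // The skipped items are still read from the source; a wrapper cannot avoid that cost.
        for (int i = 0; i < startIndex && delegate.hasNext(); i++) {
            delegate.next();
        }
    }

    @Override
    public boolean hasNext() {
        return remaining > 0 && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return delegate.next();
    }

    /** Collects everything the iterator still has to offer. */
    static <E> List<E> drain(Iterator<E> it) {
        List<E> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fids = Arrays.asList("f0", "f1", "f2", "f3", "f4");
        System.out.println(drain(new PagingIterator<String>(fids.iterator(), 1, 2)));
    }
}
```

Because the skip happens in the wrapper, every page still reads from the start of the source, which is one reason pushing the parameters down to the datastore is attractive.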

Anyways.. rant away :).

-Justin

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com

Justin Deoliveira wrote:

> I guess the whole idea of paging relies on the ability to assume ordering in the underlying dataset or index you are using to access features. Jody tried to pull this off before with FeatureList... but I'm not sure that was such a successful endeavor.

:) thanks for the encouragement

The idea is based on the Filter 1.1 specification's sortBy and the catalog specification's startIndex.

> It seems a less invasive approach might be to pass some query hints to the underlying datastore. I am interested in hearing people's thoughts on this one.

Please don't put something like this as a Hint (I am getting sick of magic api fun) - I still
like the idea of startIndex and maxFeatures; as long as you make them a first class
part of Query we should be good?

Any word on them making this approach part of WFS 1.next? My understanding at the time
was that it was an oversight not to include startIndex in WFS 1.1.

> As a little experiment for KML, I created a feature source wrapper called PagingFeatureSource. Hooking up the startIndex and maxFeatures parameters was straightforward, and it worked... for KML. It does not work for rendered output, since the renderer does not actually use the featureSource provided by the map context; it grabs the query and the datastore and gets its own. Or at least the shapefile renderer does; I'm not sure if the streaming renderer behaves the same way. Which is another plus for query hints.

You can accomplish the same thing if the renderer keeps track of how many features it has drawn,
decreasing the Query limit each time it makes a new request. You would need to track that for the
Hints approach regardless...
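One reading of that suggestion, sketched without any renderer or GeoTools types: the renderer holds a running budget and uses whatever is left as the limit of its next Query. `FeatureBudget` and its methods are hypothetical names used only for illustration.

```java
/** Hypothetical bookkeeping a renderer could keep across several requests. */
final class FeatureBudget {
    private int remaining;

    FeatureBudget(int maxFeatures) {
        if (maxFeatures < 0) {
            throw new IllegalArgumentException("maxFeatures must be >= 0");
        }
        this.remaining = maxFeatures;
    }

    /** Limit to put on the next Query the renderer issues. */
    int nextLimit() {
        return remaining;
    }

    /** Call after each request with the number of features actually drawn. */
    void recordDrawn(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        remaining = Math.max(0, remaining - count);
    }

    /** True once the overall limit has been used up. */
    boolean exhausted() {
        return remaining == 0;
    }

    public static void main(String[] args) {
        FeatureBudget budget = new FeatureBudget(100);
        budget.recordDrawn(30);
        System.out.println(budget.nextLimit());
    }
}
```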

Jody

> Please don't put something like this as a Hint (I am getting sick of magic api fun) - I still
> like the idea of startIndex and maxFeatures; as long as you make them a first class
> part of Query we should be good?

I think it does make more sense as a hint, because it gives us a way to check if the datastore actually supports it or not. Having it in the API only makes sense if all datastores can do it. And even so, given the quality, testing and ease of maintenance of our datastore implementations, this is unlikely to happen. Look at the mess that occurred with reprojection: some do it, some don't, and having no way to check means GeoServer has to do its own custom stuff.

> You can accomplish the same thing if the renderer keeps track of how many features it has drawn,
> decreasing the Query limit each time it makes a new request. You would need to track that for the
> Hints approach regardless...

Not sure I understand... having it built into the datastore makes more sense than having the renderer manage it, because then it would apply across the board. Also, with the hints a datastore (like PostGIS) could actually build them into the SQL query, making access more efficient.
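As a rough illustration of pushing paging into the SQL, the sketch below builds a PostgreSQL-style statement with LIMIT and OFFSET. `PagedSql`, the table name and the ordering column are hypothetical; a real datastore would also have to quote identifiers and add its filter clause.

```java
/** Hypothetical SQL builder: shows where startIndex and maxFeatures would end up. */
final class PagedSql {
    static String select(String table, String orderColumn, int startIndex, int maxFeatures) {
        if (startIndex < 0 || maxFeatures < 0) {
            throw new IllegalArgumentException("startIndex and maxFeatures must be >= 0");
        }
        // An explicit ORDER BY is needed for OFFSET to mean the same thing on every request.
        // Identifiers are not quoted or validated here; this is a sketch only.
        return "SELECT * FROM " + table
                + " ORDER BY " + orderColumn
                + " LIMIT " + maxFeatures
                + " OFFSET " + startIndex;
    }

    public static void main(String[] args) {
        System.out.println(select("roads", "fid", 40, 20));
    }
}
```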

> Jody

--
Justin Deoliveira
The Open Planning Project
jdeolive@anonymised.com

Justin Deoliveira wrote:

>> Please don't put something like this as a Hint (I am getting sick of magic api fun) - I still
>> like the idea of startIndex and maxFeatures; as long as you make them a first class
>> part of Query we should be good?
>
> I think it does make more sense as a hint, because it gives us a way to check if the datastore actually supports it or not. Having it in the API only makes sense if all datastores can do it. And even so, given the quality, testing and ease of maintenance of our datastore implementations, this is unlikely to happen. Look at the mess that occurred with reprojection: some do it, some don't, and having no way to check means GeoServer has to do its own custom stuff.

Thinking.

You are of course correct that we cannot roll out new ideas and expect them to be implemented (at all) unless we do it. The ability of DataStore to advertise which hints it supports at least lets us check what things are supported. We should have a Capabilities object where a DataStore can describe what it can do (we need a way to advertise the FilterCapabilities for example of a remote WFS).
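A sketch of how a caller might use such an advertised capability, with a fallback when the store cannot page. Everything here is hypothetical (`PagedStore`, `supportsPaging`, `ListStore`, `PagingClient`); it is not the GeoTools DataStore API, just one way the "check first, then fall back" pattern could look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical store that can say whether it honours paging parameters. */
interface PagedStore {
    boolean supportsPaging();

    /** Stores without paging support ignore both parameters and return everything. */
    List<String> read(int startIndex, int maxFeatures);
}

/** In-memory stand-in used to exercise both code paths. */
final class ListStore implements PagedStore {
    private final List<String> fids;
    private final boolean nativePaging;

    ListStore(List<String> fids, boolean nativePaging) {
        this.fids = fids;
        this.nativePaging = nativePaging;
    }

    public boolean supportsPaging() {
        return nativePaging;
    }

    public List<String> read(int startIndex, int maxFeatures) {
        return nativePaging
                ? PagingClient.slice(fids, startIndex, maxFeatures)
                : new ArrayList<>(fids);
    }
}

final class PagingClient {
    /** Clamped sub-list; assumes startIndex and maxFeatures are >= 0. */
    static List<String> slice(List<String> all, int startIndex, int maxFeatures) {
        int from = Math.min(startIndex, all.size());
        int to = (int) Math.min((long) from + maxFeatures, (long) all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    /** Uses the store's paging when advertised, otherwise pages the full result itself. */
    static List<String> page(PagedStore store, int startIndex, int maxFeatures) {
        List<String> result = store.read(startIndex, maxFeatures);
        return store.supportsPaging() ? result : slice(result, startIndex, maxFeatures);
    }

    public static void main(String[] args) {
        List<String> fids = Arrays.asList("a", "b", "c", "d");
        System.out.println(page(new ListStore(fids, false), 1, 2));
    }
}
```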

If we rephrase the old "Query is not well supported" problem as a "what part of Query do you support" question, we would make better progress.

Jody

Just to chip in with a little thing.

OK, startIndex and maxFeatures are enough to do paging. The catalog spec has it
quite well defined, so it makes sense to just use the same naming/semantics.

In the catalog spec the paged responses also return the last index and the
total result count, so you know where you're positioned and can calculate how
much remaining content there is.
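The arithmetic behind that is small enough to sketch. `PageInfo` and its field names are hypothetical and a zero-based startIndex is assumed; the catalog spec's own attribute names and index base are not reproduced here.

```java
/** Hypothetical paging metadata returned alongside a page of results. */
final class PageInfo {
    final int startIndex; // index of the first feature on this page (assumed zero-based)
    final int returned;   // number of features on this page
    final int total;      // total number of features matching the query

    PageInfo(int startIndex, int returned, int total) {
        this.startIndex = startIndex;
        this.returned = returned;
        this.total = total;
    }

    /** startIndex to use for the following page. */
    int nextIndex() {
        return startIndex + returned;
    }

    /** Features still to be fetched after this page. */
    int remaining() {
        return Math.max(0, total - nextIndex());
    }

    boolean hasMore() {
        return remaining() > 0;
    }

    public static void main(String[] args) {
        PageInfo page = new PageInfo(40, 20, 95);
        System.out.println(page.nextIndex() + " next, " + page.remaining() + " remaining");
    }
}
```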

You're correct, paging implies ordering too. If not explicitly asked for with a
sortBy, there might be a natural order (fid?) you have to rely on. That's one
more thing paging-capable datastores need to take into account, since
databases do not guarantee to return the results in the same order over
successive requests.
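A minimal sketch of that rule, with feature ids modelled as plain strings: use the requested ordering when there is one, otherwise fall back to fid order so that the same startIndex always points at the same feature. `OrderedPaging` is a hypothetical name, and plain string comparison of fids is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

/** Hypothetical paging helper that always imposes an order before slicing. */
final class OrderedPaging {
    static List<String> page(Collection<String> fids, Comparator<String> sortBy,
                             int startIndex, int maxFeatures) {
        List<String> ordered = new ArrayList<>(fids);
        // No explicit sortBy: fall back to the natural (here: lexicographic) fid order.
        Comparator<String> order = (sortBy != null) ? sortBy : Comparator.<String>naturalOrder();
        ordered.sort(order);
        int from = Math.min(startIndex, ordered.size());
        int to = (int) Math.min((long) from + maxFeatures, (long) ordered.size());
        return new ArrayList<>(ordered.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> unordered = Arrays.asList("road.3", "road.1", "road.2");
        System.out.println(page(unordered, null, 1, 2));
    }
}
```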

And there's the usual concern of the result count having changed between page
and page request...

Finally, either hint or query, we'll need to advertise paging support on a
per-feature-type basis. Hot topic: I know there are more people looking forward
to that kind of improvement in the WFS spec.

my 2c..

Gabriel

On Friday 04 April 2008 12:34:17 am Justin Deoliveira wrote:

> [...]

On Thu, Apr 3, 2008 at 9:24 PM, Gabriel Roldán <groldan@anonymised.com> wrote:

> You're correct, paging implies ordering too. If not explicitly asked for with a
> sortBy, there might be a natural order (fid?) you have to rely on. That's one
> more thing paging-capable datastores need to take into account, since
> databases do not guarantee to return the results in the same order over
> successive requests.

Indeed, LIMIT/OFFSET in PgSQL would seem to offer an easy out, but if
I dirty a record between your requests, it'll fall to the bottom of
your second result set. If you always ORDER BY the primary key, the
odds of getting consistent results go up, but I can still delete
records right out from under you.
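The delete case can be reproduced without a database. The sketch below emulates "ORDER BY id LIMIT n OFFSET m" over an in-memory sorted set (the `OffsetDrift` class is hypothetical) and shows a record being skipped when a row is deleted between two page requests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory emulation of offset-based paging over a changing table. */
final class OffsetDrift {
    /** Emulates ORDER BY id LIMIT limit OFFSET offset against the current contents. */
    static List<Integer> page(TreeSet<Integer> table, int offset, int limit) {
        List<Integer> ordered = new ArrayList<>(table); // TreeSet iterates in ascending id order
        int from = Math.min(offset, ordered.size());
        int to = (int) Math.min((long) from + limit, (long) ordered.size());
        return new ArrayList<>(ordered.subList(from, to));
    }

    public static void main(String[] args) {
        TreeSet<Integer> table = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(page(table, 0, 2)); // first page: [1, 2]
        table.remove(1);                       // someone deletes a record in between
        System.out.println(page(table, 2, 2)); // second page: [4, 5], so id 3 is never seen
    }
}
```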

WWGD... What Would Google Do... what guarantees does the client expect
from paging? That no record show up on two pages? What happens in
GMail if you (a) load up your page (b) a new mail arrives but the page
has not had a chance to update and (c) you page to the "next 20
entries". Is the first entry in that page the 21st based on what you
were looking at, or based on what you actually have in your inbox?

Whee!

I think the key to cracking this is to go back to the client side and
ask "what is the paging contract".

I had thought of expressing the start and end FeatureIds before I ran into the catalog specification. There may still be value in that approach. I came up with it using the following assumptions:
- when handling a query in this way results are returned sorted by featureId
- some kind of random access to "scan" (at least forward) to a featureId (this was in the early days when FeatureReader had a *skip* method)
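Under those two assumptions, paging by FeatureId can be sketched as "give me the next N features after the last id I saw". The `FidPaging` class below is hypothetical and models the sorted store as a `TreeSet` of string ids; it only illustrates the idea, not FeatureReader itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical FeatureId-based paging over results sorted by id. */
final class FidPaging {
    /** Returns up to maxFeatures ids strictly after lastSeenFid (null means start from the beginning). */
    static List<String> pageAfter(NavigableSet<String> fids, String lastSeenFid, int maxFeatures) {
        // "Scan forward" to the position just after the last id the client has seen.
        NavigableSet<String> rest = (lastSeenFid == null) ? fids : fids.tailSet(lastSeenFid, false);
        List<String> page = new ArrayList<>();
        for (String fid : rest) {
            if (page.size() >= maxFeatures) {
                break;
            }
            page.add(fid);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> fids = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        List<String> first = pageAfter(fids, null, 2);
        fids.remove("a"); // a delete between requests does not shift the next page
        System.out.println(first + " then " + pageAfter(fids, "b", 2));
    }
}
```

Compared with a numeric offset, the position is tied to an id rather than a count, so records removed before that id do not cause later ones to be skipped.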

Jody

_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel