[Geoserver-devel] some improvements to GeoServerTablePanel

Hi all,

Recently I have been working on a couple of things that require the UI to support paging more efficiently than it does now. The first is the hibernate catalog, and supporting a layer view that has to show thousands (or more) of layers. The second is monitoring, where one can page through requests that can easily number in the millions.

I am using GeoServerTablePanel/GeoServerDataProvider as the base here, since they are very convenient to work with. Most implementations simply implement getItems() and call it a day. That is great when working in memory, but it obviously does not scale to large collections.

However, the API is nicely designed in that it allows you to override some key methods in cases like this, where you don’t want to drag all the data into memory. Those methods are:

size(): return the size of the current, filtered collection
fullSize(): return the size of the unfiltered (entire) collection
iterator(i, j): provide an iterator over a subset of the data
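To make that contract concrete, here is a minimal sketch of a provider that answers the three methods from a backing store instead of a full in-memory list. It is not GeoServerDataProvider itself: `RecordStore`, `PagedRecordProvider` and `FakeStore` are hypothetical names, the filter is assumed to be a plain substring match, and the fake store only stands in for what would really be a `count(*)` query and a paged `SELECT`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical backing store that can count and page without loading everything. */
interface RecordStore {
    /** Number of rows matching the filter; null means unfiltered (think SELECT COUNT(*)). */
    int count(String filter);

    /** One page of matching rows (think LIMIT/OFFSET). */
    List<String> fetch(String filter, int first, int count);
}

/** Hypothetical provider exposing size(), fullSize() and iterator(first, count). */
class PagedRecordProvider {
    private final RecordStore store;
    private String filter; // null means no filter is set

    PagedRecordProvider(RecordStore store) {
        this.store = store;
    }

    void setFilter(String filter) {
        this.filter = filter;
    }

    /** Size of the current, filtered collection. */
    int size() {
        return store.count(filter);
    }

    /** Size of the entire, unfiltered collection. */
    int fullSize() {
        return store.count(null);
    }

    /** Iterator over a single page of the filtered collection. */
    Iterator<String> iterator(int first, int count) {
        return store.fetch(filter, first, count).iterator();
    }
}

/** In-memory fake so the sketch runs; a real store would issue SQL instead. */
class FakeStore implements RecordStore {
    private final List<String> rows = new ArrayList<>();

    FakeStore(int n) {
        for (int i = 0; i < n; i++) {
            rows.add("layer" + i);
        }
    }

    /** Substring match stands in for a real filter. */
    private List<String> matching(String filter) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (filter == null || row.contains(filter)) {
                out.add(row);
            }
        }
        return out;
    }

    @Override
    public int count(String filter) {
        return matching(filter).size();
    }

    @Override
    public List<String> fetch(String filter, int first, int count) {
        List<String> m = matching(filter);
        int from = Math.min(first, m.size());
        int to = Math.min(first + count, m.size());
        return new ArrayList<>(m.subList(from, to));
    }
}
```

Only the rows for the requested page leave the store; the catch is that size() and fullSize() each cost a round trip, which starts to matter once they are called repeatedly.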

Very nice. It works great, and nothing gets dragged into memory once I implement those methods properly. However, I still believe there is some room for optimization. I added some trace logging to see how often each of the methods is called, and here is what I got:

size()
size()
size()
fullSize()
size()
size()
fullSize()
size()
iterator(0, 20)
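A trace like the one above can be collected by recording every provider call in order. The sketch below assumes the counts are expensive suppliers standing in for remote `count(*)` queries; `CallTrace` and `TracingProvider` are hypothetical names for illustration only, not GeoServer classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.stream.IntStream;

/** Records provider calls in order, standing in for trace-level logging. */
class CallTrace {
    private final List<String> calls = new ArrayList<>();

    void record(String call) {
        calls.add(call);
    }

    List<String> calls() {
        return Collections.unmodifiableList(calls);
    }

    /** Counts calls to one method; "size" does not match "fullSize" because of the prefix check. */
    long countOf(String method) {
        return calls.stream().filter(c -> c.startsWith(method + "(")).count();
    }
}

/** Hypothetical provider that logs every call before delegating to an expensive count. */
class TracingProvider {
    private final CallTrace trace;
    private final IntSupplier filteredCount; // e.g. SELECT COUNT(*) ... WHERE <filter>
    private final IntSupplier totalCount;    // e.g. SELECT COUNT(*) without a WHERE clause

    TracingProvider(CallTrace trace, IntSupplier filteredCount, IntSupplier totalCount) {
        this.trace = trace;
        this.filteredCount = filteredCount;
        this.totalCount = totalCount;
    }

    int size() {
        trace.record("size()");
        return filteredCount.getAsInt();
    }

    int fullSize() {
        trace.record("fullSize()");
        return totalCount.getAsInt();
    }

    /** Returns row indices for one page; a real provider would return domain objects. */
    Iterator<Integer> iterator(int first, int count) {
        trace.record("iterator(" + first + ", " + count + ")");
        return IntStream.range(first, first + count).boxed().iterator();
    }
}
```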

As you can see, size() is called often. Even when it is implemented with a count(*) query, those calls add up, especially when the database is remote. So I came up with a patch that reduces the number of times the size is computed. The patch is attached. It is not all that pretty, but it is relatively simple in idea. The idea is twofold:

  1. Reduce the number of calls to size() by holding onto the previously computed size

  2. Take advantage of the fact that size() == fullSize() when no filter is set
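The two ideas can be sketched roughly as follows. This is not the attached patch (which is not reproduced here): `CachingSizeProvider`, the `countQuery` function and the `invalidate()` hook are hypothetical, and a `null` filter passed to `countQuery` is assumed to mean an unfiltered count.

```java
import java.util.function.ToIntFunction;

/**
 * Hypothetical provider illustrating the two ideas: (1) keep a size once computed,
 * (2) answer size() with fullSize() when no filter is set.
 */
class CachingSizeProvider {
    private final ToIntFunction<String> countQuery; // runs a count for a filter; null = unfiltered
    private String filter;          // null or empty means no filter
    private Integer cachedSize;     // filtered size, null until computed
    private Integer cachedFullSize; // unfiltered size, null until computed

    CachingSizeProvider(ToIntFunction<String> countQuery) {
        this.countQuery = countQuery;
    }

    void setFilter(String newFilter) {
        this.filter = newFilter;
        cachedSize = null; // the filtered size is stale once the filter changes
    }

    /** Drops both cached values, e.g. after the underlying data has changed. */
    void invalidate() {
        cachedSize = null;
        cachedFullSize = null;
    }

    private boolean hasFilter() {
        return filter != null && !filter.isEmpty();
    }

    int fullSize() {
        if (cachedFullSize == null) {
            cachedFullSize = countQuery.applyAsInt(null);
        }
        return cachedFullSize;
    }

    int size() {
        if (!hasFilter()) {
            return fullSize(); // idea 2: without a filter, size() == fullSize()
        }
        if (cachedSize == null) {
            cachedSize = countQuery.applyAsInt(filter); // idea 1: compute once, then reuse
        }
        return cachedSize;
    }
}
```

The hard part is deciding when a cached value is stale; this sketch only drops it when the filter changes or when invalidate() is called explicitly.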

With the patch applied I am able to get the calls down to:

fullSize()
size()
size()
iterator(0, 20)

That is a step up for sure. Further optimization is probably possible, since there are still multiple calls to size()… but the only way I could see to do it was to use some sort of thread local to store the previously computed size, and that seemed a bit much. This optimization seems to perform well enough as it is.
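For comparison, the thread-local alternative might look roughly like this: the size is computed at most once per request on a given thread. `RequestScopedSizeCache` and `endRequest()` are hypothetical, and the sketch assumes something (for instance a request-cycle hook) reliably calls `endRequest()` at the end of every request, which is exactly the extra machinery that makes this approach feel heavy.

```java
import java.util.function.IntSupplier;

/** Hypothetical cache that computes the size at most once per request on each thread. */
class RequestScopedSizeCache {
    private final ThreadLocal<Integer> cached = new ThreadLocal<>();
    private final IntSupplier countQuery; // stands in for a count(*) query

    RequestScopedSizeCache(IntSupplier countQuery) {
        this.countQuery = countQuery;
    }

    int size() {
        Integer value = cached.get();
        if (value == null) {
            value = countQuery.getAsInt();
            cached.set(value);
        }
        return value;
    }

    /**
     * Must be called when each request ends; otherwise a pooled thread would
     * serve a stale size to the next request it handles.
     */
    void endRequest() {
        cached.remove();
    }
}
```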

So… thoughts? I tried all the various lists that we have in the UI with the patch applied, and they all work as they did before, so nothing appears to break. I did try to write some test cases, but that proved difficult in this case.

-justin


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

(attachments)

GeoServerTablePanel.java.patch (5.38 KB)

On Fri, Jan 28, 2011 at 5:56 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

That is a step up for sure. Further optimization is probably possible, since
there are still multiple calls to size()... but the only way I could see to do
it was to use some sort of thread local to store the previously computed size,
and that seemed a bit much. This optimization seems to perform well enough as
it is.

Actually, I think Wicket stores the page version somewhere. If that is
accessible, we could use it to drive the caching (reuse the cached size only if
the current page version is the same as the previous one).
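Such a version-keyed cache might look roughly like this: the size is stored together with the version it was computed for, and recomputed only when the version changes. The `currentVersion` supplier is a hypothetical hook standing in for wherever the page version can be read; the sketch does not assume any particular Wicket method, and `VersionedSizeCache` is an illustrative name.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

/** Hypothetical size cache that is reused only while a version number stays the same. */
class VersionedSizeCache {
    private final LongSupplier currentVersion; // hypothetical hook, e.g. the page version
    private final IntSupplier countQuery;      // stands in for a count(*) query
    private boolean valid;                     // false until the first count has run
    private long cachedVersion;
    private int cachedSize;

    VersionedSizeCache(LongSupplier currentVersion, IntSupplier countQuery) {
        this.currentVersion = currentVersion;
        this.countQuery = countQuery;
    }

    int size() {
        long version = currentVersion.getAsLong();
        if (!valid || version != cachedVersion) {
            cachedSize = countQuery.getAsInt(); // first call or version changed: recompute
            cachedVersion = version;
            valid = true;
        }
        return cachedSize;
    }
}
```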

So... thoughts? I tried all the various lists that we have in the UI with the
patch applied, and they all work as they did before, so nothing appears to
break. I did try to write some test cases, but that proved difficult in this
case.

Yeah, writing tests for Ajax behaviors is a mess at best; I never managed to
make them work consistently.

I did not try it, but the patch looks good: +1 on merging it on trunk (and on
2.1.x after the release).
I'm also fine with merging it on 2.1.x before the release, but I guess I should
give it a deeper look first (actually try it, see what goes on), which I guess
I can do tomorrow?

Cheers
Andrea

--
Ing. Andrea Aime
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-----------------------------------------------------

On Fri, Jan 28, 2011 at 11:27 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

Actually, I think Wicket stores the page version somewhere. If that is
accessible, we could use it to drive the caching (reuse the cached size only if
the current page version is the same as the previous one).

Interesting. Yeah, that would be a nice way to handle it. I will look into that.

I did not try it, but the patch looks good: +1 on merging it on trunk (and on
2.1.x after the release).
I’m also fine with merging it on 2.1.x before the release, but I guess I should
give it a deeper look first (actually try it, see what goes on), which I guess
I can do tomorrow?

No, trunk is fine for now. I don’t want to commit to 2.1.x this close to a release either, since it is a core change. I do want to get it into 2.1.x at some point, though. For now, let’s let it sit on trunk for a while and get some testing from the dev team.


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.