[Geoserver-devel] update to services for dbconfig

Hi all,

I said before that the goal for supporting dbconfig was to update no client code. Well, I lied, because there is one situation I don’t really see a way around. Thankfully the changes are very mechanical.

The change in question is for singleton objects (like output formats and the DefaultWeb*Service objects) that hold onto the same instance of a ServiceInfo for their lifetime. An example:

public class DefaultWebFeatureService {

    WFSInfo wfs;
    Catalog catalog;

    public DefaultWebFeatureService(GeoServer gs) {
        // looked up once and held for the life of the singleton
        this.wfs = gs.getService(WFSInfo.class);
        this.catalog = gs.getCatalog();
    }

}

The problem is that once the WFS configuration changes externally, the WFSInfo object becomes stale. Before, this was not an issue because the WFSInfo was actually a proxy backed by the real object, which lived in memory for the life of the application. With a db-backed config that is no longer the case, so the WFSInfo object has to be looked up on demand every time. The class now becomes:

public class DefaultWebFeatureService {

    GeoServer gs;
    Catalog catalog;

    public DefaultWebFeatureService(GeoServer gs) {
        this.gs = gs;
        this.catalog = gs.getCatalog();
    }

    // looked up on every use so the config object is never stale
    WFSInfo getServiceInfo() {
        return gs.getService(WFSInfo.class);
    }

}

This pattern is littered throughout the code, so unfortunately fixing it requires a lot of updates. Thankfully, almost all of the classes take a GeoServer instance in their constructor, so no spring config has to change and the change is quite mechanical.
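To make the staleness problem concrete, here is a minimal self-contained sketch. All of the types in it (ServiceConfig, ConfigStore, EagerService, LazyService) are hypothetical stand-ins invented for illustration, not GeoServer classes. The store simply mimics a db-backed config in which each lookup returns whatever is stored at that moment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a service config object such as WFSInfo.
class ServiceConfig {
    final String title;
    ServiceConfig(String title) { this.title = title; }
}

// Hypothetical stand-in for a db-backed config: each lookup returns
// whatever object is stored right now, not a long-lived proxy.
class ConfigStore {
    private final Map<String, ServiceConfig> rows = new HashMap<String, ServiceConfig>();
    void save(String name, ServiceConfig config) { rows.put(name, config); }
    ServiceConfig getService(String name) { return rows.get(name); }
}

// Old pattern: the config object is captured once in the constructor.
class EagerService {
    private final ServiceConfig wfs;
    EagerService(ConfigStore store) { this.wfs = store.getService("wfs"); }
    String title() { return wfs.title; }
}

// New pattern: keep the store and look the config object up on every use.
class LazyService {
    private final ConfigStore store;
    LazyService(ConfigStore store) { this.store = store; }
    private ServiceConfig getServiceInfo() { return store.getService("wfs"); }
    String title() { return getServiceInfo().title; }
}

class StaleConfigDemo {
    public static void main(String[] args) {
        ConfigStore store = new ConfigStore();
        store.save("wfs", new ServiceConfig("Old title"));
        EagerService eager = new EagerService(store);
        LazyService lazy = new LazyService(store);

        // Simulate an external change to the stored configuration.
        store.save("wfs", new ServiceConfig("New title"));

        System.out.println(eager.title()); // prints "Old title" (stale)
        System.out.println(lazy.title());  // prints "New title"
    }
}
```

The only difference between the two service classes is what they keep in a field, which is why the real change can be applied mechanically across the code base.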

And actually it is only WFS and WCS that suffer from this. When Gabriel refactored WMS he did it in a way that always looks up the config object. Go Gabriel!!

Here is the patch that updates the entire code base:

http://jira.codehaus.org/secure/attachment/51461/GEOS-4152.patch

So, any potential downsides? Well, one is possibly performance. Since the lookup occurs every time the object is needed, it comes with a price. However, I think this will be negligible because (a) the service objects will probably be cached most of the time and (b) the table backing the services is very small and only has as many rows as there are services, so currently 3 (4 when we hook up WPS). However, once the db is not local but over a network, these queries are more expensive by an order of magnitude. So in performance critical areas a workaround will be for the singleton to:

  1. hold onto a reference to the object as before
  2. register a listener that clears out and reloads the object only when it changes

I think that should more or less alleviate any issue with cost of the lookup.
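The two-step workaround could be sketched roughly as follows. Every type here (ServiceChangeListener, ConfigBackend, CachingService) is a hypothetical stand-in made up for the example. The real GeoServer listener interfaces and method names may well differ, so treat this as an outline of the idea rather than the actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener interface, invented for this sketch.
interface ServiceChangeListener {
    void serviceChanged(String serviceName);
}

// Hypothetical config backend that notifies listeners on every save.
class ConfigBackend {
    private final Map<String, String> titles = new HashMap<String, String>();
    private final List<ServiceChangeListener> listeners = new ArrayList<ServiceChangeListener>();
    int lookups = 0; // counts simulated db round trips

    void addListener(ServiceChangeListener listener) { listeners.add(listener); }

    String load(String name) {
        lookups++;
        return titles.get(name);
    }

    void save(String name, String title) {
        titles.put(name, title);
        for (ServiceChangeListener listener : listeners) {
            listener.serviceChanged(name);
        }
    }
}

// Singleton-style service: holds a reference (step 1) and drops it
// only when the listener reports a change (step 2).
class CachingService implements ServiceChangeListener {
    private final ConfigBackend backend;
    private volatile String cachedTitle; // null means "reload on next use"

    CachingService(ConfigBackend backend) {
        this.backend = backend;
        backend.addListener(this);
    }

    public void serviceChanged(String serviceName) {
        if ("wfs".equals(serviceName)) {
            cachedTitle = null;
        }
    }

    String title() {
        String title = cachedTitle;
        if (title == null) {
            // Not hardened against a change arriving between load and store.
            title = backend.load("wfs");
            cachedTitle = title;
        }
        return title;
    }
}

class ListenerCacheDemo {
    public static void main(String[] args) {
        ConfigBackend backend = new ConfigBackend();
        backend.save("wfs", "A");
        CachingService service = new CachingService(backend);

        service.title();
        service.title();
        System.out.println(backend.lookups); // prints 1

        backend.save("wfs", "B");
        System.out.println(service.title()); // prints B
        System.out.println(backend.lookups); // prints 2
    }
}
```

Note that in this sketch the listener only fires for changes that go through the backend's save method. A change made behind its back would leave the held reference stale.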

So that is the issue. Any feedback or comments welcome. With this change and the catalog/config dao refactor committed, the dbconfig module can land smoothly and start being used.

-Justin


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Sat, Oct 2, 2010 at 12:58 AM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

Here is the patch that updates the entire code base:
http://jira.codehaus.org/secure/attachment/51461/GEOS-4152.patch

Saw this mail after reviewing the patch, I added comments in the jira.

I believe we don't want the config and catalog to query a remote DBMS like crazy: each round trip is expensive compared to the time we take to render a GetMap request in the typical benchmarking setup (which is something like 50ms total).

Ideally, both should use some 2nd level caching setup, possibly cluster aware, configured so that the actual hits to the database are reduced to a minimum.
Some references:
* http://ehcache.org/documentation/hibernate.html
* http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-cache
(note how the newest ehcache should also be cluster safe, see the FAQ in the ehcache docs).

Well, just some ideas. I did not look into the details of how to add it or how to make it admin configurable (and so on), but it would seem a promising road.
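As a rough idea of what switching on a Hibernate second-level cache backed by ehcache might involve, the sketch below builds the relevant settings as plain java.util.Properties. The property keys and the region factory class name are the ones I know from the Hibernate 3.3 and Ehcache documentation linked above; how dbconfig actually wires its Hibernate configuration is not shown in this thread, so the class and method names here are hypothetical and the values should be checked against the versions in use.

```java
import java.util.Properties;

// Hypothetical helper, invented for illustration; not part of dbconfig.
class SecondLevelCacheSettings {

    static Properties hibernateCacheProperties() {
        Properties props = new Properties();
        // Turn on the second-level cache.
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        // Also cache query results, assuming lookups by service type
        // run as queries rather than by-id loads (an assumption).
        props.setProperty("hibernate.cache.use_query_cache", "true");
        // Ehcache region factory for Hibernate 3.3 and later.
        props.setProperty("hibernate.cache.region.factory_class",
                "net.sf.ehcache.hibernate.EhCacheRegionFactory");
        return props;
    }

    public static void main(String[] args) {
        Properties props = hibernateCacheProperties();
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

Enabling the cache is only half of it: the mapped entities also have to be marked cacheable with a concurrency strategy, and the cache regions need sizing and expiry settings in the ehcache configuration.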

If people want to hit the db directly and change the config there, they would also have to notify the 2nd level cache about the changes somehow.
Which brings me to a question: the on-disk XML configuration is not the recommended way to alter the config. What will be our stance towards the database one?
If we just say to go and use restconfig, the clustered cache should handle dropping dirty items automatically; however, I guess the config should be extended to allow editing services as well.

Cheers
Andrea

-----------------------------------------------------
Ing. Andrea Aime
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-----------------------------------------------------

Thanks for the feedback Andrea. Good stuff.

On Sat, Oct 2, 2010 at 2:46 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

I believe we don’t want the config and catalog to query a remote DBMS like crazy: each round trip is expensive compared to the time we take to render a GetMap request in the typical benchmarking setup (which is something like 50ms total).

Yeah… I agree we will want to minimize this, but regardless a cache will be important, since we can’t avoid lookups unless we want to start introducing an in-memory cache at the server level. Hopefully a properly configured Hibernate second level cache will be good enough for us.

Ideally, both should use some 2nd level caching setup, possibly cluster aware, configured so that the actual hits to the database are reduced to a minimum.

Well, just some ideas. I did not look into the details of how to add it or how to make it admin configurable (and so on), but it would seem a promising road.

Cool. Currently dbconfig is set up with h2 as the second level caching, but I have not done much in terms of tweaking the cache configuration. Definitely something worth pursuing in the short term, once we have dbconfig working as well as the in-memory config.

If people want to hit the db directly and change the config there, they would also have to notify the 2nd level cache about the changes somehow.
Which brings me to a question: the on-disk XML configuration is not the recommended way to alter the config. What will be our stance towards the database one?
If we just say to go and use restconfig, the clustered cache should handle dropping dirty items automatically; however, I guess the config should be extended to allow editing services as well.

While I think having a db config makes it easier for people to hack configuration directly, I still think we should discourage it, for a couple of reasons. The first is that it is just too easy to shoot oneself in the foot. The second is the same argument we had for the file-based persistence: the tables are subject to change. If we promote people developing tools against those tables, we have to worry about backward compatibility, which makes it harder for the devs to make changes to configuration.

Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.