[Geoserver-devel] excessive(?) OWSHandlerMapping calls to Catalog.getWorkspaceByName() and how to avoid it

Hi Justin, all,

While working on an alternate CatalogFacade implementation, I found the
catalog being aggressively queried for a workspace named "web".
Digging into it, I found that loading any Wicket page results in a _lot_
of queries to the catalog. For example, loading the page
"/web/?wicket:bookmarkablePage=:org.geoserver.web.data.store.StorePage"
results in 288 calls to Catalog.getWorkspaceByName("web").
And if there is a workspace named "web", there are an extra 282 calls to
Catalog.getLayerByName("resources").

BTW, I know calling a workspace "web" is kind of far-fetched, but it is
possible. And it seems this affects any other "root" resource path
that's not /ows|wfs|wcs|wms.

The calls are made by OWSHandlerMapping in order to strip local
workspace (and layer) names off the URL path.
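To make the cost concrete, here is a minimal sketch of the kind of per-request logic involved. It is a heavily simplified, hypothetical model (CountingCatalog, hasWorkspace, hasLayer and strip are illustrative names, not GeoServer's OWSHandlerMapping code): the first path segment is checked against the catalog, and the second one too if the first turned out to be a workspace. Nothing is remembered between requests, so every resource fetched under /web pays for the lookups again.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of local workspace stripping; NOT the
// actual GeoServer OWSHandlerMapping implementation.
class LocalWorkspacePathModel {

    /** Stand-in for the catalog; counts every lookup it receives. */
    static final class CountingCatalog {
        final Set<String> workspaces;
        final Set<String> layers;
        final AtomicInteger workspaceLookups = new AtomicInteger();
        final AtomicInteger layerLookups = new AtomicInteger();

        CountingCatalog(Set<String> workspaces, Set<String> layers) {
            this.workspaces = workspaces;
            this.layers = layers;
        }

        // On a database-backed facade each of these would be a query.
        boolean hasWorkspace(String name) {
            workspaceLookups.incrementAndGet();
            return workspaces.contains(name);
        }

        boolean hasLayer(String name) {
            layerLookups.incrementAndGet();
            return layers.contains(name);
        }
    }

    /** "/topp/states/wms" becomes "/wms"; unknown prefixes are left alone. */
    static String strip(String path, CountingCatalog catalog) {
        String[] parts = path.replaceFirst("^/", "").split("/", -1);
        if (parts.length < 2 || !catalog.hasWorkspace(parts[0])) {
            return path; // e.g. "/web/..." when no workspace is named "web"
        }
        int skip = 1;
        if (parts.length > 2 && catalog.hasLayer(parts[1])) {
            skip = 2; // workspace + layer prefix
        }
        return "/" + String.join("/", Arrays.copyOfRange(parts, skip, parts.length));
    }

    public static void main(String[] args) {
        CountingCatalog catalog = new CountingCatalog(Set.of("topp"), Set.of("states"));
        System.out.println(strip("/topp/states/wms", catalog));     // /wms
        System.out.println(strip("/web/resources/x.css", catalog)); // /web/resources/x.css
        System.out.println(catalog.workspaceLookups.get());         // 2
    }
}
```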

It's not the end of the world, as the in-memory catalog facade just needs
to do a Map.containsKey or so for each call. But it does look like
unneeded overhead for a database-backed catalog facade, as loading a
single web page leads to ~200 to ~400 database queries just to check
for a nonexistent workspace/layer.

So I wonder, and you would know better, if there's any easy way to avoid that.
I can think of a couple of hacky ways (e.g. the killer one is the /web
URL path; I don't think other "root" endpoints like /rest or /www
would get so many calls, so I could hack the catalog facade
implementation to return null iff I know there's no such workspace,
etc.). But I'm asking in case you can think of a non-hacky (or less hacky) one.

The only thing that crosses my mind is a catalog listener that keeps a set
of the workspace names actually available, so that OWSHandlerMapping
refrains from asking each time... but I don't feel good at all about it.
On the other hand, if OWSHandlerMapping instead knew (or looked up)
which URL mappings are registered in the app context, that wouldn't
solve anything: a module may register the /geosearch endpoint,
but that doesn't prevent a workspace from being called
geosearch. So... ideas?
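A minimal sketch of the listener idea, assuming the mapping can be fed workspace add/remove/rename events. The callback names below are hypothetical and do not mirror GeoServer's CatalogListener signatures. Request threads only do an in-memory set lookup, and only a hit would need a real catalog query:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of a listener-maintained index of workspace names.
// The callback names are illustrative, not GeoServer's CatalogListener API.
class WorkspaceNameIndex {

    // Copy-on-write: workspaces change rarely, lookups happen on every request.
    // Readers always see a complete immutable snapshot, without locking.
    private volatile Set<String> names = Set.of();

    /** Full resync, e.g. at startup or after a catalog reload. */
    synchronized void reload(Collection<String> currentNames) {
        names = Set.copyOf(currentNames);
    }

    synchronized void onWorkspaceAdded(String name) {
        update(s -> s.add(name));
    }

    synchronized void onWorkspaceRemoved(String name) {
        update(s -> s.remove(name));
    }

    synchronized void onWorkspaceRenamed(String oldName, String newName) {
        update(s -> { s.remove(oldName); s.add(newName); });
    }

    // Callers hold the lock, so concurrent writers cannot lose updates.
    private void update(Consumer<Set<String>> change) {
        Set<String> copy = new HashSet<>(names);
        change.accept(copy);
        names = Set.copyOf(copy);
    }

    /** Per-request check: a miss means no catalog query is needed. */
    boolean isKnownWorkspace(String pathSegment) {
        return pathSegment != null && names.contains(pathSegment);
    }

    public static void main(String[] args) {
        WorkspaceNameIndex index = new WorkspaceNameIndex();
        index.reload(Set.of("topp", "sf"));
        index.onWorkspaceRenamed("sf", "sanfrancisco");
        System.out.println(index.isKnownWorkspace("web"));          // false
        System.out.println(index.isKnownWorkspace("sanfrancisco")); // true
    }
}
```

Copy-on-write keeps the read path lock-free at the price of copying the set on each (rare) workspace change. The index only knows what it is told, so changes that bypass the events are invisible until the next reload().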

TIA,
Gabriel
--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Yikes, indeed you are right, I never considered that (obviously). But yeah, I think the approach of having the LocalWorkspaceCallback maintain a list of workspaces (keeping it up to date with a catalog listener) will work.

Another alternative would be to have the callback maintain a thread local cache of workspaces that it has already tried to look up, so as to only pay the price for the first lookup. It could be a bit simpler and not have to deal with synchronization, etc.
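A rough sketch of the thread local variant, assuming a hypothetical Predicate stands in for the real catalog query. The memo lives only until clear() is called at the end of the request, so it saves repeated lookups within a single request/thread and nothing more:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch of a per-request (thread local) memo of workspace
// lookups; catalogLookup stands in for the real catalog query.
class RequestScopedWorkspaceCache {

    private final ThreadLocal<Map<String, Boolean>> seen =
            ThreadLocal.withInitial(HashMap::new);
    private final Predicate<String> catalogLookup;

    RequestScopedWorkspaceCache(Predicate<String> catalogLookup) {
        this.catalogLookup = catalogLookup;
    }

    /** Queries the catalog at most once per name, per thread, until clear(). */
    boolean isWorkspace(String name) {
        return seen.get().computeIfAbsent(name, catalogLookup::test);
    }

    /** Call at the end of every request (e.g. in a finally block); otherwise
     *  pooled container threads would carry stale answers into later requests. */
    void clear() {
        seen.remove();
    }

    public static void main(String[] args) {
        AtomicInteger queries = new AtomicInteger();
        RequestScopedWorkspaceCache cache = new RequestScopedWorkspaceCache(name -> {
            queries.incrementAndGet();
            return name.equals("topp");
        });
        try {
            for (int i = 0; i < 5; i++) {
                cache.isWorkspace("web");
            }
        } finally {
            cache.clear();
        }
        System.out.println(queries.get()); // 1
    }
}
```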

On Tue, Jan 10, 2012 at 5:24 PM, Gabriel Roldan <groldan@anonymised.com1501…> wrote:


Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Wed, Jan 11, 2012 at 2:04 AM, Justin Deoliveira <jdeolive@anonymised.com.1501…> wrote:

> Yikes, indeed you are right, I never considered that (obviously). But yeah, I think the approach of having the LocalWorkspaceCallback maintain a list of workspaces (keeping it up to date with a catalog listener) will work.

Actually it's the OWSHandlerMapping; the LocalWorkspaceCallback should be invoked only by the OWS dispatcher, so not
when working against the GUI, I believe.

> Another alternative would be to have the callback maintain a thread local cache of workspaces that it has already tried to look up, so as to only pay the price for the first lookup. It could be a bit simpler and not have to deal with synchronization, etc.

I'm worried this might end up defeating one of the purposes of having a database as a persistence layer, that is,
having other applications modify the database directly.
If we don't run queries and rely on a cache instead, we'll basically ignore changes made in the database.

I'm wondering if there is any way to identify what is actually going to handle the request (REST, Wicket, OWS
dispatcher?) and apply the local workspace handling only to OWS ones?
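One possible reading of this idea, as a sketch: check the first path segment against the prefixes served by non-OWS handlers and skip the workspace lookup for those. The prefix list and the NonOwsPathFilter class are hypothetical; real handler mappings are pattern based, and this exact-segment check does not try to interpret patterns. As Gabriel pointed out at the start of the thread, the trade-off is that a workspace literally named "web" (or "geosearch") would be shadowed by the reserved prefix.

```java
import java.util.List;

// Hypothetical sketch: skip the local workspace lookup when the first path
// segment belongs to a non-OWS handler. The prefixes are illustrative only;
// real handler mappings use patterns, which this simple check does not parse.
class NonOwsPathFilter {

    private final List<String> nonOwsPrefixes;

    NonOwsPathFilter(List<String> nonOwsPrefixes) {
        this.nonOwsPrefixes = List.copyOf(nonOwsPrefixes);
    }

    /** True if the path should go through local workspace handling. */
    boolean needsWorkspaceLookup(String path) {
        return !nonOwsPrefixes.contains(firstSegment(path));
    }

    /** "/web/resources/x.css" -> "web"; "/web/" -> "web". */
    static String firstSegment(String path) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        int slash = p.indexOf('/');
        return slash < 0 ? p : p.substring(0, slash);
    }

    public static void main(String[] args) {
        NonOwsPathFilter filter = new NonOwsPathFilter(List.of("web", "rest"));
        System.out.println(filter.needsWorkspaceLookup("/web/resources/x.css")); // false
        System.out.println(filter.needsWorkspaceLookup("/topp/wms"));            // true
    }
}
```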

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


On Wed, Jan 11, 2012 at 2:59 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:

> On Wed, Jan 11, 2012 at 2:04 AM, Justin Deoliveira <jdeolive@anonymised.com> wrote:

>> Yikes, indeed you are right, I never considered that (obviously). But yeah, I think the approach of having the LocalWorkspaceCallback maintain a list of workspaces (keeping it up to date with a catalog listener) will work.

> Actually it's the OWSHandlerMapping; the LocalWorkspaceCallback should be invoked only by the OWS dispatcher, so not
> when working against the GUI, I believe.

Ahh right, of course.

>> Another alternative would be to have the callback maintain a thread local cache of workspaces that it has already tried to look up, so as to only pay the price for the first lookup. It could be a bit simpler and not have to deal with synchronization, etc.

> I'm worried this might end up defeating one of the purposes of having a database as a persistence layer, that is,
> having other applications modify the database directly.
> If we don't run queries and rely on a cache instead, we'll basically ignore changes made in the database.

By thread local I meant the cache would only live for the life of a single request/thread. But yeah, thinking more about it, that doesn't make sense, since if I understand correctly we are talking about multiple requests in this case.

> I'm wondering if there is any way to identify what is actually going to handle the request (REST, Wicket, OWS
> dispatcher?) and apply the local workspace handling only to OWS ones?

I can't think of a great one offhand. This is sort of what the handler mappings themselves specify, but it's hard to determine that reliably from their contents since they're pattern based, although most of the patterns are pretty simple. All in all, I think Gabriel's suggestion of a catalog listener maintaining a list of workspace names is probably the most promising.

Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.