[Geoserver-devel] Direct GWC WMS integration breaks control flow (and possibly other callbacks)

Hi,
this morning I was investigating a GeoServer 2.1.0 locking up and stopping
responding to requests after a few calls and found out it was due to
an interaction between control-flow and the direct WMS integration of GWC.

When a WMS request with tiled=true matches a tile set GWC will intercept
the request and handle it internally by building a separate request that is then
send to the dispatcher.
This request in the dispatcher, and thus all dispatcher callbacks, being called
twice on that request, first for the original request, and then for the one that
GWC made up. See attached screenshot of a Eclipse debug session.

When this hits control flow it results in lost concurrency control tokens.
I'm going to harden control-flow to resist this mishbehavior, but similar
issues will happen for any callback that assumes, rightfully, that there is
a 1-1 relationship between a request and the callbacks.

For example the monitoring subsystem will likely register spurious, non
existing requests in this case.

Opinions on alternative fixes? One way could be to have GWC
call direclty the DefaultWebMapSErvice object, however there is a drawback
in doing that, which is that all GWC protocols would then be out of
control flow reach, so one could hit GWC very hard to bring the server
on its knees (just find an unseeded area/zoom level and throw as many
requests as possible at it).

Alternatively we could just update the DispatcherCallback documentation
and warn implementors that methods can be called multiple time per
physical request due to plugins making "inside requests" while serving
the main one.

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------

(attachments)

stack.png

Hi Andrea,
yes, the reason for calling the dispatcher instead of using
WebMapService directly is that gwc requests don't run our of
control-flow control.
Unfortunately this has always been the case and hence I thought it was
alright and never really dig deep into how well they were interacting.

I see you provided a fix already that looks great, cool. There's perhaps
one more thing that we could do to cover all the cases.
The way ControlFlowCallBack is set up right now, still assumes that that
any nested dispatcher request will come from the same thread. That holds
true for the gwc "cache miss" case. I wonder though what's up with
possible dispatcher calls issued by a separate thread?
I don't think GWC will do that right now, because the only case when
it's going to spawn new threads is upon a seed request, and then each of
them calling on the dispatcher will get it's own set of FlowControllers.
But in the eventual case that a thread that already passed the door and
then spawns new threads those one might be out of control? (like in if
gwc with integrated WMS spawned new threads to fetch individual
metatiles for a more agressive caching).
As said, that's not what happens right now, but I guess there'd be a
really easy way of preventing that from the start, which would be to use
an InheritableThreadLocal[1] instead of a simple ThreadLocal to hold
REQUEST_CONTROLLERS?

Thanks for the quick fix btw.
Cheers,
Gabriel.

[1]
<http://download.oracle.com/javase/1,5.0/docs/api/java/lang/InheritableThreadLocal.html&gt;

On Fri, 2011-05-13 at 12:11 +0200, Andrea Aime wrote:

Hi,
this morning I was investigating a GeoServer 2.1.0 locking up and stopping
responding to requests after a few calls and found out it was due to
an interaction between control-flow and the direct WMS integration of GWC.

When a WMS request with tiled=true matches a tile set GWC will intercept
the request and handle it internally by building a separate request that is then
send to the dispatcher.
This request in the dispatcher, and thus all dispatcher callbacks, being called
twice on that request, first for the original request, and then for the one that
GWC made up. See attached screenshot of a Eclipse debug session.

When this hits control flow it results in lost concurrency control tokens.
I'm going to harden control-flow to resist this mishbehavior, but similar
issues will happen for any callback that assumes, rightfully, that there is
a 1-1 relationship between a request and the callbacks.

For example the monitoring subsystem will likely register spurious, non
existing requests in this case.

Opinions on alternative fixes? One way could be to have GWC
call direclty the DefaultWebMapSErvice object, however there is a drawback
in doing that, which is that all GWC protocols would then be out of
control flow reach, so one could hit GWC very hard to bring the server
on its knees (just find an unseeded area/zoom level and throw as many
requests as possible at it).

Alternatively we could just update the DispatcherCallback documentation
and warn implementors that methods can be called multiple time per
physical request due to plugins making "inside requests" while serving
the main one.

Cheers
Andrea

------------------------------------------------------------------------------
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - inlcuding clusters.
http://p.sf.net/sfu/intel-dev2devmay
_______________________________________________ Geoserver-devel mailing list Geoserver-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/geoserver-devel

--
Gabriel Roldan
groldan@anonymised.com
Expert service straight from the developers

On Fri, May 13, 2011 at 4:42 PM, Gabriel Roldán <groldan@anonymised.com> wrote:

Hi Andrea,
yes, the reason for calling the dispatcher instead of using
WebMapService directly is that gwc requests don't run our of
control-flow control.
Unfortunately this has always been the case and hence I thought it was
alright and never really dig deep into how well they were interacting.

Yep, the difference is that GWC using its own protocols enters in
the dispatcher only when computing the missing tiles (so just once)
whilst the direct WMS integration makes it enter the dispatcher twice,
nesting two requests one inside the other.

I see you provided a fix already that looks great, cool. There's perhaps
one more thing that we could do to cover all the cases.
The way ControlFlowCallBack is set up right now, still assumes that that
any nested dispatcher request will come from the same thread.

More or less. The thing is, the bug was happening exacly because the
two requests were coming from the same thread, if they would have
come from two different threads they would not have been able to
step on each other toes and cause the bug.

That holds
true for the gwc "cache miss" case. I wonder though what's up with
possible dispatcher calls issued by a separate thread?
I don't think GWC will do that right now, because the only case when
it's going to spawn new threads is upon a seed request, and then each of
them calling on the dispatcher will get it's own set of FlowControllers.
But in the eventual case that a thread that already passed the door and
then spawns new threads those one might be out of control? (like in if
gwc with integrated WMS spawned new threads to fetch individual
metatiles for a more agressive caching).
As said, that's not what happens right now, but I guess there'd be a
really easy way of preventing that from the start, which would be to use
an InheritableThreadLocal[1] instead of a simple ThreadLocal to hold
REQUEST_CONTROLLERS?

Not so sure. This will be a good idea only if you dynamically create new
child threads on demand, but if you rely on a stable thread pool those
threads are
nobody else's child (they are child of the thread that created the pool in fact)
so the inheritable thread local stops working

Cheers
Andrea

--
-------------------------------------------------------
Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-------------------------------------------------------