[Geoserver-devel] New community module: control flow

Hi,
lately I've been working on the idea of controlling how many
requests of a given type can be performed in parallel.

This is driven by a number of concerns.
First off, I don't want GeoServer to start throwing OOM errors
like mad because it's trying to serve too many GetMap requests in parallel.
The WMS limits already allow an admin to control how much memory
a single WMS request is going to use, but while that prevents a single
request from eating all the memory, it still does not prevent
exhaustion from too many requests.

A second case I want to handle is something I've discovered while
playing with OpenLayers tiled demos. Just open the preview, switch
to tiled mode, make the map size quite large, and start zooming
very fast up and down, using the mouse scroller or just clicking
on the zoom bar very fast.
What happens is that Firefox immediately makes 6 requests in parallel
against GeoServer, then you change zoom, it drops the old requests
and makes another 6, and so on every time you change the current
zoom. The server side is not notified that Firefox dropped the
requests until we try to write to the output, which
might happen after quite some time.
With some instrumentation in the Dispatcher I've observed 50+
requests rendering in parallel generated by a single client.
That is of course unacceptable: that way a single user can really
bring a small server to its knees. We want to make sure a single client
cannot make more than X requests in parallel, with the others
refused or queued*.
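
To illustrate the per-client cap, here is a minimal Java sketch of the
"refused" variant. It is not the module's code: PerUserLimiter and the
string user key are hypothetical, and how a client is actually identified
is left open.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical illustration: at most maxPerUser requests in parallel per user key. */
class PerUserLimiter {
    private final int maxPerUser;
    // one counting semaphore per user key; entries are never evicted in this sketch
    private final ConcurrentHashMap<String, Semaphore> perUser = new ConcurrentHashMap<>();

    PerUserLimiter(int maxPerUser) {
        this.maxPerUser = maxPerUser;
    }

    /** Returns true if the request may proceed, false if the user is already at the cap. */
    boolean tryEnter(String userKey) {
        return perUser
                .computeIfAbsent(userKey, k -> new Semaphore(maxPerUser))
                .tryAcquire();
    }

    /** Must be called once for every successful tryEnter, when the request completes. */
    void exit(String userKey) {
        Semaphore s = perUser.get(userKey);
        if (s != null) {
            s.release();
        }
    }
}
```

With a cap of 6, a seventh parallel request from the same key is refused
while other users are unaffected.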

Yet another reason to control the incoming requests is pure
performance. It has been noticed during the FOSS4G benchmarks that
limiting the number of parallel GetMap requests that a server
is actually working on to something like 2*NumCpu helps throughput,
in some cases significantly so.
We can do that by configuring the web container's thread pool with a
certain maximum number of serving threads, but that has some significant
disadvantages:
- it affects all applications running in the container
- it affects the GUI. Try to limit the number of parallel requests
   allowed to 4 and Firefox will be none too happy about it
   (I've experienced issues loading the GUI pages).
- some requests can scale up much more because they are very light
   so they should not be limited to 2*NumCpu threads. Think
   capabilities, GetFeatureInfo or just GetFeature, which usually
   is streaming and thus bandwidth limited as opposed to cpu limited
   (if you're serving towards the internet).

Long story short, I've created a pluggable module, leveraging
DispatcherCallback, that allows controlling the flow of incoming
requests based on a single property file.
A single configuration file like this one:

# request timeout in seconds
timeout=10
# no more than 100 parallel requests total
ows.global=100
# no more than 16 getmap in parallel, total
ows.wms.getmap=16
# don't give the single user more than 6 requests total in parallel
# (this is what a browser will do by default)
user=6

can be used to make all of the problems cited above melt like snow
in the sun. It will make sure GetMap requests won't overwhelm the
server, that a single user cannot monopolize the server, and that
requests hanging in queue waiting to be executed for more than
10 seconds just get dropped.
It will still allow plenty of GetFeatureInfo to be executed in
parallel and won't affect GUI related threads at all.
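
Since the file uses plain key=value syntax with # comments, it can be read
with java.util.Properties. The following is only a sketch of that idea:
ControlFlowConfig is a hypothetical name, and treating a missing timeout
as -1 ("no timeout") is an assumption, not documented behaviour.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical holder for the parsed rules; not a class from the module. */
class ControlFlowConfig {
    final long timeoutSeconds;          // -1 means "no timeout" (assumption of this sketch)
    final Map<String, Integer> limits;  // rule key -> max parallel requests

    private ControlFlowConfig(long timeoutSeconds, Map<String, Integer> limits) {
        this.timeoutSeconds = timeoutSeconds;
        this.limits = limits;
    }

    static ControlFlowConfig parse(String text) {
        Properties props = new Properties();
        try {
            // Properties.load skips lines starting with '#'
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long timeout = -1;
        Map<String, Integer> limits = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key).trim();
            if (key.equals("timeout")) {
                timeout = Long.parseLong(value);
            } else {
                // every other key is treated as a concurrency limit
                limits.put(key, Integer.parseInt(value));
            }
        }
        return new ControlFlowConfig(timeout, limits);
    }
}
```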

The design is based on blocking queues and tokens; benchmarks
show that it does not significantly affect performance.
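
A rough sketch of what "blocking queues and tokens" can look like for a
single rule, assuming one bounded queue per rule whose capacity is the
allowed parallelism. TokenGate and its method names are hypothetical and
do not come from the module.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration: a bounded queue of tokens caps concurrent requests. */
class TokenGate {
    private final BlockingQueue<Object> slots;

    TokenGate(int maxParallel) {
        // fair = true: waiting threads get a slot in arrival order
        this.slots = new ArrayBlockingQueue<>(maxParallel, true);
    }

    /** Waits for a free slot; returns false if none frees up within the timeout. */
    boolean enter(Object token, long timeoutMillis) {
        try {
            return slots.offer(token, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Called when the request finishes, freeing its slot for a waiting request. */
    void exit(Object token) {
        slots.remove(token);
    }

    /** Number of requests currently holding a slot. */
    int running() {
        return slots.size();
    }
}
```

A request that cannot place its token within the timeout is the one that
"just gets dropped"; everything else proceeds once a slot frees up.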

Soo... ok to commit? I am PSC and I could give myself the +1,
but that would not be too nice :wink:

I'll put together a page describing the design and usage of the
module after committing it.

Cheers
Andrea

*: it would also be very nice to get notified that the client
    dropped the connection, but I've so far found no easy way
    to do that. But I'm still trying to work it out, in my spare time,
    leveraging the comet support that some web containers have added
    lately: http://n2.nabble.com/Checking-when-the-client-dropped-the-connection-td4198235.html#a4198235

--
Andrea Aime
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Wed, Jan 27, 2010 at 9:36 PM, Andrea Aime <aaime@anonymised.com> wrote:

What happens is that Firefox immediately makes 6 requests in parallel
against GeoServer, then you change zoom, it drops the old requests
and makes another 6, and so on every time you change the current
zoom. The server side is not notified that Firefox dropped the
requests until we try to write to the output, which
might happen after quite some time.

Could we "start" writing somehow - such as filling in the headers or
something? If we know what kind of file format is being returned,
we may be able to start writing out the first couple of
bytes...

Soo... ok to commit? I am PSC and I could give myself the +1,
but that would not be too nice :wink:

+1 I think you are okay to commit the community module (what an
amazing contribution to accomplish with a community module).

When I think "control flow" I actually expect something more like
process management than simply putting "per user caps" on the services
published. But you are writing it so you get to name it....

Cheers,
Jody

Andrea Aime wrote:

I'll put together a page describing the design and usage of the module
after committing it.

Sounds like a request throttle!

This is such an important thing for a production environment - I've been thinking about how good it would be for some time now! It's really great that you've had a crack at it so I can't wait to try it out...

How would it determine which requests to drop? If a user has 6 requests running and requests another 6 before the 10 second timeout, will the first 6 be dropped, the second 6, or would the second 6 be queued until the first 6 hit the 10 second timeout?

Regards,

Miles

___________________________________________________________________________

    Australian Antarctic Division - Commonwealth of Australia
IMPORTANT: This transmission is intended for the addressee only. If you are not the
intended recipient, you are notified that use or dissemination of this communication is
strictly prohibited by Commonwealth law. If you have received this transmission in error,
please notify the sender immediately by e-mail or by telephoning +61 3 6232 3209 and
DELETE the message.
        Visit our web site at http://www.antarctica.gov.au/
___________________________________________________________________________

Hi all,

I just shot through an email to Jody regarding what help is needed at FOSS4G 2010 with GeoServer (well, OpenGeo). I'm just trying to find out if I can help out in some way, i.e. running workshops, manning the stall, etc, as I don't think I'll be able to get enough funding for registration. I should be able to get there though.

Has anyone submitted any workshops to FOSS4G yet? Jody mentioned that registration closes this Saturday.

Or any other ideas as to how I could help out?

Regards,

Miles

Jody Garnett wrote:

Do you want to take this to the geoserver-devel list? There is a
workshop deadline this Saturday so discussion now would be good.

I am not on the FOSS4G committee this year so I don't have any direct
advice for how to help out.

Jody

Miles Jordan wrote:

Is there any way that I can help out for FOSS4G 2010? I'm pretty
sure I will only get approval for enough funding for flights and
accommodation (I hope), but probably not enough for registration too.
It was so good last time, I'd hate to miss it.

I could help run GeoServer workshops? Man the OpenGeo stall for a
while? Open to ideas! Please let me know if you think something like
that is at all possible.

Regards,

Miles

Jody Garnett wrote:

Could we "start" writing somehow - such as filling in the headers or
something? If we know what kind of file format is being returned,
we may be able to start writing out the first couple of
bytes...

Already tried, it does not work: containers buffer what you
write to improve performance and avoid sending almost empty
packets over the network.

Soo... ok to commit? I am PSC and I could give myself the +1,
but that would not be too nice :wink:

+1 I think you are okay to commit the community module (what an
amazing contribution to accomplish with a community module).

When I think "control flow" I actually expect something more like
process management than simply putting "per user caps" on the services
published. But you are writing it so you get to name it....

It is process/thread management. It is just implemented with the
threads the container gives us as opposed to rolling an extra
pool of them.

Cheers
Andrea

Miles Jordan wrote:

I'll put together a page describing the design and usage of the
module after committing it.

Sounds like a request throttle!

This is such an important thing for a production environment - I've
been thinking about how good it would be for some time now! It's
really great that you've had a crack at it so I can't wait to try it
out...

Module committed, let me know how it works for you.

How would it determine which requests to drop? If a user has 6
requests running and requests another 6 before the 10 second timeout,
will the first 6 be dropped, the second 6, or would the second 6 be
queued until the first 6 hit the 10 second timeout?

The module is similar to a linear Petri net (http://en.wikipedia.org/wiki/Petri_net):
each rule is represented as a place allowing only N tokens to be there,
and the other requests are queued transparently by a
java.util.concurrent.BlockingQueue.

Pro: ease of implementation
Con: you don't have control over what is blocked

So basically I only find out that a request took too much time to pass
through the various queues making up the Petri net after it has been
unblocked from one of them and is ready to transition into the next.

The requests are parsed before entering the net, and start actually
executing only after they are out of it. After execution they remove
all the tokens they left in the net (this is where they are different
from an actual Petri net), leaving space for others to move forward.

The idea of dropping the oldest requests as soon as they time
out crossed my mind, but the implementation would be more complex.
Anyway, since the requests are queued fairly, the oldest will be
the next ones to be freed when a similar request completes;
if they are too old they will be dropped immediately, leaving space
for the next one, and so on. So in practice it should work well
enough anyway.
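
The chain described above can be sketched roughly as follows. This is an
illustration under assumptions, not the committed code: FlowChain is a
hypothetical name, fair java.util.concurrent.Semaphore instances stand in
for the BlockingQueue-based places, and a single deadline shared across
all places is one possible reading of the timeout.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a request crosses a fixed sequence of places, each holding at most N tokens. */
class FlowChain {
    private final List<Semaphore> places = new ArrayList<>();
    private final long timeoutMillis;

    FlowChain(long timeoutMillis, int... capacities) {
        this.timeoutMillis = timeoutMillis;
        for (int capacity : capacities) {
            places.add(new Semaphore(capacity, true)); // fair: oldest waiter goes first
        }
    }

    /** Runs the task once a token is held in every place; returns false if the overall timeout expired. */
    boolean run(Runnable task) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        Deque<Semaphore> held = new ArrayDeque<>();
        try {
            for (Semaphore place : places) {
                // the elapsed time is checked between one place and the next
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0 || !place.tryAcquire(remaining, TimeUnit.NANOSECONDS)) {
                    return false; // too old: drop the request
                }
                held.push(place);
            }
            task.run(); // execution starts only once the request is out of the net
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // give back every token left behind, letting queued requests move forward
            for (Semaphore place : held) {
                place.release();
            }
        }
    }
}
```

For the configuration shown earlier, a GetMap request would cross three
places (per-user, GetMap, global) in some fixed order; the order used by
the module is not stated in this thread.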

Looking forward to your review and further suggestions

Cheers
Andrea
