[Geoserver-devel] Timeout and memory limit patches applied to 1.7.x and trunk, plus some questions

Hi,
if anybody is interested in trying them out, I've applied
the above two patches on both branches.
If you want to test them on 1.7.x, have a look at these
JIRA issues for details on how to enable them; if you're playing
on trunk, just go to the WMS configuration and use the UI.
http://jira.codehaus.org/browse/GEOS-3085
http://jira.codehaus.org/browse/GEOS-3086

The limits have been applied to raster output only, since
that is where both of them matter the most and where they
are easiest to control, too.
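
To give an idea of why raster output is the easy case: the size of
the image buffer follows from the requested width and height, so a
check can run before any rendering starts. A minimal sketch, assuming
4 bytes per pixel (ARGB) and a limit expressed in kilobytes; the class
and method names are made up for the example and are not the ones
used in the patches.

```java
/** Hypothetical helper, not GeoServer code: pre-render memory check for a raster map. */
final class RasterMemoryCheck {
    /** Assumed worst case: a single ARGB buffer, 4 bytes per pixel. */
    static final int BYTES_PER_PIXEL = 4;

    /** Estimated buffer size in bytes; long arithmetic avoids int overflow. */
    static long estimateBytes(int width, int height) {
        return (long) width * height * BYTES_PER_PIXEL;
    }

    /** Returns true if the request fits; maxKb <= 0 is treated as "no limit" (an assumption). */
    static boolean fits(int width, int height, long maxKb) {
        if (maxKb <= 0) {
            return true;
        }
        return estimateBytes(width, height) <= maxKb * 1024L;
    }
}
```

With a 16384 KB limit a 1000x1000 request passes and a 4000x4000
one is rejected, without allocating anything.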

There are other WMS outputs: KML, PDF, SVG, Atom, RSS.

KML, SVG, RSS are streaming, so memory control does not
seem to make much sense. I'm dubious as far as the timeout is
concerned as well, since in these formats the time it takes to
execute the command includes streaming out the result,
which depends on the client download speed.
I would say, let's not apply any limit on those.

PDF and SVG are harder to deal with. They both
build the response in memory before sending it, but
it's not possible to estimate how big it will be.
Time wise, it's possible to determine a rendering
time for both, and it's typically a lot longer
than raster rendering. Not sure if we want to
limit it?
Gabriel suggested that for these formats we could
introduce a max number of features rendered. I could
go for that, but wouldn't the limit have to be global
(that is, include raster rendering as well)?
And how would a client know that the limit has been
exceeded?

Opinions?

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Well done, Andrea.

Your ideas about which formats they apply to seem to reflect a
variety of different reasons for doing this work - illustrating
perhaps how important it is.

These reasons include:

1) making sure the server is robust - doesn't fail with OOM or
something and deliver the wrong thing or fail to deliver legitimate
requests
2) protection against DoS (though I think this needs to happen at a
different level)
3) stopping one user breaking the ability to serve others
4) sharing the resources between different requests to gracefully
degrade in performance
5) protecting the client from doing something silly (this is the role
of maxfeatures)

Some formats - particularly those which are streamable - are logically
consistent with a large transfer that may be allowed to take some time
and be throttled to share resources around. Others, like raster
formats, only really make sense as a whole. So, I think your solution
sounds like a very good pragmatic one.

I'd be keen to have your analysis of the different reasons (what have
I missed) and how the various available solutions stack up against
these. This ought to become a "whitepaper" or tutorial or something,
but IMHO having this in an accessible form makes the product look a
lot more interesting from a risk-averse deployer's perspective.

Rob

On Thu, May 28, 2009 at 11:22 PM, Andrea Aime <aaime@anonymised.com> wrote:



Rob Atkinson wrote:

I'd be keen to have your analysis of the different reasons (what have
I missed) and how the various available solutions stack up against
these.

The reason was what you said: I wanted to prevent a malicious
(or clumsy) user from using up all the resources of the server in
a way that would prevent other users from using it as well.
The three checks proposed (memory, time, number of rendering errors
tolerated while drawing a raster WMS output) all go in that direction
and provide _some_ relief.
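
As an illustration of the time check only: one generic way to bound
a rendering job is to run it on a worker thread and give up after a
deadline. This is a sketch under that assumption, not a description
of how the patch does it; `TimedRender` and `runWithTimeout` are
hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a task and gives up once a deadline passes. */
final class TimedRender {
    /** Returns the task result, or throws TimeoutException after timeoutMillis. */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; the task only stops if it honours interruption.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the task's own exception where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that cancellation is cooperative: a renderer that never checks
for interruption keeps burning CPU after the caller has given up, so
the rendering loop itself still has to look at the clock or at the
interrupt flag.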

They are by no means complete though; other actions would be needed.

GeoServer is very featureful. You can pretty much couple every
layer with every style in a WMS request, or send down your own styles,
or ask for an especially expensive output format (SVG, PDF).
This gives attackers lots of tools, and newbies quite some rope to
hang themselves with.
In the long run this should be tweaked so that you can disable output
formats, allow usage only of the styles registered against a layer,
and disallow user provided styles.
It is also my hope that GeoXACML will provide us with the means to put
the above under control, so that only certain users will be allowed
to use custom styles and the like (when you don't need the extra
features it's a good idea to disable them, but when you need them,
you still don't want everybody to be able to use them).

As you said, GeoServer alone won't be able to effectively defend
itself. The above limits make it hard to kill GeoServer with
a single fatal blow, but they don't prevent death by a million
cuts: network level appliances should make sure no single IP is
making too many requests against the server.
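
Just to make the "too many requests per IP" idea concrete: this kind
of check normally lives in a proxy or firewall, not in GeoServer, and
the Java below only illustrates the counting logic. It is a
fixed-window counter; the class name, the window size and the request
cap are all assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical fixed-window limiter: at most maxRequests per client per window. */
final class PerClientRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    /** Per client: element 0 is the window start time, element 1 the request count. */
    private final Map<String, long[]> state = new HashMap<>();

    PerClientRateLimiter(int maxRequests, long windowMillis) {
        if (maxRequests < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed; the caller supplies the current time. */
    synchronized boolean allow(String clientIp, long nowMillis) {
        long[] s = state.get(clientIp);
        if (s == null || nowMillis - s[0] >= windowMillis) {
            // First request from this client, or the previous window has expired.
            state.put(clientIp, new long[] {nowMillis, 1});
            return true;
        }
        if (s[1] < maxRequests) {
            s[1]++;
            return true;
        }
        return false;
    }
}
```

A real deployment would also need to evict idle clients from the
map, otherwise the limiter itself becomes a way to eat memory.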

For streamable content in WMS we could definitely add feature count
limits just like in WFS.
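
A feature count limit on a streaming output could be as simple as
wrapping the feature iterator. A minimal sketch with plain
java.util iterators rather than the GeoTools feature classes;
`LimitedIterator` and `wasTruncated` are hypothetical names, and the
truncation flag is one possible way to let the encoder tell the
client that the limit was hit.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical wrapper: hands out at most maxFeatures items from the delegate. */
final class LimitedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final int maxFeatures;
    private int returned;

    LimitedIterator(Iterator<T> delegate, int maxFeatures) {
        this.delegate = delegate;
        this.maxFeatures = maxFeatures;
    }

    @Override
    public boolean hasNext() {
        return returned < maxFeatures && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        returned++;
        return delegate.next();
    }

    /** True if the cap was reached while the source still had more items. */
    boolean wasTruncated() {
        return returned >= maxFeatures && delegate.hasNext();
    }
}
```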

PDF and SVG are kind of unfortunate formats, as I have no
good way to predict memory usage beforehand, and the libraries
we're using to generate them do not allow any kind of size
control either. Time control remains, as the encoding
and write out parts are separated, but I'm not so sure
we want to impose the same time limits as the raster
outputs.

Anyways, one step at a time. I encourage everybody interested
in the topic to contribute.

Cheers
Andrea


Andrea Aime wrote:

PDF and SVG are kind of unfortunate formats, as I have no
good way to predict memory usage beforehand,

Hum... forget what I said about PDF, I found a way to
make some checks on memory usage during PDF generation.
They are not accurate, but better than nothing, especially
since PDF, like SVG, tends to accumulate memory in small
chunks, which is something that brings the VM to its knees
when approaching the max heap limit (the VM is furiously
GC-ing in order to be able to allocate the extra 10 KB
needed, and it gets hard to even attach heap viewers
such as VisualVM to it).
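
For readers wondering what an approximate check during generation
might look like: one generic option is to cap the in-memory buffer
the document is written into, so the request fails early instead of
dragging the heap to its limit. This is only a sketch of that idea,
not the check actually added for PDF; `CappedBuffer` is a
hypothetical class, and it only sees the encoded bytes, not the
library's internal objects, which is one reason such a check is
inexact.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical in-memory buffer that refuses to grow past maxBytes. */
final class CappedBuffer extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final long maxBytes;

    CappedBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureRoom(1);
        buffer.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureRoom(len);
        buffer.write(b, off, len);
    }

    /** Fails before the write happens, so the buffer never exceeds the cap. */
    private void ensureRoom(int extra) throws IOException {
        if ((long) buffer.size() + extra > maxBytes) {
            throw new IOException("output exceeds " + maxBytes + " bytes");
        }
    }

    int size() {
        return buffer.size();
    }

    byte[] toByteArray() {
        return buffer.toByteArray();
    }
}
```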

This leaves SVG out, which builds a DOM in memory.
We could rewrite the output format to stream out, Chris
has been dreaming about this for quite some time...
but it's no small task...

Alternatively, we could find a way for the SVG DOM
to overflow to disk instead of accumulating in memory.
Something like this: http://au.geocities.com/caddydc/,
too bad the thing seems quite dead and the licensing is
not clear.

Well, if you don't need styled SVG output, the admin can set
WMS to use the streaming SVG generator: it is not
style capable, but it's streaming...

Cheers
Andrea
