[Geoserver-devel] Short term plans to increase GeoServer robustness in WMS

Hi,
lately at OpenGeo we've been having some trouble keeping
a few WMS demos up under exceptional load. Looking into them,
it's easy to see that our WMS does not defend itself against
excessive workloads, as I reported in a jira almost
2 years ago:
http://jira.codehaus.org/browse/GEOS-1127

I've been given some time to try and provide a few solutions
that can land in the GeoServer 1.7.x series, in order to make
putting a GeoServer WMS out in the wild less of a concern.
Since we're talking about 1.7.x, the changes have to be as
non-invasive as possible, but the idea and the configuration
should be portable unchanged to trunk, where we can find
a fuller, more extensive solution (time and funding permitting,
that is... if the above jira teaches anything, it is that finding
resources to pull this off is harder than it would seem at
first sight...).

Mail thread wise, I would suggest we stick to what
can be done on 1.7.x, since I have no mandate to do
a full-fledged solution on trunk, but only to make simple
changes to 1.7.x.

If you feel the proposed solutions are not good for 1.7.x,
or are not good at all, just say so, I will stop my
attempt and we'll start waiting again for resources for
a fuller solution.
I also encourage anybody interested to start discussing the
fuller solution in a separate thread, so that we have a design
ready for estimates should anyone with funds be interested
in having it implemented.

The following are the items I'm thinking about for
the 1.7.x branch.

Memory usage
--------------------------------------------------------
A way to limit the memory used by each request. WMS
requests use quite a lot of memory due to the
need to set up the drawing surface, which usually takes
width * height * 4 bytes (4 bytes per pixel). So a 1024x1024
image sucks up 4MB of memory (this is the quite typical
4x4 GWC metatile).
If someone is determined enough, and has access to a big
enough dataset on the server side, they can make a request
with a custom style that will suck up 99% of the heap
without going OOM itself, while making any other
legitimate request go OOM.
Even without a big dataset, you can make a loop
of big enough requests and obtain the same effect.
Now, I think external tools can be used to throttle
too many requests from a single host, but those
tools won't be able to assess the image size being
requested.

So one config item I would like to add is image size.
As per Gabriel's suggestion in private mail, an x MB
per-request cap seems to be a good one.
It would be a global WMS parameter, simple to check,
and I would like to land a patch for this in 1.7.x,
without adding the param to the UI, and add the UI
in trunk instead.

The parameter could be a new full-fledged field,
or an entry in the metadata map. I would prefer the
former.
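
A minimal sketch of how such a per-request cap could be checked, assuming the 4 bytes per pixel estimate given above. The class and method names are hypothetical and not part of GeoServer; the limit is expressed in MB, as suggested.

```java
/** Hypothetical guard: rejects requests whose drawing surface would exceed a cap. */
final class ImageMemoryGuard {
    // Assumption from the discussion above: 4 bytes per pixel.
    private static final long BYTES_PER_PIXEL = 4;

    // Cap in bytes; zero or negative means "no limit configured".
    private final long maxBytes;

    ImageMemoryGuard(long maxMegabytes) {
        this.maxBytes = maxMegabytes * 1024L * 1024L;
    }

    /** Estimated size of the drawing surface: width * height * 4 bytes. */
    static long estimateBytes(int width, int height) {
        // Cast first so the multiplication is done in long arithmetic.
        return (long) width * height * BYTES_PER_PIXEL;
    }

    /** True if the request fits under the cap, or if no cap is configured. */
    boolean isAllowed(int width, int height) {
        if (maxBytes <= 0) {
            return true;
        }
        return estimateBytes(width, height) <= maxBytes;
    }
}
```

With a 4 MB cap, the 1024x1024 metatile mentioned above would just fit, while a 2048x2048 request (16 MB by the same estimate) would be refused before any rendering starts.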

Time usage
-------------------------------------------------
A request taking too much time to execute is
no good.
If you look at WFS, this requirement has been
translated from a time limit into a feature count limit,
and even in that case, we had to allow admins
to turn off bounds computation on the returned
feature collections because that single
thing could take minutes on big data sets.

WMS wise we could do the same, but in the
end you can take a lot of time due to many
features, or to a few gigantic ones.

Gabriel provided a solution at the NY
sprint that involves setting up a thread pool
that executes the rendering, and that can be
timed out based on config (and that can also be limited
in terms of how many threads actually perform
rendering).
I have some reservations about applying this kind
of solution on 1.7.x due to a couple of things:
- it always requires two threads per request,
   one provided by the container that is executing
   the http request, and another doing the actual
   rendering in the thread pool
- it changes the way the request is executed even
   if the admin did not activate it

I was considering a lower tech solution involving
the usage of a timer. A timer is started before the
rendering starts with the timeout time as its delay.
If the rendering terminates within it, the timer
is just cancelled. If the timer fires instead,
it calls the stop() method on the renderer, and
for good measure it also disposes of the graphics
the renderer was using so that coverage rendering
is killed as well.

Mind, this ends up using extra threads as well, but
the main path is unaltered, and if the option is not
enabled, the main path is not modified at all.

At that point, we can decide whether to throw a
service exception, or return the partially generated
image with some marker showing it timed out.
I would go for the former.

Configuration wise, I suggest we add a wms timeout
specified in seconds, and again, add only the config
option to 1.7.x, and provide a UI for it on trunk.
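
The timer approach could look roughly like the following sketch. StoppableRenderer is a hypothetical stand-in for the real renderer (only the stop() call comes from the description above), and RenderingTimeout is an illustrative name. The rendering still runs on the calling thread; java.util.Timer only adds a watchdog thread. Disposing of the graphics is left out for brevity.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a renderer that can be asked to stop. */
interface StoppableRenderer {
    void render(); // blocks until rendering is done or has been stopped
    void stop();   // asks a running render() call to return early
}

final class RenderingTimeout {
    /**
     * Runs the renderer on the calling thread. Returns true if it finished
     * before the timeout, false if the watchdog had to stop it.
     * A timeout of zero or less leaves the rendering path untouched.
     */
    static boolean renderWithTimeout(final StoppableRenderer renderer, long timeoutMillis) {
        if (timeoutMillis <= 0) {
            renderer.render();
            return true;
        }
        final AtomicBoolean timedOut = new AtomicBoolean(false);
        Timer timer = new Timer(true); // daemon thread, so it never blocks shutdown
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                timedOut.set(true);
                renderer.stop();
            }
        }, timeoutMillis);
        try {
            renderer.render();
        } finally {
            // Rendering returned (normally or not): the watchdog is no longer needed.
            timer.cancel();
        }
        // Note: if the timer fires in the instant between render() returning and
        // cancel(), a completed rendering is still reported as timed out.
        return !timedOut.get();
    }
}
```

A caller would turn a false return value into a service exception, which is the option preferred above, and would convert the configured seconds into milliseconds before calling.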

Number of rendering errors
--------------------------------------------------
The StreamingRenderer has been developed for a long
time with uDig as the use case.
One of the effects of this shows up in its
"best effort rendering", which means the renderer
skips features it cannot render and goes on.
Typical issues that may arise during rendering
are reprojection problems, invalid geometries,
but also data source connections suddenly being
severed.
In the face of this, the renderer just keeps on going,
eventually wasting a lot of time handling exceptions.

I would like to add a max errors setting inside
the renderer. It was there once, and an error
counter is still available in the code, but most
of it has been removed.

This can also be implemented as a listener,
yet listeners are kind of heavy in that they
are also informed of each feature rendered, not only
of errors.

There is also the fact that by implementing
timeouts we make it impossible for this
"best effort rendering" to keep the CPU busy for more
than x seconds. Having this knob has its own merit
though, as wasting time handling exceptions is
an expensive and useless way to burn CPU cycles.
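
As a sketch of what a max errors setting might amount to, assuming a simple counter consulted each time a feature fails to render: the class name and the "zero means unlimited" convention are assumptions for illustration, not the existing StreamingRenderer code.

```java
/** Hypothetical error budget consulted by a rendering loop on every failure. */
final class RenderingErrorBudget {
    // Zero or negative keeps the current "best effort" behaviour (no limit).
    private final int maxErrors;
    private int errorCount;

    RenderingErrorBudget(int maxErrors) {
        this.maxErrors = maxErrors;
    }

    /**
     * Records one rendering failure.
     * Returns true if rendering may go on, false once the budget is exceeded.
     */
    boolean recordError(Exception cause) {
        errorCount++;
        return maxErrors <= 0 || errorCount <= maxErrors;
    }

    int getErrorCount() {
        return errorCount;
    }
}
```

A rendering loop would call recordError() in its catch block and abort as soon as it returns false, instead of skipping the feature and carrying on indefinitely.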

Questions
-------------------------------------------------

Justin, to make sure, what's the effort involved
in adding an option to the configuration in a way
that it goes straight to the services without the
need to add it to the UI in 1.7.x?
I think it would require changing the xml reader/writer
classes, the involved ServiceInfo class, and that would
be it, assuming the patch goes down an grab it?
I guess if I use the metadata map I would not even need
to change the reader/writer classes or the ServiceInfo,
but only change the service code, right?

Conclusion
-------------------------------------------------

While there are other items on the checklist for a more solid
server (like disallowing custom styles, disabling certain
output formats), the above seem to offer the
best bang for the buck, and I believe I can implement them
in the time I've been given (16 hours, for the record).

Feedback welcomed
Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

I am +1 on all of the below, with emphasis on the timelimit.

If the XML causes overhead, could we use context variables in the 1.7.x series?

And does anyone know if there is a logging setting to log the rendering errors? We run into them quite a bit, and it *sucks* to not know what feature triggered it.

-Arne

--
Arne Kepp
OpenGeo - http://opengeo.org
Expert service straight from the developers

Arne Kepp wrote:

> I am +1 on all of the below, with emphasis on the timelimit.
>
> If the XML causes overhead, could we use context variables in the 1.7.x series?

We could, but it would result in inconsistencies with the 2.0.x
series configuration, so I'd like to avoid it.

> And does anyone know if there is a logging setting to log the rendering errors? We run into them quite a bit, and it *sucks* to not know what feature triggered it.

Actually rendering errors are logged at SEVERE level, which maps
to ERROR level in log4j, so they should show up in all logging
configurations.

Do you have an example of bad data I can test with?

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Andrea Aime wrote:

> Arne Kepp wrote:
>
>> I am +1 on all of the below, with emphasis on the timelimit.
>>
>> If the XML causes overhead, could we use context variables in the 1.7.x series?
>
> We could, but it would result in inconsistencies with the 2.0.x
> series configuration, so I'd like to avoid it.

Good point.

>> And does anyone know if there is a logging setting to log the rendering errors? We run into them quite a bit, and it *sucks* to not know what feature triggered it.
>
> Actually rendering errors are logged at SEVERE level, which maps
> to ERROR level in log4j, so they should show up in all logging
> configurations.
>
> Do you have an example of bad data I can test with?
>
> Cheers
> Andrea

Not a reliable test case, unfortunately...

The place I observed it most frequently, about a year ago, was the NYC data used on the front page of geoserver.org (nyc_nj_bg layer group served by /geoserver on artois.openplans.org).

Sometimes it will simply render a metatile without the roads. Delete the cache and it renders it fine the next time. We scoured the log files several times, but did not find anything.

I think Ivan reported something similar with the OSM layer last week. I did not investigate before installing JAI / wiping logs though.

-Arne

--
Arne Kepp
OpenGeo - http://opengeo.org
Expert service straight from the developers

All this sounds good Andrea. +1 from me. We should try to get these into jira asap so we can throw them on the road map and include them in the update for this week.

Additional comments inline.

<snip>

> Time usage
> -------------------------------------------------

<snip>

> I have some reservations about applying this kind
> of solution on 1.7.x due to a couple of things:
> - it always requires two threads per request,
>    one provided by the container that is executing
>    the http request, and another doing the actual
>    rendering in the thread pool
> - it changes the way the request is executed even
>    if the admin did not activate it
>
> I was considering a lower tech solution involving
> the usage of a timer. A timer is started before the
> rendering starts with the timeout time as its delay.
> If the rendering terminates within it, the timer
> is just cancelled. If the timer fires instead,
> it calls the stop() method on the renderer, and
> for good measure it also disposes of the graphics
> the renderer was using so that coverage rendering
> is killed as well.

Do you see the timer as a longer term solution as well? Or when we do have a "job queue", do you see this being rolled in? I can see benefits to having both bits of functionality separate, or rolled into one.

<snip>

> Questions
> -------------------------------------------------
>
> Justin, to make sure, what's the effort involved
> in adding an option to the configuration in a way
> that it goes straight to the services without the
> need to add it to the UI in 1.7.x?
> I think it would require changing the xml reader/writer
> classes, the involved ServiceInfo class, and that would
> be it, assuming the patch goes down an grab it?
> I guess if I use the metadata map I would not even need
> to change the reader/writer classes or the ServiceInfo,
> but only change the service code, right?

So on 1.7.x, this is what would be involved (still a bit of a pain to add config parameters unfortunately):

1) Add any methods to WMSInfo or other config objects you need (optional if you decide to use the metadata map)

2) Modify LegacyServiceReader or just WMSServiceLoader (depending on which classes, if any, you touch) to read the parameters from the xml

3) Modify the old XMLConfigWriter to persist the configuration

4) Flush out any bugs in the current pipeline of metadata properties being lost (if you go that route). Remember that when the UI is saved all the objects are still wiped out, so there could be a couple of issues hiding around where metadata entries are not "round tripped"

On trunk in theory step 1) should be the only one necessary.
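
If the metadata map route is taken, the service code would need to read the settings defensively, since entries may go missing when they are not round tripped. The sketch below assumes a plain java.util.Map as a stand-in for the metadata map; the class name and the keys are hypothetical and only illustrate the fallback to a default.

```java
import java.io.Serializable;
import java.util.Map;

/** Hypothetical helper reading optional WMS limits from a metadata-style map. */
final class WmsLimits {
    // Illustrative keys only, not actual GeoServer configuration names.
    static final String MAX_MEMORY_KEY = "maxRequestMemoryMb";
    static final String TIMEOUT_KEY = "maxRenderingTimeSeconds";

    /**
     * Returns the integer stored under the key, or the default when the entry
     * is missing (for example lost on save) or cannot be parsed.
     */
    static int intSetting(Map<String, Serializable> metadata, String key, int defaultValue) {
        Serializable value = (metadata == null) ? null : metadata.get(key);
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        if (value instanceof String) {
            try {
                return Integer.parseInt(((String) value).trim());
            } catch (NumberFormatException e) {
                return defaultValue;
            }
        }
        return defaultValue;
    }
}
```

Using a default that means "no limit" keeps the current behaviour whenever an entry is absent.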

--
Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Justin Deoliveira wrote:

> Do you see the timer as a longer term solution as well? Or when we do have a "job queue", do you see this being rolled in? I can see benefits to having both bits of functionality separate, or rolled into one.

I think it can stay in the long term as well, as long
as we don't need to control how many threads each
service is allowed to use, or control their priority
over time (throttling down of long requests).
That is, the solution is simple, but it's not hacky imho,
so it can stay there until we have resources for
well-rounded process management.
I'm also a bit nervous about having our own process
handling, in that never creating your own threads
is one of the rules for big J2EE containers, though
that was meant for entity/service beans, and was never
forbidden for pure servlets afaik.
I'm just worried that bigger containers like WebLogic/WebSphere
might have some stringent controls in place that would
trigger if we start keeping our own thread pools.
Timer does use threads as well, but that is so common
that I don't think it would be blocked.
Anyways, I'm just hypothesizing, we'll know better
when we go down that road.

Good, thanks for laying this out for me.

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Sounds like a perfectly reasonable approach for 1.7.x to me Andrea. +1 on your proposed enhancements.

I'll keep from commenting on the other issues here so that you can better invest the time given; everything else can be talked about in its own thread.

Cheers,

Gabriel

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

I second your approach, but... see below...
-------------------------------------------------------
Ing. Simone Giannecchini
GeoSolutions S.A.S.
Owner - Software Engineer
Via Carignoni 51
55041 Camaiore (LU)
Italy

phone: +39 0584983027
fax: +39 0584983027
mob: +39 333 8128928

http://www.geo-solutions.it
http://simboss.blogspot.com/
http://www.linkedin.com/in/simonegiannecchini

-------------------------------------------------------

On Tue, May 26, 2009 at 10:19 AM, Andrea Aime <aaime@anonymised.com> wrote:

I was considering a lower tech solution involving
the usage of a timer. A timer is started before the
rendering starts with the timeout time as its delay.
If the rendering terminates within it, the timer
is just cancelled. If the timer is activated instead,
it calls the stop() method over the renderer, and
for good measure it also disposes of the graphics
the renderer was using so that coverage rendering
is killed as well.

Mind, this ends up using extra threads as well, but
the main path is unaltered, and if the option is not
enabled, the main path is not modified at all.

At that point, we can decide whether to throw a
service exception, or return the partially generated
image with some marker showing it timed out.
I would go for the former.

Configuration wise, I suggest we add a wms timeout
specified in seconds, and again, add only the config
option to 1.7.x, and provide a UI for it on trunk.
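
To make the timer idea concrete, here is a minimal sketch built on
java.util.Timer. It is only an illustration, not the patch itself:
StoppableRenderer is a hypothetical stand-in for whatever exposes the
renderer's stop() method, the graphics disposal is passed in as a plain
Runnable, and the timeout is taken in milliseconds (a value configured in
seconds would be multiplied by 1000 before the call).

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for whatever exposes the renderer's stop() method. */
interface StoppableRenderer {
    void render();
    void stop();
}

class RenderingTimeout {
    /**
     * Runs the renderer on the calling thread and returns true if the timeout fired.
     * A timeout of zero or less means "disabled" and leaves the main path untouched.
     */
    static boolean renderWithTimeout(StoppableRenderer renderer,
                                     Runnable disposeGraphics,
                                     long timeoutMillis) {
        if (timeoutMillis <= 0) {
            renderer.render();
            return false;
        }
        AtomicBoolean timedOut = new AtomicBoolean(false);
        // Daemon timer thread, so it never keeps the JVM alive on its own.
        Timer timer = new Timer("wms-render-timeout", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                timedOut.set(true);
                renderer.stop();        // ask the renderer to stop
                disposeGraphics.run();  // also kill the drawing surface
            }
        }, timeoutMillis);
        try {
            renderer.render();
        } finally {
            // Rendering finished (or failed) first: the timer is no longer needed.
            timer.cancel();
        }
        return timedOut.get();
    }
}
```

The caller can then check the returned flag and decide whether to throw
a service exception or return the partial image.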

I am a bit worried about trying to kill the rendering loop when native
libs doing I/O on coverages and a Java2D call to
drawRenderedImage with a deferred loaded raster are involved.
We might be trying to kill a Java thread while it is inside native
code doing I/O. Seeing a JVM crash would not be that unusual. Have you
thought about this?

Simone.

Number of rendering errors
--------------------------------------------------
The StreamingRenderer has been developed for a long
time with uDig as the use case.
One of the effects of this shows up in its
"best effort rendering", which means the renderer
skips features it cannot render and goes on.
Typical issues that may arise during rendering
are reprojection problems, invalid geometries,
but also data source connections suddenly being
severed.
In the face of this, the renderer just keeps on going,
eventually wasting a lot of time handling exceptions.

I would like to add a max errors setting inside
the renderer. It was there once, and an error
counter is still available in the code, but most
of it has been removed.

This can also be implemented as a listener,
yet listeners are kind of heavy in that they
are also informed of each feature rendered, not only
of errors.

Also, there is the fact that by implementing
timeouts we also make it impossible for this
"best effort rendering" to keep the CPU busy for more
than x seconds. Having this knob has its own merit
though, as handling exceptions is
an expensive and useless way to burn CPU cycles.
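
As an illustration of the max errors idea, a best-effort loop with an
error budget could look like the sketch below. It is not the
StreamingRenderer code: ErrorLimitedLoop, paintAll and the exception
class are hypothetical names, and the "abort once the count exceeds
maxErrors" semantics is an assumption.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of best-effort rendering that gives up after too many errors. */
class ErrorLimitedLoop {
    /** Thrown when the error budget is exhausted. */
    static class TooManyErrorsException extends RuntimeException {
        TooManyErrorsException(int errors, Throwable last) {
            super("Rendering aborted after " + errors + " errors", last);
        }
    }

    /**
     * Paints each feature, skipping the ones that fail, and returns the error count.
     * A maxErrors of zero or less keeps the unlimited best-effort behavior.
     */
    static <F> int paintAll(List<F> features, Consumer<F> painter, int maxErrors) {
        int errors = 0;
        for (F feature : features) {
            try {
                painter.accept(feature);
            } catch (RuntimeException e) {
                errors++;
                // Tolerate up to maxErrors failures, abort on the next one.
                if (maxErrors > 0 && errors > maxErrors) {
                    throw new TooManyErrorsException(errors, e);
                }
            }
        }
        return errors;
    }
}
```

Unlike a listener, the counter is only touched when something goes
wrong, so the successful path pays nothing for it.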

Questions
-------------------------------------------------

Justin, to make sure, what's the effort involved
in adding an option to the configuration in a way
that it goes straight to the services without the
need to add it to the UI in 1.7.x?
I think it would require changing the xml reader/writer
classes, the involved ServiceInfo class, and that would
be it, assuming the patch goes down an grab it?
I guess if I use the metadata map I would not even need
to change the reader/writer classes or the ServiceInfo,
but only change the service code, right?
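
If the metadata map route is taken, the service-side lookup might look
something like this sketch. The key name "maxRenderingTime" is invented
for the example, and the map type (String keys, Serializable values) is
an assumption about how the metadata map is exposed.

```java
import java.io.Serializable;
import java.util.Map;

/** Hypothetical helper reading a WMS option from a service metadata map. */
class WmsOptions {
    // Made-up key name, not an actual GeoServer setting.
    static final String MAX_RENDERING_TIME = "maxRenderingTime";

    /** Returns the timeout in seconds; 0 means "no limit" (missing or invalid value). */
    static int timeoutSeconds(Map<String, Serializable> metadata) {
        Serializable value = metadata == null ? null : metadata.get(MAX_RENDERING_TIME);
        if (value instanceof Number) {
            return Math.max(0, ((Number) value).intValue());
        }
        if (value instanceof String) {
            // Values read back from XML may well arrive as plain strings.
            try {
                return Math.max(0, Integer.parseInt(((String) value).trim()));
            } catch (NumberFormatException e) {
                return 0;
            }
        }
        return 0;
    }
}
```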

Conclusion
-------------------------------------------------

While there are other items in the checklist of a more solid
server (like disallowing custom styles, disabling certain
output formats) the above seem to give the
best bang for the buck, and I believe I can implement them
in the time I've been given (16 hours, for the record).

Feedback welcomed
Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

------------------------------------------------------------------------------
Register Now for Creativity and Technology (CaT), June 3rd, NYC. CaT
is a gathering of tech-side developers & brand creativity professionals. Meet
the minds behind Google Creative Lab, Visual Complexity, Processing, &
iPhoneDevCamp asthey present alongside digital heavyweights like Barbarian
Group, R/GA, & Big Spaceship. http://www.creativitycat.com
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Simone Giannecchini wrote:

I am a bit worried about trying to kill the rendering loop when native
libs doing I/O on coverages and a Java2D call to
drawRenderedImage with a deferred loaded raster are involved.
We might be trying to kill a Java thread while it is inside native
code doing I/O. Seeing a JVM crash would not be that unusual. Have you
thought about this?

I'm not playing with threads, the ability to kill a thread
is deprecated anyways. But I'm disposition the Graphics2D
while rendering is going.
The JAI native code might be using it, so that might
cause issues. I'll double check.
Can you suggest a better way to stop coverage rendering
mid-flight? In the end what the coverage renderer
does is set up a JAI processing chain, and the chain
is set in motion by a single, atomic call to the
graphic context. How do you stop it?

Cheers
Andrea

Andrea Aime wrote:

I'm not playing with threads, the ability to kill a thread
is deprecated anyways. But I'm disposition the Graphics2D

... I'm disposing the Graphics2D...

Cheers
Andrea

Oh btw, I've posted a (working I hope) patch on
http://jira.codehaus.org/browse/GEOS-3086
If you could apply it on 1.7.x and check it
with one of your big raster dataset it would
be much appreciated.

Cheers
Andrea

see below...
-------------------------------------------------------
Ing. Simone Giannecchini
GeoSolutions S.A.S.
Owner - Software Engineer
Via Carignoni 51
55041 Camaiore (LU)
Italy

phone: +39 0584983027
fax: +39 0584983027
mob: +39 333 8128928

http://www.geo-solutions.it
http://simboss.blogspot.com/
http://www.linkedin.com/in/simonegiannecchini

-------------------------------------------------------

On Wed, May 27, 2009 at 10:36 PM, Andrea Aime <aaime@anonymised.com> wrote:

I'm not playing with threads, the ability to kill a thread
is deprecated anyways. But I'm disposition the Graphics2D
while rendering is going.

cool

The JAI native code might be using it, so that might
cause issues. I'll double check.
Can you suggest a better way to stop coverage rendering
mid-flight? In the end what the coverage renderer
does, is to setup a JAI processing chain, and the chain
is set in motion by a single, atomic call to the
graphic context. How do you stop it?

Tough call. There are many variables, so it is not easy to come up with one
solution that would fit all cases.
In principle killing the graphics should work, since that would mean
killing the sink of the chain, not the source (the native code itself).
As you know, JAI chains do not have a stop button of their own.
We could work around that by wrapping operations, but it's neither
easy nor simple.

Moreover your approach would work (raster-wise) only if deferred
loading is allowed.
In case, like for gdal, we avoid the ImageRead path and go down the
simple read path, you might still suffer from eternal loading
times in case you have to load an enormous raster and then rescale
(just think about a large jp2k without many pyramid levels, which you
want to rescale to a small area. Add to this that it might be badly
tiled and you are f***d :-) ).

I guess that fighting against long running requests in a really robust
way (raster-wise) would force us to push loading into an external process
which we can stop at our convenience without problems. For the time
being we can just try your approach and I can try to think about
something different, but I doubt I can find anything better than what
you came up with.

Simone.

sure, will do.

Simone.

On Wed, May 27, 2009 at 10:46 PM, Andrea Aime <aaime@anonymised.com> wrote:

Oh btw, I've posted a (working I hope) patch on
http://jira.codehaus.org/browse/GEOS-3086
If you could apply it on 1.7.x and check it
with one of your big raster dataset it would
be much appreciated.

Simone Giannecchini wrote:

sure, will do.

I've made some tests with MrSid and JP2 and haven't experienced
troubles so far (the Nasa one for the first, the usual world lossless gtopo30
for the second).
One thing I noticed though, as you predicted, is that the timeout
is not respected if the reading code takes a long time to read
the coverage.
I've noticed that setting the "suggested tile size" to 512,512
both gives a great speedup and seems to make the timeout better
respected (which makes me reiterate the importance
of having a good default for that parameter)

Cheers
Andrea

see below...

On Thu, May 28, 2009 at 1:04 PM, Andrea Aime <aaime@anonymised.com> wrote:

I've made some tests with MrSid and JP2 and haven't experienced
troubles so far (the Nasa one for the first, the usual world lossless gtopo30
for the second).
One thing I noticed though, as you predicted, is that the timeout
is not respected if the reading code takes a long time to read
the coverage.
I've noticed that setting the "suggested tile size" to 512,512
both gives a great speedup and seems to make the timeout better
respected (which makes me reiterate the importance
of having a good default for that parameter)

what can I say, you are right on this :-)

Besides, can you check what happens in case you go down the path of
direct read (imageread==false) with large requests?

Simone.

Simone Giannecchini wrote:

Besides, can you check what happens in case you go down the path of
direct read (imageread==false) with large requests?

I've used both settings, and it did not crash in either case.
I did not try with a multithreaded load test though.

Cheers
Andrea
