[Geoserver-devel] GeoServer Rendering Enhancement

Hi All,

I would like to propose an enhancement to the rendering within GeoServer.

With the recent release, GeoServer now uses StreamingRenderer by default, whereas previously ShapefileRenderer was the default renderer.

The good thing about ShapefileRenderer is that it made a call to StreamingRenderer. Thanks to the architecture of ShapefileRenderer, we were able to make the method which calls StreamingRenderer multithreaded, so that when multiple WMS layers are requested they are rendered in parallel. For example, in OS On Demand there is an OS MasterMap product which contains 6 layers: TOPOGRAPHICAREA, TOPOGRAPHICLINE, BOUNDARYLINE, CARTOGRAPHICSYMBOL, TOPOGRAPHICPOINT and CARTOGRAPHICTEXT. Using the parallel approach, we have seen tremendous improvements in performance compared to when ShapefileRenderer uses a single thread to render all 6 layers.

I understand that ShapefileRenderer is now in an unsupported state, so maybe we could migrate this into a WMSRenderer; I’m not sure what the use cases are outside of WMS requests.

Thanks,

Steve

On Wed, Feb 2, 2011 at 10:04 AM, Steve Way
<Steve.Way@anonymised.com> wrote:

Hi All,

I would like to propose an enhancement to the rendering within GeoServer.

With the recent release, GeoServer now uses StreamingRenderer by default,
whereas previously ShapefileRenderer was the default renderer.

The good thing about ShapefileRenderer is that it made a call to
StreamingRenderer. Thanks to the architecture of ShapefileRenderer, we were
able to make the method which calls StreamingRenderer multithreaded, so that
when multiple WMS layers are requested they are rendered in parallel. For
example, in OS On Demand there is an OS MasterMap product which contains
6 layers: TOPOGRAPHICAREA, TOPOGRAPHICLINE, BOUNDARYLINE,
CARTOGRAPHICSYMBOL, TOPOGRAPHICPOINT and CARTOGRAPHICTEXT. Using the
parallel approach, we have seen tremendous improvements in performance
compared to when ShapefileRenderer uses a single thread to render all
6 layers.

I have reservations about this approach, but we can discuss it.
First off, the StreamingRenderer is now internally threaded, with one
thread reading from the data sources and a second one painting.
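
As a rough illustration of that split (a sketch only, not the actual
StreamingRenderer code; the class name, the string "features" and the
queue size of 16 are all made up for the example), one thread can feed a
bounded queue while the other drains it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of a loader/painter pair; not GeoTools code.
final class LoadPaintPipeline {

    static List<String> run(List<String> features) {
        // Bounded queue: the loader blocks when it gets too far ahead of the painter.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(16);
        List<String> painted = new ArrayList<>();

        Thread loader = new Thread(() -> {
            try {
                for (String feature : features) {
                    queue.put(Optional.of(feature));
                }
                queue.put(Optional.empty()); // end-of-data marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread painter = new Thread(() -> {
            try {
                Optional<String> next = queue.take();
                while (next.isPresent()) {
                    painted.add("painted " + next.get()); // stand-in for real drawing
                    next = queue.take();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        loader.start();
        painter.start();
        try {
            loader.join();
            painter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while rendering", e);
        }
        return painted;
    }
}
```

Only one thread ever touches the drawing surface here, so a single surface
is enough and the features come out in the order they were read.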

I can see how rendering data in parallel might improve response times for
the single user case, but in general I believe this will be detrimental for
a number of reasons (I'll be happy to be proven wrong though, I've been
banging my head on rendering speed for a while and help is surely welcome).
Why I think blind multi-rendering is going to be more trouble than an
improvement:
- in order to render n layers in parallel you need to allocate n drawing
  surfaces, with all but the first one 4 bytes per pixel deep. This makes
  the memory usage uncontrollable; there should be some limits, otherwise
  this approach will take GS to OOM in no time.
  Btw, this would also break the wms rendering size limits, as the memory
  to be used is computed taking into account the normal behavior of the
  streaming renderer
- years ago I was contracted to look into uDig thrashing the entire
  operating system while trying to just render a map made of 20-some
  shapefiles. Looking into it I discovered uDig was rendering exactly as
  you suggest, one thread per layer, and the file system layer was
  completely thrashed due to that.
  This kind of problem is quite common and it's by no means limited to
  the file system. On the file system you have the disk head jumping around
  like crazy if you access too many files in parallel; on a database you
  would end up using n separate connections (one per layer coming from a db)
  and thus would starve the connection pool; if the server is a remote one
  you would break the common rule of not using more than 2-6 parallel
  accesses (something browsers still abide by)
- java2d rendering has a serious scalability problem that becomes evident
  once you start using many threads rendering in parallel. It seems that
  Oracle has made the bug report in the java bug database inaccessible, but
  there are still some threads with me complaining about it:
  http://www.java.net/forum/topic/javadesktop/java-desktop-technologies/java-2d/poor-java2d-scalability-server-applicatio-0
  The bug is the reason why none of the java based servers could get good
  numbers above 4-8 concurrent clients in the FOSS4G 2010 wms shootout:
  the global synchronizations inside the antialiased rendering make it
  impossible to have more than 2-4 threads actually active inside the
  antialiased rasterization path (this is why I went for parallelizing data
  loading and rendering instead).
- in the past months I actually made a module called "control-flow" whose
  purpose is to limit the number of requests executing in parallel in order
  to maximize throughput and stability under high load. The GS instance
  where I installed it became quite a bit more stable and was able to
  serve more requests per day.
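
To put rough numbers on the memory point in the first bullet, here is a
back-of-the-envelope sketch. It assumes every surface is a full-size image
at 4 bytes per pixel (an upper bound, since the first surface may be
shallower); the class and method names are hypothetical, not GeoServer code:

```java
// Hypothetical helper for rough arithmetic only; not part of GeoServer.
final class SurfaceMemory {
    // Assumption: every drawing surface is RGBA, 4 bytes per pixel.
    static final int BYTES_PER_PIXEL = 4;

    /** Bytes needed for the given number of full-size drawing surfaces. */
    static long bytesFor(int width, int height, int surfaces) {
        return (long) width * height * BYTES_PER_PIXEL * surfaces;
    }

    /** How many surfaces fit in the budget; never less than 1 so rendering stays possible. */
    static int maxSurfaces(int width, int height, long budgetBytes) {
        long perSurface = bytesFor(width, height, 1);
        return (int) Math.max(1, budgetBytes / perSurface);
    }
}
```

For a 2048x2048 request, bytesFor(2048, 2048, 6) comes to 100663296 bytes
(96 MiB) for six layers, while a 64 MiB budget gives
maxSurfaces(2048, 2048, 64L * 1024 * 1024) = 4.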

Long story short, I'm pretty sure you can get improvements in the single
user/few layers case (few layers -> single digit), but in order to make
that approach work in the general case a lot of care must be taken. The
following ideas should be taken into account when designing a layer
parallel renderer for server side usage:
- limit the number of parallel rendering tasks so that the memory usage
  does not go above the wms rendering limits
- limit the parallel rendering by "zones", so that we cannot have 10 layers
  all hitting the same database in parallel, or we are going to starve the
  connection pool (this is actually something the admin should be able to
  configure: if you have a connection pool of 20-40 connections, better
  never have more than 2, otherwise bye bye serving concurrent clients)
- in any case, make sure the global count of rendering threads does not
  exceed a configurable "n" (the wms performance shootout suggests n = 8
  due to the java2d scalability issue) to avoid having the jvm just sit
  and wait for the java2d synchronized blocks to be freed

The above is actually the reason why I did not try to pursue this design
in the past: getting it right is relatively hard and it only helps on
light loads, while it makes everything way easier to break under high
load. Given GS is a server side application, one that is meant to serve
multiple requests in parallel, I left the fully parallelized approach to
desktop apps (where it makes a lot more sense, but still should be guarded
to avoid chewing up all the machine resources on non-toy maps).

Given infinite resources, the highest priority right now would be to write
a new antialiased rasterizer that has neither the Sun JDK scalability issue
(the Ductus rasterizer) nor the OpenJDK speed issue (the Pisces rasterizer).
That's however quite a hard task...

Cheers
Andrea

--
Ing. Andrea Aime
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584962313
fax: +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-----------------------------------------------------

On Wed, Feb 2, 2011 at 11:11 AM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

The above is actually the reason why I did not try to pursue this design
in the past: getting it right is relatively hard and it only helps on
light loads, while it makes everything way easier to break under high
load. Given GS is a server side application, one that is meant to serve
multiple requests in parallel, I left the fully parallelized approach to
desktop apps (where it makes a lot more sense, but still should be guarded
to avoid chewing up all the machine resources on non-toy maps).

Meh, as usual I end up sounding a lot more negative than I intend to.
If we can get a speed improvement on the lighter loads I'm all for it,
we just need to address it very carefully so that all the work that has
been done in the past to ensure proper resource usage (and thus,
stability) is not undermined by the newest changes.

Cheers
Andrea

-----------------------------------------------------

On Wed, Feb 2, 2011 at 3:08 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

Meh, as usual I end up sounding a lot more negative than I intend to.
If we can get a speed improvement on the lighter loads I'm all for it,
we just need to address it very carefully so that all the work that has
been done in the past to ensure proper resource usage (and thus,
stability) is not undermined by the newest changes.

And following up with a more practical point of view, what would it take
to make such a renderer without the risk of bringing GS down?
That renderer would need:
- a way to estimate the max number of threads allowed to render in
  parallel without breaking the memory limits (that sets the overall thread
  pool size for the rendering operation)
- a way to tell where each data source is coming from, so that we
  can make a "file system" zone, a "database x" zone, a "database y" zone
  and so on, and ensure no more than x threads from the same rendering
  request hit it at the same time (this can be done, for example, by giving
  each zone a bounded queue and have the threads put or get a token in
  the queue before starting). This might be somewhat more difficult as
  the map context only contains feature collections, so I guess we'd have
  to use custom map layer subclasses that are marked with the respective
  zone
- also set a max number of threads that can work globally (so a global
  thread pool)
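
The list above could be sketched roughly as follows. This is only an
illustration under stated assumptions: the class, the zone names and the
limits are hypothetical, and a java.util.concurrent.Semaphore stands in
for the bounded token queue mentioned above (same effect: at most
"perZoneLimit" permits per zone).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch; zone names and this class are not GeoServer API.
final class ZonedRenderer {
    private final ExecutorService globalPool;   // caps rendering threads globally
    private final int perZoneLimit;             // caps threads hitting one zone
    private final Map<String, Semaphore> zones = new ConcurrentHashMap<>();

    ZonedRenderer(int globalThreads, int perZoneLimit) {
        this.globalPool = Executors.newFixedThreadPool(globalThreads);
        this.perZoneLimit = perZoneLimit;
    }

    /** Queues one layer's rendering; it runs only while holding a token for its zone. */
    Future<?> submit(String zone, Runnable renderLayer) {
        Semaphore tokens = zones.computeIfAbsent(zone, z -> new Semaphore(perZoneLimit));
        return globalPool.submit(() -> {
            tokens.acquireUninterruptibly();
            try {
                renderLayer.run();
            } finally {
                tokens.release();
            }
        });
    }

    void shutdown() {
        globalPool.shutdown();
    }
}
```

One known weakness of this simple version: a pool thread waiting for a zone
token still occupies a global slot, so a real implementation would probably
want to pick the next layer from a zone that has tokens available instead
of blocking.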

I guess that for starters the zone limits and the global thread pool size
could be set by environment variables. This would make for a self contained
rendering class that could be coded and enabled with yet another
system wide variable that allows the brave to try it out.
If it proves not to undermine the stability and throughput of GS under load
we can adopt it and turn all the system variables into actual tunables
exposed to the administrator.
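
A minimal sketch of that kind of opt-in configuration, assuming JVM system
properties are used. The property names are invented for the example and
are not existing GeoServer settings; the defaults of 8 and 2 simply echo
the numbers discussed earlier in the thread:

```java
// Hypothetical settings holder; the property names are made up for illustration.
final class ParallelRenderSettings {
    final boolean enabled;
    final int globalThreads;
    final int perZoneLimit;

    private ParallelRenderSettings(boolean enabled, int globalThreads, int perZoneLimit) {
        this.enabled = enabled;
        this.globalThreads = globalThreads;
        this.perZoneLimit = perZoneLimit;
    }

    static ParallelRenderSettings fromSystemProperties() {
        // Boolean.getBoolean is false unless the property is set to "true".
        boolean enabled = Boolean.getBoolean("parallel.render.enabled");
        // Integer.getInteger falls back to the default when unset or unparsable.
        int global = Math.max(1, Integer.getInteger("parallel.render.globalThreads", 8));
        int zone = Math.max(1, Integer.getInteger("parallel.render.zoneLimit", 2));
        // A zone can never use more threads than the global pool has.
        return new ParallelRenderSettings(enabled, global, Math.min(zone, global));
    }
}
```

Started with, say, -Dparallel.render.enabled=true
-Dparallel.render.globalThreads=4, the experimental renderer would be
switched on with a smaller pool, and left off by default otherwise.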

Performance wise there is also the tradeoff of merging the rendering surfaces:
it takes time to merge together two rgba images, and if you just render a few
vectors on them in parallel you might actually end up slowing things down...
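
For reference, the merge step itself would look something like the sketch
below, using plain java2d calls (BufferedImage, Graphics2D and
AlphaComposite.SrcOver). The class name is hypothetical and this is not how
GeoServer composes its output today; it just shows the extra full-image
pass that each additional surface costs:

```java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical helper showing the per-layer compositing cost.
final class SurfaceMerger {

    /** Composites the layer surfaces bottom-to-top onto a new ARGB image. */
    static BufferedImage merge(int width, int height, List<BufferedImage> layers) {
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = target.createGraphics();
        try {
            g.setComposite(AlphaComposite.SrcOver);
            for (BufferedImage layer : layers) {
                // Every call touches the whole surface, however little was drawn on it.
                g.drawImage(layer, 0, 0, null);
            }
        } finally {
            g.dispose();
        }
        return target;
    }
}
```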

Cheers
Andrea

-----------------------------------------------------