On Wed, Feb 2, 2011 at 10:04 AM, Steve Way
<Steve.Way@anonymised.com> wrote:
Hi All,
I would like to propose an enhancement to the rendering within GeoServer.
With the recent release, GeoServer now by default uses StreamingRenderer, as
opposed to previously, when ShapefileRenderer was the default renderer.
The good thing about the ShapefileRenderer is that it made a call to
StreamingRenderer. Due to the architecture of ShapefileRenderer, we were
able to make the method which calls StreamingRenderer multithreaded, so that
multiple WMS layers requested together are rendered in parallel. For example,
in OS On Demand there is an OS MasterMap product which contains 6 layers:
TOPOGRAPHICAREA, TOPOGRPAHICLINE, BOUNDARYLINE, CARTOGRPAHICSYMBOL,
TOPOGRPAHICPOINT and CARTOGRAPHICTEXT. Using the parallel approach, we have
seen tremendous improvements in performance compared to when the
ShapefileRenderer uses a single thread for rendering all 6 layers.
I have reservations about this approach but we can discuss it.
First off, the StreamingRenderer is now internally threaded, with one
thread reading
from the data sources and a second one painting.
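To make the reader/painter split concrete, here is a minimal sketch of the general pattern: one thread feeds a bounded queue, a second one drains it. This is not the actual StreamingRenderer code; the class name `ReadPaintPipeline`, the string "features" and the queue size are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not GeoTools code: one thread reads, another paints.
final class ReadPaintPipeline {

    static List<String> render(List<String> features, int queueSize) {
        // Bounded queue: the reader blocks when it gets too far ahead of the painter.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(queueSize);

        // Reader thread: stands in for pulling features from the data source.
        Thread reader = new Thread(() -> {
            try {
                for (String feature : features) {
                    queue.put(Optional.of(feature));
                }
                queue.put(Optional.empty()); // end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Painter: here the calling thread; stands in for drawing on the surface.
        List<String> painted = new ArrayList<>();
        try {
            while (true) {
                Optional<String> next = queue.take();
                if (!next.isPresent()) {
                    break;
                }
                painted.add("painted:" + next.get());
            }
            reader.join();
        } catch (InterruptedException e) {
            reader.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("rendering interrupted", e);
        }
        return painted;
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList("road", "river", "building"), 2));
    }
}
```

The point of the bounded queue is that memory stays under control: there is still a single drawing surface, and the reader can only run a few features ahead of the painter.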
I can see how rendering data in parallel might improve response times for the
single user case, but in general I believe this will be detrimental for a number
of reasons (I'll be happy to be proven wrong though, I've been banging my head on
rendering speed for a while and help is surely welcome).
Why I think having blind multi-rendering is going to be more
trouble than an improvement:
- in order to render n layers in parallel you need to allocate n drawing
surfaces, with all but the first one 4 bytes deep. This makes the memory usage
uncontrollable; there should be some limits, otherwise this approach will take
GS to OOM in no time.
Btw, this would also break the wms rendering size limits, as the memory to be
used is computed taking into account the normal behavior of the streaming
renderer
- years ago I was contracted to look into uDig thrashing the entire operating
system while trying to just render a map made of 20-some shapefiles. Looking
into it I discovered uDig was rendering exactly as you suggest, one thread per
layer, and the file system layer was completely thrashed due to that.
This kind of consideration is quite common and it's by no means limited to
the file system. On the file system you have the disk head jump around like
crazy if you access too many files in parallel; on a database you would end up
using n separate connections (one per layer coming from a db) and thus would
starve the connection pool; if the server is a remote one you would break the
common rule of not using more than 2-6 parallel accesses (something browsers
still abide by)
- java2d rendering has a serious scalability problem that becomes evident once
you start using many threads rendering in parallel. It seems that Oracle has
made the bug report in the Java bug database inaccessible, but there are still
some threads with me complaining about it:
http://www.java.net/forum/topic/javadesktop/java-desktop-technologies/java-2d/poor-java2d-scalability-server-applicatio-0
The bug is the reason why none of the java based servers could get good
numbers above 4-8 concurrent clients in the FOSS4G 2010 wms shootout:
the global synchronizations inside the antialiased rendering make it impossible
to have more than 2-4 threads actually active inside the antialiased
rasterization path (this is why I went for parallelizing data loading and
rendering instead).
- in the past months I actually made a module called "control-flow" whose
purpose is to limit the number of requests executing in parallel in order to
maximize throughput and stability under high load. The GS instances where I
installed it became quite a bit more stable and were able to serve more
requests per day.
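To put rough numbers on the memory point in the first item above, here is a back-of-the-envelope sketch. It assumes the first surface is an opaque 3 bytes per pixel image (my assumption, the actual depth depends on the output format) and every additional surface is 4 bytes per pixel; the class and method names are made up.

```java
// Hypothetical helper: estimates the drawing surface memory for one request.
final class SurfaceMemoryEstimate {

    // firstSurfaceBytesPerPixel is an assumption (e.g. 3 for opaque RGB);
    // every extra per-layer surface is counted at 4 bytes per pixel.
    static long bytesFor(int width, int height, int layers, int firstSurfaceBytesPerPixel) {
        if (layers < 1) {
            throw new IllegalArgumentException("layers must be >= 1");
        }
        long pixels = (long) width * height;
        long bytesPerPixel = firstSurfaceBytesPerPixel + 4L * (layers - 1);
        return pixels * bytesPerPixel;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // One shared surface vs. one surface per layer for a 2048x2048 map.
        System.out.println("1 surface:  " + bytesFor(2048, 2048, 1, 3) / mib + " MiB");
        System.out.println("6 surfaces: " + bytesFor(2048, 2048, 6, 3) / mib + " MiB");
    }
}
```

Under those assumptions a 2048x2048 request goes from 12 MiB with a single surface to 92 MiB with six, and that is per request, before multiplying by the number of concurrent clients.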
Long story short, I'm pretty sure you can get improvements in the single
user/few layers case (few layers -> single digit), but in order to make that
approach work in the general case a lot of care must be taken. The following
ideas should be taken into account when designing a layer parallel renderer
for server side usage:
- limit the number of parallel rendering tasks so that the memory usage does not
go above the wms rendering limits
- limit the parallel rendering by "zones", so that we cannot have 10 layers
all hitting the same database in parallel, or we are going to starve the
connection pool (this is actually something the admin should be able to
configure: if you have a connection pool of 20-40 connections, better to never
use more than 2 of them for a single request, otherwise bye bye serving
concurrent clients)
- make sure anyways that the global count of rendering threads does not exceed
a configurable "n" (the wms performance shootout suggests n = 8 due to the
java2d scalability issue) to avoid having the jvm just sit and wait for the
java2d synchronized blocks to be freed
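The zone and global limits above could be sketched with plain semaphores, roughly as follows. This is only an illustration under my own assumptions, not an existing GeoServer or GeoTools API: the class `LayerRenderThrottle`, the zone keys and the limits are hypothetical, and the memory limit from the first item is not covered here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch: caps parallel layer rendering globally and per "zone"
// (a zone being e.g. one database, one remote server, one disk).
final class LayerRenderThrottle {
    private final Semaphore global;
    private final int perZoneLimit;
    private final Map<String, Semaphore> zones = new ConcurrentHashMap<>();

    LayerRenderThrottle(int globalLimit, int perZoneLimit) {
        this.global = new Semaphore(globalLimit);
        this.perZoneLimit = perZoneLimit;
    }

    <T> T render(String zone, Supplier<T> task) {
        Semaphore zoneSemaphore = zones.computeIfAbsent(zone, z -> new Semaphore(perZoneLimit));
        // Take the zone permit first, so tasks queued on a busy zone
        // do not hold on to a global rendering slot while they wait.
        zoneSemaphore.acquireUninterruptibly();
        try {
            global.acquireUninterruptibly();
            try {
                return task.get();
            } finally {
                global.release();
            }
        } finally {
            zoneSemaphore.release();
        }
    }

    public static void main(String[] args) {
        // n = 8 globally, at most 2 layers per zone at the same time.
        LayerRenderThrottle throttle = new LayerRenderThrottle(8, 2);
        String result = throttle.render("postgis-1", () -> "rendered TOPOGRAPHICAREA");
        System.out.println(result);
    }
}
```

Both permits are always taken in the same order (zone, then global), so two tasks cannot deadlock on each other. In a real server the throttle would have to be shared across all requests, otherwise the global count means nothing.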
The above is actually the reason why I did not try to pursue this design in
the past: getting it right is relatively hard and it only helps on light
loads, while it makes everything way easier to break under high load. Given GS
is a server side application, one that is meant to serve multiple requests in
parallel, I left the fully parallelized approach to desktop apps (where it
makes a lot more sense, but still should be guarded to avoid chewing up all
the machine resources on non-toy maps).
Given infinite resources the highest priority right now would be to write a new
antialiased rasterizer that has neither the Sun JDK scalability issue (the Ductus
rasterizer) nor the OpenJDK speed issues (the Pisces rasterizer).
That's however quite a hard task...
Cheers
Andrea
--
Ing. Andrea Aime
Technical Lead
GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584962313
fax: +39 0584962313
http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf