[Geoserver-devel] An interesting performance war

Hi,

a bit unrelated but still an interesting story to tell.

Last week and the one before, I've been tuning the binary XML encoder for
maximum performance, something we often don't even need to do because it
seems just good enough.
In this case, though, it was a crucial QA factor, and the YourKit Java
profiler revealed a couple of interesting things.

First, a piece of advice: if you're looking for a well-performing piece of
code, frequent allocation of small objects is your enemy. Nothing new, but
how true.

Now the story:
I had a new push-style XML API for streamed writing of XML data structures,
where content (i.e., coordinate lists) can be stored directly as arrays of
double (or other primitive data types).
And since it's an API and multiple implementations are allowed, I got
fascinated by the benefits of Design by Contract programming[1]. Basically
because it allows implementation code to focus just on the "success
scenario", while the platform performs all the method pre- and postcondition
and class invariant checks, besides helping to improve the design and
testability of the code.
The Java language does not provide native DbC support, though (some would
like to introduce it, but it would break just too much existing code...
code that wouldn't break if it had stuck to its contract, but what the heck).
The point is, there are a couple of interesting libraries out there that let
you specify the contracts declaratively as annotations or javadoc tags, and
then provide the contract enforcement out of the box for any implementation
of your interface. Approaches range from plain hand-made decorators to AOP
and bytecode instrumentation à la Hibernate. But none of them convinced me
enough, so I went for a custom decorator.
Okay, that's the scenario, an interface, an implementation, and a factory
decorating implementations with a contract enforcement wrapper.
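
The post does not show the actual API, so here is only a minimal sketch of
that arrangement. Every name in it (CoordinateWriter, PlainWriter,
ContractEnforcingWriter, WriterFactory) and the contract itself (ordinates
must be finite, each write increases the count by one) are assumptions made
up for illustration:

```java
// Hypothetical interface: the contract lives in the documentation.
interface CoordinateWriter {
    /** Precondition: ordinate is finite. Postcondition: count() grows by 1. */
    void writeOrdinate(double ordinate);

    int count();
}

// The implementation focuses on the "success scenario" only.
final class PlainWriter implements CoordinateWriter {
    private int count;

    public void writeOrdinate(double ordinate) {
        count++; // a real encoder would write the bytes here
    }

    public int count() {
        return count;
    }
}

// The decorator checks the contract around any implementation.
final class ContractEnforcingWriter implements CoordinateWriter {
    private final CoordinateWriter delegate;

    ContractEnforcingWriter(CoordinateWriter delegate) {
        this.delegate = delegate;
    }

    public void writeOrdinate(double ordinate) {
        if (Double.isNaN(ordinate) || Double.isInfinite(ordinate)) {
            throw new IllegalArgumentException("precondition: ordinate must be finite");
        }
        int before = delegate.count();
        delegate.writeOrdinate(ordinate);
        if (delegate.count() != before + 1) {
            throw new IllegalStateException("postcondition: count must grow by 1");
        }
    }

    public int count() {
        return delegate.count();
    }
}

// The factory is the only place that knows about the wrapping.
final class WriterFactory {
    private WriterFactory() {
    }

    static CoordinateWriter create() {
        return new ContractEnforcingWriter(new PlainWriter());
    }
}
```

Client code would only ever call WriterFactory.create(), so the checks apply
to whatever implementation the factory picks.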

I first tuned the implementation, quite nicely done, using Java NIO and plain
streaming for binary data structures (as in being able to write one ordinate
at a time to encode a coordinate sequence, rather than converting to a
double[] and the like).
Tuning the implementation was fun and YourKit very helpful: for example,
getting rid of the synchronized java.util.Stack class in favor of a (well
encapsulated) ArrayList provided a 15% speedup by itself, etc.
With the implementation tuned and one of my archetypal test data layers, I
got a binary GML encoding throughput of ~30MB/s, bringing the total GML
encoding time down from more than a minute to 6 seconds (a 12x factor).
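
For the Stack replacement, a minimal sketch of what a "well encapsulated"
ArrayList might look like. The class name ElementStack is hypothetical and
this is not the actual encoder code. It is not thread-safe, which is the
point: java.util.Stack inherits synchronized methods from Vector, and a
single-threaded encoder pays for them without needing them.

```java
import java.util.ArrayList;
import java.util.EmptyStackException;
import java.util.List;

// Hypothetical name; an unsynchronized stack that hides its backing list.
final class ElementStack<T> {
    private final List<T> items = new ArrayList<>();

    void push(T item) {
        items.add(item);
    }

    T pop() {
        if (items.isEmpty()) {
            throw new EmptyStackException();
        }
        // Removing the last element avoids shifting the rest of the list.
        return items.remove(items.size() - 1);
    }

    T peek() {
        if (items.isEmpty()) {
            throw new EmptyStackException();
        }
        return items.get(items.size() - 1);
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}
```

On Java 6 and later, java.util.ArrayDeque would be another unsynchronized
option for the same job.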

Now it was just a matter of letting the factory wrap the implementation with
the contract enforcement decorator and done. Except it strangely degraded
performance over time: encoding started at 30MB/s and ended at less than
8MB/s.
The YourKit memory telemetry showed memory consumption getting lower over
time, not higher, but, hey, the garbage collector was running with increasing
frequency.
An indication of young generation stress. Okay, I could tune the GC through
JVM arguments, but something kept smelling bad: I want the code to perform
well enough with default settings, leaving GC optimization for special cases,
if needed at all.
But where the heck were so many little objects being allocated and discarded
so frequently? I had coded the decorator carefully to impose the least
overhead possible, allocating almost no new objects in it. Obviously that was
not quite true: I was using Java 5 varargs for the assertPre(boolean, ...)
and assertPost(boolean, ...) methods. The variable-length argument was meant
to avoid string concatenation for exception message construction, but it was
creating an Object[] under the hood on every call, and these methods were
being called far too many times.
In the end, simply replacing the variable-length argument methods with
overloaded versions with fixed argument lists brought me back in business:
the wrapper proved to impose a negligible overhead, and lesson learnt.
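
To make the difference concrete, here is a minimal sketch of the two styles.
The Contracts class, the assertPreVarargs name and the message format are
assumptions for illustration; only the assertPre(boolean, ...) shape comes
from the story above. At the bytecode level a varargs call builds a fresh
Object[] at the call site even when the condition holds (a modern JIT may or
may not optimize that away), while the fixed-arity overloads pass their
arguments directly.

```java
final class Contracts {
    private Contracts() {
    }

    // Varargs style: the compiler wraps the message parts in a new Object[]
    // at every call site, whether or not the check fails.
    static void assertPreVarargs(boolean condition, Object... messageParts) {
        if (!condition) {
            StringBuilder message = new StringBuilder();
            for (Object part : messageParts) {
                message.append(part);
            }
            throw new IllegalArgumentException(message.toString());
        }
    }

    // Fixed-arity overloads: no array is created for the call, and the
    // message is only built on the failure path.
    static void assertPre(boolean condition, Object part) {
        if (!condition) {
            throw new IllegalArgumentException(String.valueOf(part));
        }
    }

    static void assertPre(boolean condition, Object part1, Object part2) {
        if (!condition) {
            throw new IllegalArgumentException(String.valueOf(part1) + part2);
        }
    }
}
```

Both styles still avoid string concatenation on the success path; the
overloads just do so without the hidden array. Note that passing a primitive
as a message part would still box it in either style.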

Hope to be having that kind of fun with the rest of geoserver soon,
measurements at hand, and a scalability/robustness improvement plan.

Cheers,

Gabriel

[1] <http://en.wikipedia.org/wiki/Design_by_contract>

Gabriel Roldán wrote:
...

Wow, that's indeed an interesting story of how one can kill the performance of a Java application with a seemingly innocent and harmless
language construct.
The lessons seem to be:
* stay away from frequent small object creation (Coordinate, I'm
   looking at you!)
* function(Object...) considered harmful in tight loops
* frequent young GC, a drop in memory usage and a perf slowdown can be
   traced back to excessive small object creation (now that's
   interesting and counterintuitive: you end up using less memory
   because of excessive allocation!)

Thanks for sharing!
Cheers
Andrea