Doh, forgot one last bit that I’ve found interesting. Citing Gabriel:
If starting up GeoServer in seconds instead of minutes, loading the
home page almost instantly instead of waiting for seconds or even
minutes, sub-second response times for the layers page list with
thousands of layers, including filtering and paging; not going OOM or
getting timeouts when doing a GetCapabilities request under
concurrency and/or low heap size, but instead streaming out as quickly
as possible, using as little memory as possible, and gracefully
degrading under load; are not ways of exercising the new API, then I’m
lost.
All good stuff that I did not see mentioned in the proposal, though
I can hardly imagine a GeoServer going OOM under concurrent
GetCapabilities load unless… well, maybe it has 200k layers and
runs off just 256MB of memory (gut-feeling estimate).
In the past, OOMs in the case of many layers were caused by leaks in the
DescribeFeatureType subsystem… did you measure how much memory
it takes to keep the catalog in memory and run GetCapabilities?
Pardon the very rough way of assessing it, but if we take the release
data directory as reference, we have 19 layers, with quite a few stores that
could be avoided (single shapefiles instead of directory stores), and
running the following inside “workspaces” gives me:
du -csh $(find . -name "*.xml")
256K total
which means, on average, 13KB of XML per layer (which still inflates the
estimate quite a bit, since the service configuration is shared and normally
you don’t have that many stores). And then it’s XML; would you agree that the
in-memory representation should be something like 5 times more compact?
That would give us a rough estimate of 3KB per layer.
With 200k layers that means roughly 600MB of in-memory storage.
Which is a lot, I’m not denying it, but if you are handling that many layers
you also want some beefy hardware, so 600MB should be peanuts.
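For the sake of argument, here is the same back-of-the-envelope calculation spelled out in code (the 5x compaction factor and the 200k layer count are assumptions, not measurements):

// Rough estimate: 256KB of XML config for 19 layers, assume the in-memory
// representation is ~5x more compact, then scale up to 200k layers.
public class CatalogMemoryEstimate {
    public static void main(String[] args) {
        double xmlPerLayerKB = 256.0 / 19;             // ~13KB of XML per layer
        double inMemoryPerLayerKB = xmlPerLayerKB / 5; // ~2.7KB per layer in memory
        double totalMB = inMemoryPerLayerKB * 200_000 / 1024;
        System.out.printf("~%.0f MB for 200k layers%n", totalMB); // prints ~526 MB
    }
}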
I’m not trying to deny the scalability advantages of secondary storage; it just seems
to me the OOM reports may be a bit exaggerated.
The other thing that piques my interest, and worries me, is “starting up in seconds”.
My understanding is that the current startup slowness is mostly due to the
validation checks we do on startup to see whether a layer/feature type/store is working
and valid, which results in opening all the stores, computing the feature types and so on.
Maybe the JDBC config is that much faster because it’s not loading everything up front,
and thus those listeners are not being called?
If so, we have a problem at hand, since the listeners are there to prevent the caps
documents from failing miserably with a service exception at the first sign of trouble.
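For context, the kind of eager check I’m referring to looks roughly like the following against the catalog API; treat it as an illustration of where the startup time goes, not as the actual listener code:

import java.io.IOException;

import org.geoserver.catalog.Catalog;
import org.geoserver.catalog.FeatureTypeInfo;

// Illustration only: walking every feature type and forcing the schema to be
// computed means every store gets opened up front, which is the expensive part.
public class EagerCatalogValidation {

    public static void validate(Catalog catalog) {
        for (FeatureTypeInfo ft : catalog.getFeatureTypes()) {
            try {
                // forces the underlying store to be opened and the schema computed
                ft.getFeatureType();
            } catch (IOException e) {
                // a broken layer gets disabled so caps generation can skip it
                ft.setEnabled(false);
                catalog.save(ft);
            }
        }
    }
}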
You may say that’s a design issue in the caps generator, and I would agree; Justin
and I discussed it a bit during the FOSS4G-NA code sprint. We can basically
remove those checks if we make the XML document generation
“transactional” in some way, that is: put a mark on the output stream, generate
the XML for a layer, push it out if it’s ok, and in case of exception throw away the
buffer and start back from the mark, and so on.
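Just to make the idea concrete, here is a minimal sketch of that mark/rollback approach (LayerEncoder and the string fragments are hypothetical stand-ins for whatever the caps encoders actually produce, not existing GeoServer classes):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch: each layer fragment is encoded into a side buffer first; only fragments
// that encode cleanly reach the real output, broken layers are skipped instead of
// aborting the whole document with a service exception.
public class BufferedCapsWriter {

    interface LayerEncoder {
        // encodes one layer as an XML fragment, may fail for a broken layer
        String encode(String layerName) throws Exception;
    }

    public static void writeLayers(OutputStream out, List<String> layers, LayerEncoder encoder)
            throws IOException {
        for (String layer : layers) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try {
                // "mark": encode the fragment into the side buffer first
                buffer.write(encoder.encode(layer).getBytes(StandardCharsets.UTF_8));
            } catch (Exception e) {
                // "rollback": discard the buffer and move on to the next layer
                // (the real thing would also log the failure so it's not silent)
                continue;
            }
            // fragment encoded cleanly, push it out to the real stream
            buffer.writeTo(out);
        }
    }
}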
This would have to be done for all caps documents, for the rest-config parts that
list resources, and for all Describe* calls (since they need to accept the
lack of an identifier as a request to describe everything the server has).
The above would be very welcome, but we need to make sure it’s in place
before unplugging the listeners that keep GeoServer caps generation sane.
Cheers
Andrea