[Geoserver-devel] GSIP 69 - Catalog scalability enhancements - OOM and fast startup

Doh, forgot one last bit that I’ve found interesting. Citing Gabriel:


If starting up geoserver in seconds instead of minutes, loading the
home page almost instantly instead of waiting for seconds or even
minutes, under-second response times for the layers page list with
thousands of layers, including filtering and paging; not going OOM or
getting timeouts when doing a GetCapabilities request under
concurrency and/or low heap size, but instead streaming out as quickly
as possible, using as little memory as possible, and gracefully
degrading under load; are not ways of exercising the new API, then I’m
lost.


All good stuff that I did not see mentioned in the proposal, though
I can hardly imagine a GS going OOM under concurrent load of
GetCapabilities unless… well, maybe it has 200k layers and
works off just 256M of memory (gut feeling estimate).

In the past, OOMs with many layers were caused by leaks in the
DescribeFeatureType subsystem… did you measure how much memory
it takes to keep the catalog in memory and do GetCapabilities?

Pardon the very rough way of assessing it, but if we take the release
directory as a reference, we have 19 layers, with quite a few stores that
could be avoided (single shapefiles instead of directory stores), and
running the following inside “workspaces” gives me:

du -csh `find . -name "*.xml"`

256K total

which means, on average, 13KB of XML per layer (which still pads it quite
a bit, since the service configuration is shared and normally you don’t
have so many stores). And then it’s XML: would you agree that the
in-memory representation should be something like 5 times more compact?
This would give us a rough estimate of 3KB per layer.
If I have 200k layers, that means 600MB of in-memory storage.
Which is a lot, I’m not denying it, but if you are handling that many layers
you also want to have some beefy hardware, for which 600MB should be peanuts.
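
For anyone who wants to play with the numbers, the estimate above can be reproduced in a few lines of Java. This is only a sketch of the arithmetic: the 5x compaction factor is the gut-feeling guess from the paragraph above, not a measurement, and the class and method names are made up for illustration.

```java
/** Back-of-the-envelope catalog memory estimate; every input is a guess from the text. */
final class CatalogMemoryEstimate {

    /** Average on-disk XML per layer, in KB. */
    static double xmlKbPerLayer(double totalXmlKb, int layers) {
        return totalXmlKb / layers;
    }

    /** Heap needed for the given number of layers, in MB (taking 1 MB = 1000 KB). */
    static double heapMb(double inMemoryKbPerLayer, int layers) {
        return inMemoryKbPerLayer * layers / 1000.0;
    }

    public static void main(String[] args) {
        double xmlKb = xmlKbPerLayer(256, 19); // roughly 13 KB of XML per layer
        double inMemoryKb = xmlKb / 5;         // assumed 5x compaction, rounded to 3 KB above
        System.out.println("XML per layer (KB): " + xmlKb);
        System.out.println("In memory per layer (KB): " + inMemoryKb);
        System.out.println("200k layers at 3 KB (MB): " + heapMb(3, 200_000));
        System.out.println("200k layers unrounded (MB): " + heapMb(inMemoryKb, 200_000));
    }
}
```

Without the rounding to 3KB the total comes out somewhat below 600MB, so the figure above is, if anything, on the pessimistic side.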

I’m not trying to deny the scalability advantages of secondary storage, it just seems
to me the OOM reports may be a bit exaggerated.

The other thing that raises my interest and worries me is “starting up in seconds”.
My understanding is that the current startup slowness is mostly due to the
validation checks we do on startup to see if each layer/feature type/store is working
and valid, which results in opening all stores, computing the feature types and so on.

Maybe the jdbc config is that much faster because it’s not loading everything up front
and thus those listeners are not being called?
If so we have a problem at hand, since the listeners are there to prevent the caps
documents from failing miserably with a service exception at the first sign of trouble.

You may say that’s a design issue in the caps generator and I would agree. Justin
and I discussed it a bit during the FOSS4G-NA code sprint: we can basically
remove those checks if we can make the XML document generation
“transactional” in some way, that is, put a mark on the output stream, generate
the XML for a layer, if it’s ok push it out, in case of exception throw away the
buffer and start back from the mark, and so on.
This would have to be done for all caps documents, for the rest-config parts that
list resources, and for all Describe* calls (since they need to accept the
lack of an identifier as a request to describe everything the server has).

The above would be very welcome, but we need to make sure it’s there
before un-plugging the listeners that keep GeoServer caps generation sane.

Cheers
Andrea

Not to hijack the catalog discussion, but I will have some time later this week to work on implementing an output mode for the WMS GetCapabilities response to have it omit errant layers instead of generating invalid XML. If you have more detailed thoughts on “transactional” XML encoding I am interested to hear them. (The rough idea, buffer starting with an open tag until a closing tag is encoded, is pretty straightforward but I am not very familiar with the APIs and implementations involved.)


David Winslow
OpenGeo - http://opengeo.org/

On Mon, Apr 30, 2012 at 3:34 PM, David Winslow <dwinslow@anonymised.com> wrote:

Not to hijack the catalog discussion, but I will have some time later this week to work on implementing an output mode for the WMS GetCapabilities response to have it omit errant layers instead of generating invalid XML. If you have more detailed thoughts on “transactional” XML encoding I am interested to hear them. (The rough idea, buffer starting with an open tag until a closing tag is encoded, is pretty straightforward but I am not very familiar with the APIs and implementations involved.)

Let me see, I had in mind two ideas to make the usual TransformerBase subclass handle “transactional” writes,
one based on command recording, the other based on output stream buffering.

The first one is based on the observation that the code going through TransformerBase
writes XML via a handful of methods, start/end/chars/cdata, so one may put the code in “recording mode”
and just record the calls to those methods without making them act on the ContentHandler,
see if we can encode stuff, and on “commit” replay them against
the actual ContentHandler, otherwise throw away the list of recorded commands on “rollback”.
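
Roughly, the recording idea could look like the sketch below. It is an illustration only: it works directly at the SAX ContentHandler level rather than inside TransformerBase, the class name RecordingHandler and its mark/commit/rollback methods are made up, and cdata and comments (which go through other callbacks) are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: queues SAX calls while recording and replays them on commit. */
final class RecordingHandler extends DefaultHandler {

    /** One recorded call that can be replayed against the real handler. */
    private interface Command {
        void replay(ContentHandler target) throws SAXException;
    }

    private final ContentHandler delegate;
    private final List<Command> recorded = new ArrayList<>();
    private boolean recording;

    RecordingHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    /** Enter recording mode: calls are queued instead of forwarded. */
    void mark() {
        recording = true;
        recorded.clear();
    }

    /** Replay the queued calls against the real handler and leave recording mode. */
    void commit() throws SAXException {
        recording = false;
        for (Command c : recorded) {
            c.replay(delegate);
        }
        recorded.clear();
    }

    /** Drop the queued calls and leave recording mode. */
    void rollback() {
        recording = false;
        recorded.clear();
    }

    private void handle(Command c) throws SAXException {
        if (recording) {
            recorded.add(c);
        } else {
            c.replay(delegate);
        }
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        Attributes copy = new AttributesImpl(atts); // the caller may reuse its Attributes
        handle(h -> h.startElement(uri, local, qName, copy));
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        handle(h -> h.endElement(uri, local, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] copy = Arrays.copyOfRange(ch, start, start + length); // the buffer may be reused
        handle(h -> h.characters(copy, 0, copy.length));
    }
}
```

The memory cost is one small object per recorded call, held only for the layer currently being encoded.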

A lower level approach is based on the idea that, regardless of what you do, you go through an OutputStream,
so one could create a wrapper that can switch between stream and in-memory buffer mode, put it
in buffer mode before starting to write a layer, write it, in case of success write the buffer contents to the
actual stream, in case of failure just throw them away and move on.
This seems simpler and may have a smaller footprint. At the same time we have a few levels of abstraction
in between, so we may not be 100% sure of what the layers above are doing, which might result in
switching to buffer mode and then receiving some content that we thought had already been written
out (which would happen if anything above the output stream does some of its own buffering).
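
A minimal sketch of such a wrapper, using only java.io, might look as follows. SwitchableOutputStream and its startBuffering/commit/rollback methods are hypothetical names, and nothing here addresses the caveat above: whatever sits on top of the stream would still have to be flushed before each mode switch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch: an OutputStream that can divert writes to a memory buffer. */
final class SwitchableOutputStream extends OutputStream {

    private final OutputStream target;
    private ByteArrayOutputStream buffer; // non-null only while in buffer mode

    SwitchableOutputStream(OutputStream target) {
        this.target = target;
    }

    /** Start diverting writes to memory, e.g. just before encoding a layer. */
    void startBuffering() {
        buffer = new ByteArrayOutputStream();
    }

    /** Success: copy the buffered bytes to the real stream and return to stream mode. */
    void commit() throws IOException {
        if (buffer != null) {
            buffer.writeTo(target);
            buffer = null;
        }
    }

    /** Failure: forget the buffered bytes and return to stream mode. */
    void rollback() {
        buffer = null;
    }

    @Override
    public void write(int b) throws IOException {
        if (buffer != null) {
            buffer.write(b);
        } else {
            target.write(b);
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (buffer != null) {
            buffer.write(b, off, len);
        } else {
            target.write(b, off, len);
        }
    }

    @Override
    public void flush() throws IOException {
        // Only the real stream is flushed; buffered bytes stay in memory until commit().
        target.flush();
    }

    @Override
    public void close() throws IOException {
        target.close();
    }
}
```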

Cheers
Andrea

Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf


Agree with Andrea on the tradeoffs between approaches. The stream approach would be simpler, but if I remember correctly the underlying stream is abstracted away to the point where you don’t really have access to it. Could be wrong though, haven’t dealt with that API in quite some time.

From an API standpoint what I had in mind was this. Consider the current (reduced) example of writing out a layer:

start("Layer");
element("Title", serviceInfo.getTitle());
element("Abstract", serviceInfo.getAbstract());

end("Layer");

I thought an approach similar to java.io.InputStream.mark() could be used. So it would look like:

// create a "savepoint"
mark();
try {
    start("Layer");
    element("Title", serviceInfo.getTitle());
    element("Abstract", serviceInfo.getAbstract());

    end("Layer");

    // all good, commit
    commit();
} catch (Exception t) {
    // error, roll back to the savepoint
    reset();
}
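
To show what the mark/commit/reset pattern does to the output, here is a toy, self-contained version. It is a sketch under loud assumptions: MarkableXmlWriter is a made-up class, it keeps the whole document in a StringBuilder for simplicity (a real encoder would only hold the current layer in memory), it does no escaping, and a null title stands in for a misconfigured layer.

```java
import java.util.List;

/** Hypothetical sketch of mark()/commit()/reset() on a tiny string-based XML writer. */
final class MarkableXmlWriter {

    private final StringBuilder out = new StringBuilder();
    private int mark = -1; // position of the savepoint, or -1 if there is none

    void mark() {
        mark = out.length();
    }

    void commit() {
        mark = -1;
    }

    /** Roll back to the savepoint, discarding everything written since mark(). */
    void reset() {
        if (mark >= 0) {
            out.setLength(mark);
            mark = -1;
        }
    }

    void start(String name) {
        out.append('<').append(name).append('>');
    }

    void end(String name) {
        out.append("</").append(name).append('>');
    }

    void element(String name, String text) {
        if (text == null) {
            // stand-in for a broken layer blowing up in the middle of encoding
            throw new IllegalStateException("missing " + name);
        }
        start(name);
        out.append(text); // no escaping, this is only a sketch
        end(name);
    }

    /** Writes one Layer element per title, skipping the ones that fail. */
    static String encode(List<String> titles) {
        MarkableXmlWriter w = new MarkableXmlWriter();
        w.start("Capability");
        for (String title : titles) {
            w.mark();
            try {
                w.start("Layer");
                w.element("Title", title);
                w.end("Layer");
                w.commit();
            } catch (RuntimeException e) {
                w.reset(); // drops the half-written <Layer>
            }
        }
        w.end("Capability");
        return w.out.toString();
    }
}
```

Calling encode(Arrays.asList("roads", null, "rivers")) yields a document with the roads and rivers layers only, with no trace of the half-written one in between.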

$0.02

Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Ok, I’m getting started on an implementation based on recording XML events:

http://jira.codehaus.org/browse/GEOS-5084
http://jira.codehaus.org/browse/GEOT-4125

Preliminary tests in GeoServer seem promising, but I’ll hold off on providing a GS-level patch until it’s a bit more polished (configuration option, etc.)


David Winslow
OpenGeo - http://opengeo.org/

On Tue, May 1, 2012 at 8:52 PM, David Winslow <dwinslow@anonymised.com> wrote:

Ok, I’m getting started on an implementation based on recording XML events:

http://jira.codehaus.org/browse/GEOS-5084
http://jira.codehaus.org/browse/GEOT-4125

Preliminary tests in GeoServer seem promising, but I’ll hold off on providing a GS-level patch until it’s a bit more polished (configuration option, etc.)

and tests? :-p

Just glanced over it, looks good so far.

Just another bit about the GeoServer part: it might be useful to add a comment in place of
each layer that could not be dumped, with a very short summary of the issue (e.g., the
error message and the current time, so that the exception can be looked up in the logs).

Hmm… and I guess the comment method should be recorded too, just in case someone uses
it as part of the normal XML generation.
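
One detail to watch when building that comment: XML does not allow "--" inside a comment, nor a comment body ending in "-", and error messages can easily contain either. A small hypothetical helper (the name SkippedLayerComment and the message layout are just for illustration) could sanitize the text before it is handed to whatever comment call the encoder ends up using:

```java
import java.time.Instant;

/** Hypothetical helper: builds the body of an XML comment describing a skipped layer. */
final class SkippedLayerComment {

    static String text(String layerName, Throwable error, Instant when) {
        String message = error.getMessage() == null
                ? error.getClass().getSimpleName()
                : error.getMessage();
        String raw = "Layer " + layerName + " skipped at " + when + ": " + message;
        // Put a space after every hyphen that is directly followed by another hyphen.
        String safe = raw.replaceAll("-(?=-)", "- ");
        // A comment body must not end with a hyphen either.
        return safe.endsWith("-") ? safe + " " : safe;
    }
}
```

The timestamp is printed in ISO-8601 form by Instant, which should make it easy to match against the log entries.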

Cheers
Andrea

For the record, I have split this conversation further into two
separate threads: "GSIP 69 - Catalog scalability enhancements - OOM"
and "GSIP 69 - Catalog scalability enhancements - fast startup"

--
Gabriel Roldan
OpenGeo - http://opengeo.org
Expert service straight from the developers.

On Tue, May 1, 2012 at 9:56 PM, Gabriel Roldan <groldan@anonymised.com> wrote:

For the record, I have split this conversation further into two
separate threads: “GSIP 69 - Catalog scalability enhancements - OOM”
and “GSIP 69 - Catalog scalability enhancements - fast startup”

Yep, seen that, more worried about the lack of reactions on your side on
the filtering topic.

Cheers
Andrea


On Tue, May 1, 2012 at 5:18 PM, Andrea Aime
<andrea.aime@anonymised.com> wrote:

Yep, seen that, more worried about the lack of reactions on your side on
the filtering topic.

Not lack of reaction: I am taking the time to justify my position and reply
accordingly, besides having to attend to other obligations. But be sure
I'll follow up.

Cheers,
Gabriel
