Justin,
I am not sure whether there is a Jira issue that covers this topic. As for how I did my testing, I used a Java program to send HTTP POST requests to my GeoServer instance. I cannot go into much detail about the data I am actually using, but I went back and made a generic example that produces the same problem and that you can run yourself. Basically, I put load on the server by sending HTTP POST requests containing WFS-T insert statements back to back: once one request gets the 200 response from the server, the next is sent, and so on. After some time I stop sending data and examine the heap. I then usually run the test again and compare the heaps from the different runs. The versions of the software I am using are listed below.
GeoServer 1.6.4
Tomcat 6.0.18
Java 1.6.0_06
PostgreSQL 8.3.3
Postgis 1.3.3
In my Tomcat instance I also used a setenv.sh file with the following options:
JAVA_OPTS="-server -Xms48m -Xmx768M -XX:SoftRefLRUPolicyMSPerMB=36000 -XX:MaxPermSize=128m"
CATALINA_OPTS="-XX:MaxPermSize=512m"
For this test I had the geoserver.war and a ROOT directory in my Tomcat implementation. In the ROOT directory I added the directories nb/1.0 where I added the following two schemas that I created:
nb.xsd
---------------------------------------------------------------------
<xsd:schema targetNamespace="http://localhost:8080/nb/1"
xmlns:wx="http://localhost:8080/nb/1"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
attributeFormDefault="unqualified" version="1.0">
<xsd:include schemaLocation="nbExample.xsd" />
</xsd:schema>
nbExample.xsd
---------------------------------------------------------------------
<xsd:schema targetNamespace="http://localhost:8080/nb/1" xmlns:nb="http://localhost:8080/nb/1"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:gml="http://www.opengis.net/gml"
elementFormDefault="qualified" attributeFormDefault="unqualified" version="1.0">
<xsd:annotation>
<xsd:documentation> A test schema to try and produce a memory leak in GeoServer
</xsd:documentation>
</xsd:annotation>
<xsd:import namespace="http://www.opengis.net/gml"
schemaLocation="../../gml/3.1.1/base/gml.xsd"/>
<xsd:complexType name="ExampleType">
<xsd:complexContent>
<xsd:extension base="gml:AbstractFeatureType">
<xsd:sequence>
<xsd:element name="time" type="gml:TimePositionType"/>
<xsd:element name="position" type="nb:PositionType"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:element name="example" type="nb:ExampleType" substitutionGroup="gml:_Feature" />
<xsd:complexType name="PositionType">
<xsd:sequence>
<xsd:element ref="gml:Point"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
---------------------------------------------------------------------
Also I took a copy of the gml and xlink directories from the geoserver/schemas directory and also placed them in the ROOT directory. I did this just so when I was working with my schema it would validate properly.
In my Postgis database I created a table with the following:
create table example (id int PRIMARY KEY, time timestamp with time zone);
select AddGeometryColumn('example', 'position', 4326, 'POINT', 2);
I then added my namespace nb to GeoServer with a URI of:
http://localhost:8080/nb/1
I added my database as a DataStore and then made a FeatureType called example. Following is the info.xml file for this FeatureType:
---------------------------------------------------------------------
<featureType datastore = "example" >
<name>example</name>
<!--
native wich EPGS code for the FeatureTypeInfoDTO
-->
<SRS>4326</SRS>
<SRSHandling>0</SRSHandling>
<title>example_Type</title>
<abstract>Generated from example</abstract>
<wmspath>/</wmspath>
<numDecimals value = "8" />
<keywords>example</keywords>
<latLonBoundingBox dynamic = "false" maxx = "-75.0" maxy = "100.0" minx = "-100.0" miny = "1.0" />
<nativeBBox dynamic = "false" maxx = "-75.0" maxy = "100.0" minx = "-100.0" miny = "1.0" />
<!--
the default style this FeatureTypeInfoDTO can be represented by.
at least must contain the "default" attribute
-->
<styles default = "point" />
<cacheinfo enabled = "false" maxage = "" />
<maxFeatures>0</maxFeatures>
</featureType>
---------------------------------------------------------------------
I then used a Java program to send my WFS-T insert statements as HTTP POST requests. As mentioned before, my code just waits until the previous request receives the 200 response and then sends the next request. The XML document below is the body of each HTTP POST:
---------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<wfs:Transaction
xsi:schemaLocation="http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.1.0/wfs.xsd http://localhost:8080/nb/1.0/nb.xsd"
xmlns:wfs="http://www.opengis.net/wfs" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:nb="http://localhost:8080/nb/1" xmlns:gml="http://www.opengis.net/gml">
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
<wfs:Insert>
<nb:example>
<nb:time>2000-01-01T00:00:00</nb:time>
<nb:position>
<gml:Point>
<gml:pos>1 1</gml:pos>
</gml:Point>
</nb:position>
</nb:example>
</wfs:Insert>
</wfs:Transaction>
---------------------------------------------------------------------
Once everything was set up, I let my Java program send the above data about 400 to 500 times. This took only a couple of minutes. After this I stopped sending new data and examined the heap. I then usually ran my test again and compared the heaps from the two trials.
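For anyone who wants to repeat the run, the send-and-wait loop described above can be sketched roughly as follows. This is not the program that was actually used; it is a minimal illustration under a few assumptions: the class name WfsInsertLoad, the endpoint http://localhost:8080/geoserver/wfs, and the default of 450 requests are all made up for the example, and the request body is rebuilt from the XML above without the xsi:schemaLocation attribute.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical load client: posts the same WFS-T Transaction over and over,
// sending the next request only after the previous one has been answered.
class WfsInsertLoad {

    // Builds a Transaction body containing the given number of identical inserts.
    static String buildTransaction(int inserts) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<wfs:Transaction xmlns:wfs=\"http://www.opengis.net/wfs\"");
        sb.append(" xmlns:nb=\"http://localhost:8080/nb/1\"");
        sb.append(" xmlns:gml=\"http://www.opengis.net/gml\">\n");
        for (int i = 0; i < inserts; i++) {
            sb.append("<wfs:Insert><nb:example>");
            sb.append("<nb:time>2000-01-01T00:00:00</nb:time>");
            sb.append("<nb:position><gml:Point><gml:pos>1 1</gml:pos></gml:Point></nb:position>");
            sb.append("</nb:example></wfs:Insert>\n");
        }
        sb.append("</wfs:Transaction>\n");
        return sb.toString();
    }

    // Sends one POST and returns the HTTP status code once the response arrives.
    static int postOnce(URL endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        // Read and discard the response body before moving on.
        InputStream response = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (response != null) {
            try (InputStream in = response) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) { /* discard */ }
            }
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; pass the real endpoint and request count as arguments.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/geoserver/wfs";
        int requests = args.length > 1 ? Integer.parseInt(args[1]) : 450;
        URL endpoint = URI.create(url).toURL();
        String body = buildTransaction(10);
        for (int i = 0; i < requests; i++) {
            int status = postOnce(endpoint, body);
            if (status != 200) {
                System.err.println("Request " + i + " returned HTTP " + status + ", stopping");
                break;
            }
        }
    }
}
```

Running it twice with a heap snapshot after each run gives the two data points to compare.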
When I run this test I get the problems that I have been describing. As far as I can tell, the more separate HTTP POST requests I make, the more instances of the class org.geotools.xml.impl.SchemaIndexImpl (and of some other classes) I get. I believe that if you run the same tests you will see the same thing, but I cannot be sure about that. If you need to know anything else to help you run your own tests, or if for some reason you cannot reproduce the memory leak, let me know. Thanks for all your help so far.
Brett
-----Original Message-----
From: Justin Deoliveira [mailto:jdeolive@anonymised.com]
Sent: Thursday, September 11, 2008 5:44 PM
To: Levasseur, Brett
Cc: 'Gabriel Roldán'; geoserver-users@lists.sourceforge.net; David R Robison
Subject: Re: FW: [Geoserver-users] Memory Leak
Hi Brett,
Thanks for the info... I think this makes sense. I believe the adapter
the index attaches to the schema needs to be removed on index disposal.
That or the index should be reused and not continually recreated.
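
To make the suspected pattern concrete, here is a rough stand-alone sketch of it. This is not the GeoTools or EMF code; the classes Schema, Index and Adapter are hypothetical stand-ins for the cached schema, the per-request index, and the adapter the index registers on the schema. The point is only that a long-lived object holding a list of adapters keeps every short-lived owner of those adapters reachable unless the adapter is removed on disposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an adapter leak; none of these classes
// are the real GeoTools or EMF types.
class AdapterLeakSketch {

    // Long-lived object, standing in for the schema that is kept around
    // between requests.
    static class Schema {
        final List<Object> adapters = new ArrayList<Object>();
    }

    // Short-lived object, standing in for the index built for each request.
    static class Index {
        private final Schema schema;
        private final Adapter adapter;
        // Stand-in for the lookup maps the index builds up.
        final byte[] cache = new byte[1024];

        // Inner (non-static) class: every Adapter holds an implicit
        // reference to the Index that created it.
        class Adapter {
            Index owner() {
                return Index.this;
            }
        }

        Index(Schema schema) {
            this.schema = schema;
            this.adapter = new Adapter();
            // From here on the schema keeps this index reachable.
            schema.adapters.add(adapter);
        }

        // Detaching the adapter lets the index be garbage collected.
        void dispose() {
            schema.adapters.remove(adapter);
        }
    }

    // Creates one index per simulated request and returns how many adapters
    // (and therefore indexes) the schema still holds afterwards.
    static int simulate(Schema schema, int requests, boolean disposeAfterUse) {
        for (int i = 0; i < requests; i++) {
            Index index = new Index(schema);
            if (disposeAfterUse) {
                index.dispose();
            }
        }
        return schema.adapters.size();
    }

    public static void main(String[] args) {
        System.out.println("retained without dispose: " + simulate(new Schema(), 470, false));
        System.out.println("retained with dispose: " + simulate(new Schema(), 470, true));
    }
}
```

The other option mentioned above, reusing one index instead of creating a new one per request, would avoid the growth in the same way, since only a single adapter would ever be registered.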
Is there a jira issue open for this? If not we should create one.
Also how are you running a load on GeoServer during profiling? Are you
using an automated tool? Some script you wrote? etc... It might save
some time if i could reuse your setup.
-Justin
Levasseur, Brett wrote:
Gabriel & Justin,
I just found a feature in Yourkit called 'GC Roots', which is supposed to give better information for finding memory leaks by showing why an object is retained in memory. I used this on the class org.geotools.xml.impl.SchemaIndexImpl and got the following. I hope this helps a little more.
org.geotools.xml.impl.SchemaIndexImpl
org.geotools.xml.impl.SchemaIndexImpl$SchemaAdapter
org.eclipse.emf.common.notify.impl.BasicNotifierImpl$EAdapterList
org.eclipse.xsd.impl.XSDSchemaImpl
org.geoserver.wfs.xml.v1_1_0.WFSSchemaLocator
org.geoserver.wfs.xml.v1_1_0.WFSConfiguration
I had found most of these classes before when I sent my last email, but using the 'GC Roots' feature also highlighted the last two classes in the list above that I had not noticed before.
Brett
-----Original Message-----
From: Levasseur, Brett
Sent: Wednesday, September 10, 2008 11:51 AM
To: 'Gabriel Roldán'; Justin Deoliveira
Cc: David R Robison; geoserver-users@lists.sourceforge.net
Subject: RE: [Geoserver-users] Memory Leak
Gabriel & Justin,
I have used Yourkit to collect some more information. I made two snapshots of the heap while running my tests. In Yourkit I used the Inspections feature and ran the tests under 'Possible leaks'. The 'Lost SWT Controls' test returned nothing. The 'Objects Retained by Inner Class Back References' test did return a few classes. Between the first and the second snapshot, the classes that the test returned and their instance counts were the same for all but one. In my first snapshot the class org.geotools.xml.impl.SchemaIndexImpl appeared 470 times, taking up 15% of the heap's retained size. In my second snapshot this class appeared 940 times, taking up 25% of the heap's retained size. Every instance of this class consisted of an array of org.eclipse.xsd.impl.XSDSchemaImpl objects and three HashMap objects.
Under the Memory feature of Yourkit I found that the HashMap class was taking up the majority of the memory in both snapshots. I then used the 'Merged paths' feature of Yourkit to determine which objects were retaining the memory held by HashMap. I found that the class that held more of this memory than any other was again org.geotools.xml.impl.SchemaIndexImpl. The memory for this class was being held by org.geotools.xml.impl.SchemaIndexImpl$SchemaAdapter, which was held by org.eclipse.emf.common.notify.Adapter, which was held by org.eclipse.emf.common.notify.impl.BasicNotifierImpl$EAdapterList, which was held by org.eclipse.xsd.impl.XSDSchemaImpl, which in turn was held by instances of multiple other classes.
From this information I now believe that while the HashMap is taking up the majority of memory it is most likely because of the class org.geotools.xml.impl.SchemaIndexImpl or one of the classes that retain a hold on that class that I listed above.
The WFS insert transaction statements and getFeature requests that I send the server use the schemas for wfs, ogc, gml, and the regular xml. Also I created another schema to represent my own data. Is it possible that this memory leak is occurring because of my schema document? Currently my instance of GeoServer does not have access to my schema document.
Brett
-----Original Message-----
From: Gabriel Roldán [mailto:groldan@anonymised.com]
Sent: Tuesday, September 09, 2008 4:05 PM
To: Levasseur, Brett; Justin Deoliveira
Cc: David R Robison; geoserver-users@lists.sourceforge.net
Subject: Re: [Geoserver-users] Memory Leak
On Tuesday 09 September 2008 04:37:03 pm Levasseur, Brett wrote:
Gabriel,
I will check out using Yourkit like you suggested. Until then I used jhat
to go through a dump of the heap from one of my trials. I checked the first
150 instances of the class java.util.HashMap$Entry and found that among
them 147 had the same setup with a key being an instance of the class
javax.xml.namespace.QName and a value being an instance of the class
org.eclipse.xsd.impl.XSDElementDeclarationImpl. These instances of the HashMap
class are referenced by instances of the class
org.geotools.xml.impl.SchemaIndexImpl.
I will know more once I have had the chance to collect more data, however
if the appearance of the classes I have mentioned above proves to be the
pattern would this fall under the category of the xml parser?
Yes, or rather under the category of the whole xml subsystem, the so-called gt-xsd
extensions. These are xml schema aware parsers that use java binding classes
and the eclipse emf framework. I wonder whether we can be certain that the leak is
due to stale HashMap instances; I would suspect the xsd caches. But to go
beyond guessing you may want to contact Justin, cc'ed here, who's the
expert on this.
keep us posted.
Gabriel
Brett Levasseur
-----Original Message-----
From: Gabriel Roldán [mailto:groldan@anonymised.com]
Sent: Tuesday, September 09, 2008 10:47 AM
To: David R Robison
Cc: geoserver-users@lists.sourceforge.net; Levasseur, Brett
Subject: Re: [Geoserver-users] Memory Leak
hi david,
no i can't know without a profiler, that's what i was saying before.
gabriel
On Tuesday 09 September 2008 09:37:41 am David R Robison wrote:
Can you explain a bit more about what with the XML Parser causes it to
leak memory? I am also seeing some memory leaks and it appears that it
is related to the XML Parser. Any hints on how to track this down, or
where to look for typical problems in the code would be greatly
appreciated. TNX David
Gabriel Roldán wrote:
Hi Brett,
there's a usual suspect, the xml parser. But I may be wrong and the leak
may lie somewhere else. That inspection of yours does not tell who
creates the hashmaps, does it?
I'm tempted to hit geoserver like that while running under a profiler for
a while, but that'd be constrained by higher priorities. I wonder if you
could manage to do the same longevity testing with the Yourkit java
profiler (there's a trial version and we have geotools specific
licences) so we can analyze a profiler's snapshot?
best regards,
Gabriel
On Friday 05 September 2008 05:06:40 pm Levasseur, Brett wrote:
Hello,
I am trying to run a Web Feature Service with GeoServer but I am
experiencing a memory leak. I have a Java program that sends new
features to GeoServer in an http post that then get added to a PostGIS
database. The total amount of data that gets sent to the server is
very small, usually fewer than a couple of dozen features per http
post. Also each feature is only made up of five fields (a position, a
time stamp, and a few numbers) so the amount of data that I am
expecting GeoServer to handle is very small. After about a day of
running the program on tomcat I found that the server was using up
much more memory than it had the day before. I looked at the heap and
found that the class java.util.HashMap had more instances than any
other class and was taking up more memory than any other class. Does
anyone have any idea what could be causing this memory leak? I am
running GeoServer version 1.6.4 and tomcat version 6.0.18.
Brett
--
Justin Deoliveira
Software Engineer, OpenGeo
http://opengeo.org