We recently upgraded from GeoServer 2.18.1 running on AWS EC2 to the GeoServer 2.28 Docker container running in ECS.
Our application requests map tiles as PNG from GeoServer, which then uses WFS to fetch the underlying data. We have a tile layer cache using a file blobstore, which seems to be working fine. Our application is high scale, with tens of millions of map points, and serves 2000-5000 requests per second.
The only EXTRA_JAVA_OPTS settings that we are passing are “-Xms8g -Xmx16g”.
The issue is that GeoServer’s Java heap continually increases until the container crashes (maxing out at the 16g max heap) and is replaced. Even clicking “Free Memory” does nothing to lower GeoServer’s memory usage.
indicating that there is a HashMap holding on to some parsed XML QName objects which are never cleared. Note that GeoServer 2.18.1 had no issues with the load or memory, and in fact ran on a max heap of 12 GB without issue.
The only thing being logged is: “WARN [xsd.impl] - cvc-elt.1.a: Cannot find the declaration of element ‘Filter’.” which we are not sure is related.
Looking for input on how to further debug and/or resolve this issue. Thanks!
I don’t know the answer to your problem, nor have I come across others with similar symptoms. GeoServer is open source, so here’s how I would go about tracking down your problem:
Has the bug previously been logged in Jira, the issue tracker? If not, please log it here: Jira
There have obviously been a lot of changes in the last 5 years, from 2.18.1 to 2.28! Do a binary search (focusing on major versions) to narrow down when the problem started: 2.23 is halfway between 2.18 and 2.28, so did the problem exist then? Repeat until you can identify the exact version that introduced the problem. Report that here and in Jira.
Then, look at the change log for that version. It might be possible to immediately identify a suspicious change that introduced the bug.
Ultimately, someone will likely need to create a software fix for this. Does your organisation have a Java developer who can contribute a patch? Alternatively, you can look at Support - GeoServer to find a suitable provider who you can engage.
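To make the binary search over versions concrete, here is a minimal sketch in Java. It assumes 2.18 is known good, 2.28 is known bad, and that once the leak appears it stays in all later versions. The class name VersionBisect and the outcome used in main (leak first appearing in 2.25) are purely hypothetical; in practice the "leaks" check is you deploying that version and watching the heap.

```java
import java.util.List;
import java.util.function.Predicate;

class VersionBisect {
    /**
     * Returns the index of the first version for which isBad is true.
     * Assumes the first entry is known good, the last entry is known bad,
     * and every version after the first bad one is also bad.
     */
    static int firstBad(List<String> versions, Predicate<String> isBad) {
        int good = 0;                    // index of a known-good version
        int bad = versions.size() - 1;   // index of a known-bad version
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(versions.get(mid))) {
                bad = mid;               // problem already present: look earlier
            } else {
                good = mid;              // problem not yet present: look later
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> versions = List.of(
                "2.18", "2.19", "2.20", "2.21", "2.22", "2.23",
                "2.24", "2.25", "2.26", "2.27", "2.28");
        // Hypothetical result for illustration only: pretend the leak starts in 2.25.
        Predicate<String> leaks = v -> Integer.parseInt(v.substring(2)) >= 25;
        System.out.println("First leaking version: "
                + versions.get(firstBad(versions, leaks)));
    }
}
```

With eleven minor versions in the range, this needs at most four test deployments, and the first one it asks for is 2.23.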
Best of luck & thank you for highlighting the problem. Perhaps someone else here has already experienced the same and can share their solution.
I was able to take a full heap dump and analyze it with Eclipse MAT. The heap dump shows:
org.geotools.filter.v2_0.FES has an attribute named schema of type XSDResourceImpl
XSDResourceImpl.eAdapters() contains many instances of:
org.geotools.xsd.impl.SchemaIndexImpl$SchemaAdapter each retaining roughly 800 QName objects
Those adapters are seemingly created by ApplicationSchemaXSD.buildSchema(), which is seemingly called every time a request to an upstream WFS data source is made.
While I have knowledge of Java and heap analysis, I know very little about geoserver itself. I will try opening a bug.
I don’t seem to be able to create issues in JIRA other than anonymous reports.
However, I did find a bug in Jira that almost exactly matches (same symptoms, same schema/adapter classes, just retained by org.geotools.filter.v2_0.FES rather than PullParser).
The class you’re referring to is within GeoTools. The code that triggers this behavior almost certainly is in this part:
/** Adapter for tracking changes to schemas. */
SchemaAdapter adapter;

public SchemaIndexImpl(XSDSchema[] schemas) {
    this.schemas = new XSDSchema[schemas.length + 1];
    adapter = new SchemaAdapter();

    // set the schemas passed in
    for (int i = 0; i < schemas.length; i++) {
        this.schemas[i] = schemas[i];
        synchronized (this.schemas[i].eAdapters()) {
            this.schemas[i].eAdapters().add(adapter);
        }
    }

    // add the schema for xml schema itself
    this.schemas[schemas.length] = schemas[0].getSchemaForSchema();
}
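To illustrate why a constructor like this can leak, here is a simplified, self-contained model of the pattern. It is not the GeoTools or EMF code: SharedSchema, Index, destroy() and simulate() are hypothetical stand-ins. The assumption is that the schema object is long-lived and shared (as the heap dump suggests for FES), while a new index is built per request.

```java
import java.util.ArrayList;
import java.util.List;

class AdapterLeakSketch {
    /** Stand-in for a long-lived, shared schema object (hypothetical). */
    static class SharedSchema {
        private final List<Object> adapters = new ArrayList<>();

        List<Object> adapters() {
            return adapters;
        }
    }

    /** Stand-in for a per-request index that registers an adapter on the schema. */
    static class Index {
        private final SharedSchema schema;
        private final Object adapter = new Object();

        Index(SharedSchema schema) {
            this.schema = schema;
            synchronized (schema.adapters()) {
                // The shared schema now holds a strong reference to this adapter.
                schema.adapters().add(adapter);
            }
        }

        /** Explicit cleanup; if it is never called, the adapter stays reachable. */
        void destroy() {
            synchronized (schema.adapters()) {
                schema.adapters().remove(adapter);
            }
        }
    }

    /** Simulates a number of requests and returns how many adapters remain registered. */
    static int simulate(int requests, boolean cleanUp) {
        SharedSchema shared = new SharedSchema();
        for (int i = 0; i < requests; i++) {
            Index index = new Index(shared);
            if (cleanUp) {
                index.destroy();
            }
        }
        return shared.adapters().size();
    }

    public static void main(String[] args) {
        System.out.println("adapters left without cleanup: " + simulate(1000, false));
        System.out.println("adapters left with cleanup:    " + simulate(1000, true));
    }
}
```

In this model the garbage collector cannot reclaim the adapters without cleanup, because the shared schema still references them. That would be consistent with “Free Memory” having no effect, though whether the real code path behaves this way still needs to be confirmed in GeoTools itself.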
I’m not sure if I have the capacity to look at this for the moment, but someone should.
Please have a look at the link Peter posted about support providers.
Even with an Atlassian account, the Create button does nothing. I don’t think that I have permission to do anything other than search. In the meantime, I’ll look into reaching out to one of the support providers.
Thank you for that feedback. I was not aware there was a problem with Jira, which obviously should work for new accounts, so that we can properly track issues. We will have to look into that.
Until then, please continue to report your findings here (particularly if you can do a binary search to narrow down the version). I have just done some work on generating (old) Docker images, which you can use to quickly obtain those you need for the search.
I also observed similar memory leaks with the 2.28.1 Docker images, with a store accessing images on an object storage.
We changed our memory settings (using the -XX:MaxRAMPercentage=80 parameter instead of the -Xms and -Xmx parameters) and adjusted our Kubernetes requests/limits, and we no longer have problems.
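When switching between fixed -Xmx values and a percentage-based setting, it can help to confirm what maximum heap the JVM actually ends up with inside the container. A minimal sketch, assuming you can run a small Java class with the same options and container limits as GeoServer (the class name HeapLimitCheck is hypothetical):

```java
class HeapLimitCheck {
    /** Maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Effective max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The printed value depends on the JVM options and on the memory limit the container runtime exposes, so it should be compared against the Kubernetes or ECS limit rather than treated as a fixed number.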
Perhaps we can make another topic to figure out ticket creation?
In the past we have run into a 2000 person limit on the number of users. Checking in today we have: 2667 total users, 1821 active users. The categories are active, suspended and deactivated.
I cannot figure out what the current limit of number of users is.
This discussion does indeed sound very similar to the findings I made a few years ago, which were reported here: https://osgeo-org.atlassian.net/browse/GEOT-6517 . As mentioned earlier, the problem back then looked exactly as it does right now, and looking back at my comments on the ticket, I wasn’t sure at the time whether the fix that was made had fixed only one part of the problem.