Alarming off-heap memory footprint >= 2.24.0

Hi,

we are on the verge of migrating to 2.27.1 (from 2.23.6). We are aware that this is a big task.

For weeks we have been stuck on a problem with GeoServer’s memory footprint for which we cannot really find the cause and for which we only have a workaround. I am describing our tests and observations here, to see if someone else has made similar observations or might be able to help find the cause - and maybe even a proper solution.

Let me try to explain:

Load tests

As part of our standard procedure, we have been running load tests on 2.27.1. The way we do this is to take the “minimal” data set from GeoServer, then define a few data stores and layers - for this test we used GRIB2 numerical weather prediction data (gs-grib plugin).

Then we ran WMS requests against 4 layers, with 16 tiles of 256x256 each (EPSG:4326 bbox). We run the requests without any dimension settings and with the default raster style.

The way we run the tests is that the requests for the four layers are executed in parallel, without any delays.

That way we are basically bombarding GeoServer with many requests (~15,000 requests/min). We feed the system and tomcat metrics into Elastic and monitor system load/CPU/memory as well as the number of requests tomcat/GeoServer is able to handle.
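
For illustration, a single request in that scheme has roughly this shape (layer name and bounding box are placeholders, not our actual configuration):

wget -q -O /dev/null "http://127.0.0.1:8080/geoserver/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image/png&LAYERS=dwd:SOME_GRIB_LAYER&SRS=EPSG:4326&WIDTH=256&HEIGHT=256&BBOX=5.625,47.8125,8.4375,50.625"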

Note that all of the measurements below were taken on the same server (~90GB RAM), with the same tomcat and JDK 21.0.7 - this includes the tests of GeoServer 2.23.6, which does not show the described problem.

2.27.1: Problem with off-heap memory consumption and resulting server crashes

Shortly after our first tests we noticed that our GeoServers (now on 2.27.1) keep crashing with OOMs.

Our setup (copied from 2.23.6) is basically the following; roughly equivalent JVM flags are sketched after the list:

  • initial heap: 10GB
  • max heap: 40GB
  • garbage collector: G1GC
  • ~90GB RAM
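
Roughly equivalent JVM flags (a sketch only, not our complete startup configuration):

-Xms10g -Xmx40g -XX:+UseG1GC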

With the above-mentioned load tests, we observed the following memory footprint:

We can see that memory consumption increased dramatically over a very short period of time. The points where the memory suddenly drops are where our server crashed and restarted itself.

Looking further, we analyzed the heap:

Interestingly, the heap did not increase at all. Taking heap dumps on the VM showed the same - nothing unusual.

Our conclusion is that somehow that vast memory consumption must be off-heap. Unfortunately, native memory tracking didn’t really reveal any problems:

Native Memory Tracking:
 
(Omitting categories weighting less than 1KB)
 
Total: reserved=44452560KB +66584KB, committed=5753500KB +143104KB
 
-                 Java Heap (reserved=41943040KB, committed=5242880KB)
                            (mmap: reserved=41943040KB, committed=5242880KB)
  
-                     Class (reserved=1051336KB +1035KB, committed=17928KB +4491KB)
                            (classes #22932 +4656)
                            (  instance classes #21591 +4360, array classes #1341 +296)
                            (malloc=2760KB +1035KB #88290 +31693)
                            (mmap: reserved=1048576KB, committed=15168KB +3456KB)
                           : (  Metadata)
                            (    reserved=131072KB, committed=117504KB +31808KB)
                            (    used=116896KB +31646KB)
                            (    waste=608KB =0,52% +162KB)
                           : (  Class space)
                            (    reserved=1048576KB, committed=15168KB +3456KB)
                            (    used=14539KB +3254KB)
                            (    waste=629KB =4,14% +202KB)
  
-                    Thread (reserved=85255KB +14243KB, committed=7515KB +1899KB)
                            (thread #83 +13)
                            (stack: reserved=84996KB +14200KB, committed=7256KB +1856KB)
                            (malloc=163KB +28KB #505 +78)
                            (arena=95KB +15d #164 +26)
  
-                      Code (reserved=257317KB +5674KB, committed=103421KB +59274KB)
                            (malloc=9633KB +5674KB #25090 +11830)
                            (mmap: reserved=247684KB, committed=93788KB +53600KB)
  
-                        GC (reserved=885678KB +3668KB, committed=168878KB +3668KB)
                            (malloc=33218KB +3668KB #23427 +14256)
                            (mmap: reserved=852460KB, committed=135660KB)
  
-                 GCCardSet (reserved=72KB +21KB, committed=72KB +21KB)
                            (malloc=72KB +21KB #609 +114)
  
-                  Compiler (reserved=632KB +273KB, committed=632KB +273KB)
                            (malloc=468KB +273KB #1942 +754)
                            (arena=164KB #4)
  
-                  Internal (reserved=35807KB +35260KB, committed=35807KB +35260KB)
                            (malloc=35771KB +35260KB #82757 +68971)
                            (mmap: reserved=36KB, committed=36KB)
  
-                     Other (reserved=356KB +208KB, committed=356KB +208KB)
                            (malloc=356KB +208KB #44 +20)
  
-                    Symbol (reserved=34622KB +7181KB, committed=34622KB +7181KB)
                            (malloc=29499KB +5998KB #345299 +66485)
                            (arena=5122KB +1183d #1)
  
-    Native Memory Tracking (reserved=9275KB +3172KB, committed=9275KB +3172KB)
                            (malloc=203KB +81KB #3649 +1466)
                            (tracking overhead=9073KB +3091KB)
  
-        Shared class space (reserved=16384KB, committed=12896KB)
                            (mmap: reserved=16384KB, committed=12896KB)
  
-               Arena Chunk (reserved=101KB -4811KB, committed=101KB -4811KB)
                            (malloc=101KB -4811KB)
  
-                   Tracing (reserved=32KB, committed=32KB)
                            (arena=32KB #1)
  
-                    Module (reserved=627KB +264KB, committed=627KB +264KB)
                            (malloc=627KB +264KB #5697 +932)
  
-                 Safepoint (reserved=8KB, committed=8KB)
                            (mmap: reserved=8KB, committed=8KB)
  
-           Synchronization (reserved=207KB +82KB, committed=207KB +82KB)
                            (malloc=207KB +82KB #2012 +814)
  
-            Serviceability (reserved=24KB +6KB, committed=24KB +6KB)
                            (malloc=24KB +6KB #159 +128)
  
-                 Metaspace (reserved=131757KB +324KB, committed=118189KB +32132KB)
                            (malloc=685KB +324KB #662 +445)
                            (mmap: reserved=131072KB, committed=117504KB +31808KB)
  
-      String Deduplication (reserved=1KB, committed=1KB)
                            (malloc=1KB #8)
  
-           Object Monitors (reserved=31KB -14KB, committed=31KB -14KB)
                            (malloc=31KB -14KB #151 -71)

The numbers are not rising as much as we had anticipated, only +143MB. Our only current explanation is that native memory might be allocated, no longer used, but somehow not freed.
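
For completeness, a diff like the one above is produced roughly like this (this assumes the JVM was started with -XX:NativeMemoryTracking=summary):

jcmd <pid> VM.native_memory baseline
... run the load test for a while ...
jcmd <pid> VM.native_memory summary.diff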

Workaround solution: The role of ZGC and tile-recycling

One thing that we observed is that more aggressive garbage collection seems to limit the increase of the off-heap memory consumption to the defined max-heap settings.

  1. If we switch to ZGC using the following settings
-XX:+UseZGC -XX:+ZGenerational -XX:SoftMaxHeapSize=20G -Xmx30G

then the memory consumption stabilizes somewhere around the max-heap setting:

  2. If we stick with G1GC, then enabling tile-recycling seems to have a similar workaround effect:

Memory consumption increases fast, but the heap increases with it (most likely the tile storage). Then, when heap == max-heap, G1GC performs a full GC run, and that seems to also limit the off-heap consumption.
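
If one wanted to approximate that effect without enabling tile-recycling, G1’s periodic GC might be an option; this is just an idea we have not tested, assuming the JDK 12+ G1PeriodicGCInterval flag:

-XX:+UseG1GC -XX:G1PeriodicGCInterval=60000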

One thing worth noting is that, based on the metrics we captured, we are currently convinced that the problem is not a memory leak. Rather, it seems that something changed in GeoTools / GeoServer that dramatically changed the way native memory is consumed. Full GC runs (or at least a much more aggressive GC strategy) now seem to be required in order to stabilize GeoServer’s memory consumption.
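
One way to correlate the off-heap drops with full GC runs is to enable GC logging, e.g. (standard JDK unified logging, nothing GeoServer specific):

-Xlog:gc*:file=gc.log:time,uptime,level,tags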

Async Profiling results

We also ran async-profiler (https://github.com/async-profiler/async-profiler) with native memory tracking:

./asprof start -e nativemem -f app.jfr <pid>
... wait until just before crash
./asprof stop -e nativemem -f app.jfr <pid>
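
We then converted the recording into flame graphs with async-profiler’s jfrconv converter. I am quoting the options from memory, so the exact names may differ between versions:

# full native allocation flame graph
jfrconv --nativemem app.jfr nativemem-flamegraph.html
# only allocations without a matching free ("leak" view)
jfrconv --nativemem --leak app.jfr leaks-flamegraph.html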

The tracking showed some spots that seem to be consuming a lot of native memory:

Leak detection points us to similar spots:

Unfortunately, we could not see any particular changes in GeoTools/GeoServer that would be affecting these spots. Also, the same analysis based on 2.23.6 showed very similar results.

JEMalloc

Finally we investigated jemalloc (https://github.com/jemalloc/jemalloc), as it seems commonly understood that Linux glibc malloc allocates per-thread arenas in 64MB regions, which can lead to native memory fragmentation.
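
For reference, this is roughly how we wired jemalloc in before starting tomcat (the library path is distribution specific); limiting the number of glibc arenas would be the alternative experiment when staying on glibc malloc:

# preload jemalloc for the tomcat/GeoServer process (path varies per distribution)
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
# alternative on plain glibc malloc: cap the number of 64MB arenas
export MALLOC_ARENA_MAX=2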

With jemalloc, the memory curve looks like this:

  • GS 2.27.1 + G1GC + Jemalloc + 30G maxHeap

Again, it looks like there’s not a particular memory leak, but more a weird change in the way memory is allocated and freed.

Tracing back to 2.24.0

We traced the problem back to 2.24.0 by walking down the GeoServer branches from 2.27.x (2.26.x, 2.25.x, 2.24.x). The earliest version that shows the behaviour is 2.24.0; going from 2.23.6 up through the 2.23.x releases did not show the behaviour at all.

Memory footprint in 2.23.6

Finally, here is a figure of how the memory consumption looked in 2.23.6:

  • initial heap: 10GB
  • max-heap: 40GB
  • garbage collection: G1GC
  • tile-recycling: off

Summary

It seems that something changed between 2.23.x and 2.24.0 with a dramatic impact on how off-heap memory is used in GeoServer. Our current analysis makes us conclude that it might not be a leak, but rather a different behaviour that requires a more aggressive garbage collection strategy.

The only way we can currently survive is by using jemalloc + ZGC + a much decreased max-heap setting.
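
Put together, the configuration that currently keeps us alive looks roughly like this (library path and heap sizes are specific to our setup, passed to tomcat via CATALINA_OPTS):

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
export CATALINA_OPTS="-Xms10g -Xmx30g -XX:+UseZGC -XX:+ZGenerational -XX:SoftMaxHeapSize=20G"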

My question to the community + devs would be if anyone has made similar observations, or has any idea what might have caused the drastic change in behaviour.

Currently our goal is to try to understand what’s going on there.

We are thankful for any suggestions / ideas.
Thanks!

That is a very complete analysis. You are the first person to bring up the topic here on the discussion forum, but we do occasionally find that performance issues take some years to be brought to our attention “because the devs must have noticed”. I caution that we get to develop the software but do not always have the opportunity to run it in such environments - it is one of the reasons we depend on community testing and feedback such as yours for project stability.

Can I ask what kind of datasets you are working with? Shapefile / GeoPackage / Imagery? If you could check the difference between 2.23.x and 2.24.0 we may be able to figure out if this problem is with either:
a) geotools datastore + streaming renderer processing chain
b) image io + jai + jai-ext + geotools grid coverage renderer processing chain

You did describe WMS output which is why I have focused on the two ways to produce imagery.

The changes between 2.23.x and 2.24.0 are here:

The massive change was from the org.opengis packages to org.geotools.api; while this was intended to affect only imports, it makes direct comparison a little more annoying than needful due to the number of files modified.

Aside: if I am being suspicious, there were some changes to the bounding box handling and reprojection code; perhaps that placed additional memory stress on the 30.x branch?

I can confirm we haven’t noticed a similar issue, in several production setups dealing with large amounts of imagery.

The tile recycling bit helping is at least odd… tiles are cached in the heap, there is no off-heap storage.

One thing that surprises me is the very large heap you’re working with… we normally run GeoServer with 4 to 8GB of heap; if more processing power is needed we normally add another small processing node, while it seems you went for a large machine instead (generally speaking, not recommended for scalability). GeoServer code is written with relatively small hardware in mind (4-8 cores, 8-16GB of memory) rather than a very large machine. Optimizing for the latter would likely require some code tweaks and possibly reworking some architectural decisions.

Cheers
Andrea

Hi,

we used the grib plugin (src/extension/grib). It seemed the easiest option because we use that data a lot, and it was therefore the obvious choice.

Now that I think about it, maybe it was not such a good idea after all. I probably should have used a dataset that is already available in GeoServer’s provided data folders, like “release”. I might re-run the test with one of those datasets. Based on the previous findings, 2.24.0 should be enough for a test.

Thanks for providing the links to the comparisons. As you had mentioned, the changes were massive and we have already spent hours checking them; unfortunately we weren’t able to isolate a module that could cause this.

But … maybe let’s wait for the results of me repeating the tests with some of the “demo” layers of the “release” data bundle.

… and I will check jai-ext and image-io changes.

Note though that we had back-ported the one change in jai-ext/Scale to our 2.23.6 version and it worked fine there, so if it was that change you were referring to, then that shouldn’t be the problem.

Thanks already for your suggestions!

Careful with the grib/netcdf data formats… they might have a structure that looks like tiles (chunks) but they lack overviews.

They typically work well for a coarse/small grid with a lot of times/elevations, but very poorly for high resolution data (due to the lack of overviews).
High resolution/large spatial grid GRIB files can significantly bump both CPU and memory usage. Thinking out loud, I believe the UCAR library used to read the files has been upgraded recently… but not in 2.24.0, more like in 2.26 (it happened in April 2024).

Cheers
Andrea

Hi,

we have run some more tests in order to try to isolate the problem further.

My first try was to bombard the “ne” data store of the GeoServer release data set with queries. Unfortunately that did not go well at all. I think it uses a shapefile store, which doesn’t seem to perform at all. Querying 4 layers in parallel using the 16 tiles (256x256) got us something like 80 requests/min. So, not useful for bombarding the server.

Then we went back to datasets we know about. We took 4 GeoTIFF datasets of MSG satellite data (medium size). With these four layers we performed the test as described earlier. Here’s the result:

GeoServer (2.24.0) managed a bit less than 6K requests/min - which is less than grib, but still OK I guess. The memory consumption still looks quite similar to grib as well as to our own raster plugins.

The following figure was produced on the very same server:

  • G1GC + 40GB max-heap + tile-recycling off

One can see that the heap stays relatively stable, which is expected because we have tile-recycling off → so not much going on on the heap.

But again, look at the overall memory (incl. off-heap). It increases unbounded until it eventually crashes the system. At over 90GB of memory, Linux kills other processes, including our load-test scripts.
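
For anyone reproducing this: the kernel OOM killer leaves traces in the system log, e.g.:

# check whether the kernel OOM killer was involved in the kills
dmesg -T | grep -iE "out of memory|oom-killer"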

So at least what we know now is that all the raster stores we tried are affected by this somehow - incl. grib and geotiff, as well as our three custom ones for NWP/EPS, satellite and radar weather data.

For vector stores it seems more difficult to test - we don’t have much vector data to try, and the “release” data seems too sparse to really put heavy load on it.

Yes, exactly. If you have tile-recycling on, then GeoServer consumes more heap because of the tiles. Heap consumption will eventually hit the max-heap setting and trigger a full GC run. And that also seems to free off-heap resources along the way - because maybe something on the heap is not properly releasing native memory.

My thinking is like this:

  • some “process” / “object” is allocating native memory
  • then that “object” has done its job and is no longer needed
  • but for some reason it is not garbage collected by the early GC runs → someone might still hold weak references or anything like that → at least for some reason it’s not garbage collected.
  • that means the native memory is also not freed
  • a full GC run would collect that “object”, also freeing the native memory
  • but unluckily that full GC run is not triggered, because we are occupying so little heap → G1GC doesn’t think a full GC run is necessary

So, basically, a few heap objects are allocating a lot of native memory and are not freeing that memory, because they are somehow not garbage collected - until the next full GC run.

That full GC run does not happen, because the overall heap consumption is low.

And now tile recycling comes into the game:

It increases heap consumption until eventually a full GC run happens. That cleans the heap and, with it, also the “objects” that still had native memory allocated. During the cleanup (finalize, dispose or whatever it is), those “objects” free their native memory.

That’s my theory. I guess one could probably prove it by explicitly triggering the garbage collector through GeoServer’s admin GUI; it should have the same effect as increasing the heap consumption.
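
A scripted variant of that experiment would be to trigger the GC from the command line instead of the admin GUI and watch whether the overall memory (RSS) drops afterwards:

jcmd <pid> GC.run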

I don’t know exactly what test run you made but this looks way off the mark.
The store is a GeoPackage, and the dataset is pretty light.
I took a single sample request (there is no data caching in GeoServer) that looks as follows:

While it’s repeated in the benchmark, it’s quite a bit larger than a tile and involves multiple layers. As said, GeoServer performs no data caching; it will re-query the data for each request.
I also took the bin package of the 2.27.1 release, which is as poor a setup as you can get for a benchmark (no JVM tweaking at all).
Here is the output of calling it a number of times with 48 threads (my laptop has 12 cores, typically WMS peaks at 2 to 4 times the number of cores, if there is no I/O wait):

ab -n 4000 -c 48 "http://localhost:8080/geoserver/ne/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=true&STYLES&LAYERS=ne%3Aworld&EXCEPTIONS=application%2Fvnd.ogc.se_inimage&SRS=EPSG%3A4326&WIDTH=768&HEIGHT=384&BBOX=-10.107421875%2C37.265625%2C23.642578125%2C54.140625"
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 400 requests
Completed 800 requests
Completed 1200 requests
Completed 1600 requests
Completed 2000 requests
Completed 2400 requests
Completed 2800 requests
Completed 3200 requests
Completed 3600 requests
Completed 4000 requests
Finished 4000 requests


Server Software:        Jetty(9.4.57.v20241219)
Server Hostname:        localhost
Server Port:            8080

Document Path:          /geoserver/ne/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=true&STYLES&LAYERS=ne%3Aworld&EXCEPTIONS=application%2Fvnd.ogc.se_inimage&SRS=EPSG%3A4326&WIDTH=768&HEIGHT=384&BBOX=-10.107421875%2C37.265625%2C23.642578125%2C54.140625
Document Length:        176502 bytes

Concurrency Level:      48
Time taken for tests:   14.908 seconds
Complete requests:      4000
Failed requests:        0
Total transferred:      707724000 bytes
HTML transferred:       706008000 bytes
Requests per second:    268.32 [#/sec] (mean)
Time per request:       178.891 [ms] (mean)
Time per request:       3.727 [ms] (mean, across all concurrent requests)
Transfer rate:          46361.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       5
Processing:    51  178  62.8    168     514
Waiting:       44  168  61.6    158     473
Total:         51  178  62.8    168     514

Percentage of the requests served within a certain time (ms)
  50%    168
  66%    196
  75%    215
  80%    228
  90%    264
  95%    294
  98%    334
  99%    356
 100%    514 (longest request)

Ah hem… that’s 268 requests per second… I have a hard time figuring out how you get only 80 per minute, in other words, 1.3 per second?

Cheers
Andrea

Hi Andrea,

thanks for your analysis! You were of course right, “ne” is a GeoPackage store - my fault. We also figured out that the bad performance of that datastore came from the fact that the data directory was located on a NAS share. I have moved it to a local drive now. That gave results similar to your tests. :+1: Sorry for the confusion.

As for the test itself: ours is a little bit different from yours.

For all the tests mentioned in this thread we have been using the following script: LoadTest.sh (3.7 KB).

What it does is define 16 tiles (256x256) approximately over the area of Germany. It uses wget to fetch the 16 tiles for a layer that is passed in as an argument to the script. Then it loops that scheme endlessly.
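
The core of it is nothing more than a loop of this shape (a simplified sketch, not the actual script contents; SERVER and LAYER correspond to the -s and -l arguments, and only two of the 16 bounding boxes are shown):

# endless loop: fetch the fixed 256x256 tiles for the given layer, then start over
while true; do
  for BBOX in "5.625,47.8125,8.4375,50.625" "8.4375,47.8125,11.25,50.625"; do
    wget -q -O /dev/null "$SERVER/geoserver/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image/png&LAYERS=$LAYER&SRS=EPSG:4326&WIDTH=256&HEIGHT=256&BBOX=$BBOX"
  done
done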

Typically we then call that script multiple times on multiple layers. For the GeoTIFF example I provided yesterday, I ran it like this:

./LoadTest.sh -s http://127.0.0.1:8080 -l dwd:MSG_GESAMT_HRV &
./LoadTest.sh -s http://127.0.0.1:8080 -l dwd:MSG_HRV &
./LoadTest.sh -s http://127.0.0.1:8080 -l dwd:MSG_IR108 &
./LoadTest.sh -s http://127.0.0.1:8080 -l dwd:MSG_NIR016 &

I had created four GeoTIFF stores with one MSG satellite image each, then defined one layer for each, and that was it.

It took about four hours to crash GeoServer:

  • G1GC + 40GB max-heap + tile-recycling off

Comparing this test to the grib plugin test, we can say:

  • the outcome is pretty similar
  • except grib crashed faster
  • but grib had a smaller heap memory footprint

In fact, the curves of the GeoTIFF test more closely match the curves that we also encounter with our own raster data store implementations. While grib seems to crash pretty fast, with our data stores we are seeing these steps and it takes a couple of hours for the server to crash - so pretty similar to GeoTIFF.

But the overall problem remains, something is eating all the off-heap memory until eventually the system crashes.

Alright, latest findings.

After I was able to fix the performance problems of the “ne” dataset by moving the GEOSERVER_DATA_DIR from the NAS to local storage, we were able to replicate the tests using the vector data of that “ne” store.

We ran the previously mentioned tests (16 tiles of 256x256) on the following layers:

./LoadTest.sh -s http://127.0.0.1:8080 -l ne:countries &
./LoadTest.sh -s http://127.0.0.1:8080 -l tiger:giant_polygon &
./LoadTest.sh -s http://127.0.0.1:8080 -l nurc:Arc_Sample &
./LoadTest.sh -s http://127.0.0.1:8080 -l ne:coastlines &
./LoadTest.sh -s http://127.0.0.1:8080 -l ne:world &

Note that we ran these five layers twice. So we actually have 10 endless loops (each of the above layers twice), each running the 16 tile requests.

That makes the following memory curve:

Interestingly, this test seems to be pretty stable in memory consumption. Note that for the upper left chart, the overall memory is shown, not just GeoServer. GeoServer itself looks like this:

So it looks as if this processing chain is ok.

This makes us think that indeed something is going on in the grid coverage processing, which is surely the basis of GeoTIFF, grib/netcdf as well as our own raster store implementations.

Nice analysis, thank you @skalesse. I hope that we continue along these lines and can isolate the root cause of your problem.

Like Jody said above, it’s important for the whole community to collaborate, do their own testing and contribute to the enhanced stability of the open source project. Thank you for leading the way.

Peter