Hello GeoServer developers.
I’ve been trying to track down a pesky bug that causes some layers in GeoServer to eventually throw a StackOverflowError during GetCapabilities and GetMap calls to the layer. If left unaddressed, even navigating to the Layer list page or the Layer Preview page in GeoServer will also throw StackOverflowErrors.
I’ve captured some of the details in this ticket: https://osgeo-org.atlassian.net/browse/GEOS-8603
With a lot of help from Gabriel Roldan, we dug into the issue and found the main symptom to be that, over time, the DataStoreInfo object in the Catalog would keep getting wrapped in extra layers of SecuredDataStoreInfo and ModificationProxy instances. Debugging and catching the error showed that the issue stemmed from REST calls that modified FeatureTypeInfo elements in the Catalog.
Further debugging led us to SecuredFeatureTypeInfo, where getStore() is overridden to return a wrapped SecuredDataStoreInfo instance, but setStore(StoreInfo) is not overridden to ensure that the StoreInfo is unwrapped before it is set on the delegate. This acts as a memory leak of sorts: a PUT of a FeatureTypeInfo, in this environment, seems to use OwsUtils.copy(), which invokes the decorated getter getStore() to retrieve a decorated DataStoreInfo that is then set on the target FeatureTypeInfo. Each time this process is repeated, the FeatureTypeInfo’s DataStoreInfo gets wrapped in another decorator layer.
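To make the mechanism concrete, here is a minimal, self-contained sketch of the wrapping behaviour. It does not use the real GeoServer classes: Store, SecuredStore, FeatureType and SecuredFeatureType are simplified, hypothetical stand-ins for StoreInfo, SecuredDataStoreInfo, FeatureTypeInfo and SecuredFeatureTypeInfo, and the loop only approximates what a getter-to-setter copy appears to do on each PUT.

```java
public class NestedWrapDemo {

    /** Hypothetical stand-in for StoreInfo. */
    interface Store {
        String getName();
    }

    /** Plain catalog object, playing the role of the real DataStoreInfo. */
    static final class PlainStore implements Store {
        private final String name;
        PlainStore(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    /** Decorator playing the role of SecuredDataStoreInfo. */
    static final class SecuredStore implements Store {
        final Store delegate;
        SecuredStore(Store delegate) { this.delegate = delegate; }
        @Override public String getName() { return delegate.getName(); }
    }

    /** Stand-in for FeatureTypeInfo: it only holds a store reference. */
    static class FeatureType {
        private Store store;
        Store getStore() { return store; }
        void setStore(Store store) { this.store = store; }
    }

    /** Stand-in for the secured decorator without an unwrapping setter. */
    static final class SecuredFeatureType extends FeatureType {
        private final FeatureType delegate;
        SecuredFeatureType(FeatureType delegate) { this.delegate = delegate; }

        /** The getter wraps the delegate's store on every call. */
        @Override Store getStore() { return new SecuredStore(delegate.getStore()); }

        /** The setter passes the value through as-is, wrapper included. */
        @Override void setStore(Store store) { delegate.setStore(store); }
    }

    /** Counts how many SecuredStore layers sit on top of the plain store. */
    static int wrapperDepth(Store store) {
        int depth = 0;
        while (store instanceof SecuredStore) {
            depth++;
            store = ((SecuredStore) store).delegate;
        }
        return depth;
    }

    /** Simulates repeated updates that copy getStore() into setStore(). */
    static int depthAfterUpdates(int updates) {
        FeatureType raw = new FeatureType();
        raw.setStore(new PlainStore("demo"));
        FeatureType secured = new SecuredFeatureType(raw);
        for (int i = 0; i < updates; i++) {
            secured.setStore(secured.getStore());
        }
        return wrapperDepth(raw.getStore());
    }

    public static void main(String[] args) {
        System.out.println("wrapper depth after 5 updates: " + depthAfterUpdates(5));
    }
}
```

In this toy model every update adds exactly one wrapper layer, so the nesting depth grows linearly with the number of modifications.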
When the object is eventually needed to respond to a GetCapabilities or GetMap request, ResourcePool.getDataStore(DataStoreInfo) is called, and the first thing that happens there is a deep clone of the DataStoreInfo. If the FeatureTypeInfo associated with the DataStore has been modified enough times, cloning the object recursively dives into the nested decorator wrappers, and eventually a StackOverflowError is thrown while trying to serialize or deserialize the object.
Gabriel and I have tested a small patch that at least prevents the nested wrapping of DataStoreInfo by providing a SecuredFeatureTypeInfo.setStore(StoreInfo) override that unwraps the provided StoreInfo if it is secured. In limited testing with repeated PUTs of the FeatureTypeInfo, it seems to prevent the repeated nesting of the catalog info and does not cause a StackOverflowError.
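The idea behind the override can be sketched as follows. This is an illustration under simplifying assumptions, not the code from the PR: the types are hypothetical stand-ins for the GeoServer classes, and the unwrap() helper is invented here purely to show the approach.

```java
public class UnwrapOnSetDemo {

    /** Hypothetical stand-in for StoreInfo. */
    interface Store {}

    /** Plain catalog object, playing the role of the real DataStoreInfo. */
    static final class PlainStore implements Store {}

    /** Decorator playing the role of SecuredDataStoreInfo. */
    static final class SecuredStore implements Store {
        final Store delegate;
        SecuredStore(Store delegate) { this.delegate = delegate; }
    }

    /** Stand-in for FeatureTypeInfo: it only holds a store reference. */
    static class FeatureType {
        private Store store;
        Store getStore() { return store; }
        void setStore(Store store) { this.store = store; }
    }

    /** Stand-in for the secured decorator with an unwrapping setter. */
    static final class SecuredFeatureType extends FeatureType {
        private final FeatureType delegate;
        SecuredFeatureType(FeatureType delegate) { this.delegate = delegate; }

        /** The getter still hands out a secured view of the store. */
        @Override Store getStore() { return new SecuredStore(delegate.getStore()); }

        /** The setter strips any secured wrappers before delegating. */
        @Override void setStore(Store store) { delegate.setStore(unwrap(store)); }
    }

    /** Hypothetical helper: peels off every SecuredStore layer. */
    static Store unwrap(Store store) {
        while (store instanceof SecuredStore) {
            store = ((SecuredStore) store).delegate;
        }
        return store;
    }

    /** Counts how many SecuredStore layers sit on top of the plain store. */
    static int wrapperDepth(Store store) {
        int depth = 0;
        while (store instanceof SecuredStore) {
            depth++;
            store = ((SecuredStore) store).delegate;
        }
        return depth;
    }

    /** Simulates repeated updates that copy getStore() into setStore(). */
    static int depthAfterUpdates(int updates) {
        FeatureType raw = new FeatureType();
        raw.setStore(new PlainStore());
        FeatureType secured = new SecuredFeatureType(raw);
        for (int i = 0; i < updates; i++) {
            secured.setStore(secured.getStore());
        }
        return wrapperDepth(raw.getStore());
    }

    public static void main(String[] args) {
        System.out.println("wrapper depth after 1000 updates: " + depthAfterUpdates(1000));
    }
}
```

With the unwrapping setter, the delegate only ever holds the plain store, so the depth stays at zero no matter how many updates are applied.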
I’ve put the changes into a PR here: https://github.com/geoserver/geoserver/pull/2771
and welcome a discussion and feedback.