Hi GeoServer developers,
I have been battling with this problem for a while and have just managed to narrow it down to a case that is reproducible on the default deployment.
We are programmatically creating workspaces, users, roles, ACLs, etc. using the REST API. Most of the time this works fine. Very occasionally, however, it corrupts the GeoServer catalog (first in memory, and later the XML files on disk), typically resulting in HTTP 403 (Forbidden) errors. The XML files are still valid; they are just missing some roles, users, etc.
To reproduce:
Attached is a short Python script that repeatedly creates a new role with a random name, running synchronously.
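For readers without the attachment, the following is a minimal sketch of what such a script might look like. It is not the attached geoserver_role.py. The base URL, the default admin/geoserver credentials, the ROLE_TEST_ name prefix, and the helper names are all assumptions for illustration. The endpoint path follows my reading of the GeoServer REST documentation for roles and should be checked against your version.

```python
import base64
import random
import string
import urllib.error
import urllib.request

# Assumptions (not taken from the attachment): a default local deployment,
# default admin credentials, and the role endpoint from the REST docs.
BASE_URL = "http://localhost:8080/geoserver/rest"
USER, PASSWORD = "admin", "geoserver"


def random_role_name(length=8):
    """Return a random role name with a hypothetical ROLE_TEST_ prefix."""
    suffix = "".join(random.choices(string.ascii_uppercase, k=length))
    return "ROLE_TEST_" + suffix


def build_request(role, base_url=BASE_URL, user=USER, password=PASSWORD):
    """Build (but do not send) the POST request that creates one role."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/security/roles/role/{role}",
        method="POST",
        headers={"Authorization": "Basic " + token},
    )


def create_roles(count=1000):
    """Create roles strictly one at a time, waiting for each response."""
    for i in range(count):
        role = random_role_name()
        try:
            with urllib.request.urlopen(build_request(role), timeout=30) as resp:
                print(i, role, resp.status)
        except urllib.error.HTTPError as err:
            # Stop at the first HTTP error (for example a 403) and report it.
            print(i, role, "failed with HTTP", err.code)
            break
```

Calling create_roles() starts the loop. Because each request is sent only after the previous response has been read, the client never issues two writes concurrently.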
While it is running, try logging out of the GeoServer web GUI and then logging back in to see whether your (admin) user still has any rights. You will probably have to do this repeatedly. For me, it consistently fails before the script has created 1000 roles.
Versions tested: 2.19.1 and 2.20.0
Just by looking at the verbose logging, I think I have narrowed it down to the synchronized (service) (line 189 of AbstractRoleStore.java) and synchronized (this) (line 106 of AbstractRoleService.java) locks not working 100% correctly. I am trying to get a Java developer at our company to look into it further.
I don’t know whether GeoServer is periodically triggering a refresh of the authentication cache, but I thought that was set to 10 minutes, and this happens much sooner than that.
Can others please try to reproduce the problem (preferably on a separate, fresh install of GeoServer; alternatively, back up your data/security directory first)?
Can the core developers please give us guidance on how to possibly fix this critical (for us) problem?
Thank you
Peter
On Tue, 27 Jul 2021 at 10:42, Andrea Aime <notifications@anonymised.com> wrote:
@aaime commented on this pull request.
In doc/en/user/source/rest/index.rst:
> @@ -9,6 +9,7 @@ REST is an acronym for "`REpresentational State Transfer <http://en.wikipedia.or
>  Operations on resources are implemented with the standard primitives of HTTP: GET to read; and PUT, POST, and DELETE to write changes. Each resource is represented as a URL, such as ``http://GEOSERVER_HOME/rest/workspaces/topp``.
> +.. warning:: While the OGC requests are thread-safe and can (actually must) be called in parallel, the REST write methods (PUT, POST, DELETE) are `not thread-safe <https://files.speakerdeck.com/presentations/2b500ab2712d44129a17f4b7e28c17df/slide_20.jpg>`. Care must be taken to call these sequentially and synchronously, otherwise you risk corrupting your configuration files.
It’s true that the catalog is not thread safe, but the REST API is protected by a global read/write lock, so no care should actually be needed: write methods are serialized automatically. Well, unless the admin was reckless and disabled the locks via the undocumented system variable.
(attachments)
geoserver_role.py (704 Bytes)