Hello there,
I’ve created a PR to fix the jms-cluster extension build on behalf of koordinates.com, as its tests were failing (https://github.com/geoserver/geoserver/pull/4040).
I’m not sure if there’s a fixed module maintainer but based on commit history I’d say Nuno and Andrea.
In any case, I’m willing to put some more work into it, so would like to first check it is ok for me to contribute to those modules, and whether you guys could take a look at those pr’s in the near term, just so we don’t run off a custom branch for too long.
Now a couple of questions. While digging into the code I found a couple of things I’m not completely clear on, maybe I’m missing something, so bear with me.
The most concerning issue to me is that slaves apply changes when handling catalog pre-modify events. When handling post-modify events, slaves merely force their local catalog to fire a post-modify event.
This seems to mean that if something fails at master between pre-modify and post-modify, slaves might end up out of sync since they’ve applied the change while handing the pre-modify event. That is to say, unless I’m missing something important, that slaves should apply the modifications when handling post-modify events and not pre-modify events, and that post-modify events on the slaves are currently being applied twice for each change, when applying the change locally at pre-modify handling (the local catalog would fire them), and again when dealing with the post-modify event (in JMSCatalogPostModifyEventHandler)?
As for dependencies, there are 4 apache geronimo dependencies that seem unnecessary, like in, we’re not using JMS JTA transactions are we? or are those geronimo dependencies somehow needed at runtime? I can run all the tests without them but not sure whether they would be needed at run time based on some configuration aspect?
There are possibly other improvements that can be done, not as crucial, such as
- making sure the updateSequence doesn’t get out of sync between master(s) and slaves (currently slaves end up with master’s updateSequence + 1)
- assess and maybe improve JMS transaction throughput, reducing the message size (e.g. events payload is the fully xstream-encoded catalog object, which in some cases is rather large, where it could be proxified like the ones saved to disk - using ids instead of fully encoded CatalogInfo relationships)
- I’m not totally certain of the implications of the following producer.disable/enable pattern under concurrency in all message handlers:
public @Override boolean synchronize(JMSGlobalModifyEvent ev) {
try {
final GeoServerInfo localObject = localizeGeoServerInfo(geoServer, ev);
producer.disable();
this.geoServer.save(localObject);
} finally {
producer.enable();
}
return true;
}
That is, “producer” is a ToggleSwitch, which toggles either master or slave behavior.
By looking at the bean wiring in applicationContext.xml, all handlers are given a ref to the server toggle (JMSToggleProducer), while the slave one (JMSToggleConsumer) is unused.
Calling producer.enable()/disable() dispatches a ToggleEvent that in turn is caught by all JMSApplicationListeners (i.e. slave message processor, and master catalog and configuration producers), to temporarily disable event dispatching.
So by looking at the above snippet, it seems to me like in a multi-master set up (e.g. nodes being both master and slave), the processing of an incoming message (handler.synchronize) would prevent the same node to broadcast events from changes being applied on that node to other nodes?
I guess that’s it for now, please let me know whether those are legit concerns or I’m missing something, I’m willing to set up test cases and fixes but wanted to check my understanding first.
Cheers,
Gabriel.
···
Gabriel Roldán