[GEOS-11284] Promote community module "datadir catalog loader" to core

Hi all,

I’ve been working hard on getting the datadir-catalog-loader community module ready for core.

Please check this pull request: [GEOS-11284] Promote community module “datadir catalog loader” to core

The old PR was a mess to rebase after the Palantir formatting.
Also, curiously enough, the sample datadir worked with the community module as it was, but not with the refactored code from the old PR.

In any case, I’ve cleaned things up a lot in this one, making the new loader play nicely with GeoServerLoader and DefaultGeoServerLoader, instead of re-doing a lot of their workflow.

Added a bunch of test cases for invalid datadir configs.

The next commit will be about removing the community module and moving these classes to core.

Meanwhile, any testing is very much welcome.

You can download these two files to run GeoServer against the sample data directory from docker compose:

  • initialize_volume.sh: creates a docker volume named geoserver_nfs_data. Downloads the sample datadir file if it’s not present in the current directory.
  • compose.yml: GeoServer docker composition to load the catalog from an NFS share using geoserver_nfs_data volume

So just download both files and run

chmod +x ./initialize_volume.sh
./initialize_volume.sh
docker compose up -d

When ready, GeoServer will be available at http://localhost:8080/geoserver/
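
For anyone who wants to script the wait instead of refreshing the browser, here is a minimal sketch of a polling helper. The function name wait_for_url and its default attempt count and delay are made up for illustration, and it assumes curl is installed on the host. A non-error HTTP response is only a rough proxy for "the catalog has finished loading".

```shell
# Poll a URL until it answers without an HTTP error, or give up.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper, not part of the two downloadable files)
wait_for_url() {
  url=$1
  attempts=${2:-60}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors (>= 400) as failure, -s: no progress output,
    # -o /dev/null: discard the body, --max-time: do not hang on one request
    if curl -fs --max-time 10 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

After docker compose up -d, it could be called as: wait_for_url http://localhost:8080/geoserver/ && echo "GeoServer is up"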

Looking forward to your comments.

Cheers,
Gabe

Hey Gabe, do you want to write up a proposal to make this change prior to the 2.27.0 release? You can see a recent example as a starting point.

Here is the proposal: https://github.com/geoserver/geoserver/wiki/GSIP-231

Voting is open. If this can complete ahead of 2.27.0, it would be ideal.

@groldan for my own feedback: the environment variable DATADIR_LOADER_ENABLED does not make much sense once this is in core. I wonder if DATADIR_LOAD_PARALLELISM could instead be the single setting, so that DATADIR_LOAD_PARALLELISM=0 disables the datadir catalog loader.
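
To make the single-setting idea concrete, here is a rough sketch of how such a value could be interpreted. This only illustrates the suggestion above; it is not how the module resolves its configuration, and the function name resolve_loader and its output strings are invented for the example.

```shell
# Hypothetical interpretation of a single DATADIR_LOAD_PARALLELISM value:
#   unset or empty -> loader enabled with its default parallelism
#   0              -> loader disabled
#   N > 0          -> loader enabled with N threads
resolve_loader() {
  value=${1:-}
  case $value in
    '') echo "enabled default" ;;
    *[!0-9]*) echo "invalid value: $value" >&2; return 1 ;;
    *)
      if [ "$value" -eq 0 ]; then
        echo "disabled"
      else
        echo "enabled $value"
      fi
      ;;
  esac
}
```

It would be called with the current environment value, for example: resolve_loader "${DATADIR_LOAD_PARALLELISM:-}"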

I would not go with “parallel” because the existing data dir loader is also parallel; its parallelism can be controlled by changing the default Java fork/join pool size.
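
For reference, the size of Java's default (common) fork/join pool is set with the JVM system property java.util.concurrent.ForkJoinPool.common.parallelism, which has to be passed when the JVM starts. A small sketch of building that flag follows. The helper name fork_join_flag is made up, and whether your launcher or container image reads JAVA_OPTS is an assumption to check against your own setup.

```shell
# Print the JVM flag that sets the parallelism of the common fork/join pool.
# Rejects anything that is not a non-negative integer.
fork_join_flag() {
  case ${1:-} in
    ''|*[!0-9]*)
      echo "expected a non-negative integer, got '${1:-}'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "-Djava.util.concurrent.ForkJoinPool.common.parallelism=$1"
}
```

Example, assuming the startup script honours JAVA_OPTS: JAVA_OPTS="${JAVA_OPTS:-} $(fork_join_flag 4)"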

Cheers
Andrea

Is there a good name for the two implementations? Perhaps we can have DATADIR_LOADER=<name> rather than enabled/disabled or true/false flags.

Thanks Jody,

I’ve updated the proposal.
camptocamp

Hi devs,

Just a friendly reminder that GSIP-231 (Promote datadir catalog loader to core) and its related pull request are ready for review.
Looking forward to your feedback. If possible, we’d like it to be ready for the 2.27.0 release, so, as
discussed in the last PSC meeting, I’d encourage you to try it out with your data directories and report back any trouble you might encounter.

Thank you very much,
Gabe

Hi all,
I second Gabriel. The tests I believe are important are:

  • Large data directories, on whatever storage you have (please report the numbers of old and new loaders)
  • Simpler data directories, but with a lot of plugins enabled, as each might introduce something affecting the startup process and cause exceptions
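
For reporting old versus new loader numbers, one rough way to get comparable figures is to time how long the server takes to start answering under each setting. The sketch below is only a generic timing wrapper. The name elapsed_seconds is made up, it measures whole seconds, and it relies on date +%s being available. Switching loaders via DATADIR_LOADER_ENABLED (the variable named earlier in this thread) and how that variable reaches the container depend on your setup.

```shell
# Run a command, discard its output, and print how many whole seconds it took.
# Returns the command's own exit status.
elapsed_seconds() {
  start=$(date +%s)
  status=0
  "$@" >/dev/null 2>&1 || status=$?
  end=$(date +%s)
  echo $((end - start))
  return "$status"
}
```

For example, right after starting the container: elapsed_seconds sh -c 'until curl -fs -o /dev/null http://localhost:8080/geoserver/; do sleep 2; done'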

I have tested, and offer my +1 for this proposal.

I personally do not have very interesting data directories to test against so it feels like my testing is of little value.

May I suggest we merge and ask for public testing and feedback.

  • It has been 12 days since @groldan asked, and our policy says 10 (yes, I only helped with the wiki page last week though …)
  • We have the fallback option of setting this data loader to “false” by default if we are apprehensive about the release.

+1 here too

Reminder that the data directories are interesting also in combination with extensions that might
do something unexpected and deadlock the startup. One does not need 1 million layers,
all it takes is one with the right combo of settings and extensions.

+0 from me.

+1

The loading time improved on my local environment (non-NFS) from 120 sec to 90 sec.

EDIT: I may not have actually seen these improvements, but +1 still.

Approx 100 workspaces, 120 stores, 2600 layers

STABLE_EXTENSIONS: authkey,css,monitor,web-resource,gwc-s3,sqlserver,charts,vectortiles,geopkg-output

Peter

I have tested with a range of local directories and am +1.

The PR is merged and I have asked for public feedback on the user forum.