In one setup, the WFS GetFeature POST request has gone from <0.5 seconds
to >4 seconds.
The setup where I observed this uses jdbcconfig with hundreds of layers
and per-layer security rules. This is somewhat ironic, because the
pull request was meant to improve performance with database catalogues
in particular.
The problem is that building the security filter is very expensive when
you have hundreds of rules. Rules are specified with layer names, and
these all have to be translated to IDs. I tried improving how
jdbcconfig uses the cache; this reduced the time from 4 seconds
to 1.5 seconds, a substantial difference, but still much more
expensive than the old version. The underlying problem remains: building
the filter is too expensive with many rules. Would it be a good idea to keep
the security filter in a cache, indexed by Pair<? extends CatalogInfo>?
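To make the caching idea concrete, here is a minimal sketch of what such a cache could look like, assuming the filter is keyed by (user, CatalogInfo subtype). All names are illustrative and the filter is represented as a plain String stand-in; the real code would use the GeoTools Filter type and the actual pair key discussed above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical sketch: memoize the expensive rule-to-filter translation
// per (user, catalog info type) pair. Not actual GeoServer API.
class SecurityFilterCache {
    // Map.entry gives a value-based key with correct equals/hashCode.
    private final ConcurrentHashMap<Map.Entry<String, Class<?>>, String> cache =
            new ConcurrentHashMap<>();

    // Returns the cached filter, invoking the expensive builder at most
    // once per (user, type) key.
    String getFilter(String user, Class<?> infoType,
                     BiFunction<String, Class<?>, String> builder) {
        return cache.computeIfAbsent(Map.entry(user, infoType),
                k -> builder.apply(k.getKey(), k.getValue()));
    }

    // Must be called whenever the security rules change, e.g. from a
    // listener on the rule store, or stale filters will be served.
    void invalidateAll() {
        cache.clear();
    }
}
```

The delicate part is invalidation: the cache must be flushed whenever rules (or layers) change, otherwise a stale filter could grant or deny the wrong layers.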
The issue does not occur for GET requests because it does not trigger
WFSXmlUtils.initWFSConfiguration, which triggers getting all layers from
the catalogue (secured). Is it perhaps possible to cache the WFS
configuration between different requests? Is using the secured catalog
even required here for parsing the XML? (I have no idea.)
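If caching the configuration between requests turns out to be viable, the shape would be a simple thread-safe lazy holder, rebuilt only when the catalogue changes. This is a generic sketch under that assumption, not the actual WFSXmlUtils API; note the open question above still applies, since a configuration built against a secured catalogue may be user-specific:

```java
import java.util.function.Supplier;

// Hypothetical sketch: cache an expensive-to-build configuration across
// requests, rebuilding it lazily after invalidation. Names illustrative.
class CachedConfiguration<T> {
    private final Supplier<T> builder; // the initWFSConfiguration-style call
    private volatile T cached;

    CachedConfiguration(Supplier<T> builder) {
        this.builder = builder;
    }

    // Double-checked locking: the fast path is a single volatile read,
    // and the builder runs at most once per invalidation cycle.
    T get() {
        T local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached;
                if (local == null) {
                    cached = local = builder.get();
                }
            }
        }
        return local;
    }

    // Call from a catalog listener whenever layers are added/removed.
    synchronized void invalidate() {
        cached = null;
    }
}
```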
Bad news: we all know about jdbcconfig's inefficiencies.
Good news, regarding:

> The issue does not occur for GET requests because it does not trigger
> WFSXmlUtils.initWFSConfiguration, which triggers getting all layers from
> the catalogue (secured).
Still, I wonder about the creation of the security filter when you have 700+ rules. I can tell you, that is one hell of a filter. I suspect that even with the regular catalogue it would be an expensive operation.
Makes me think of the streaming renderer: it tries to push down to the data source all the filters contained in the active rules, but practice has shown that this is not always possible, e.g., the generated SQL query can become too big to send to the database.
See here:
The default max is pretty conservative, only 5 (odd, I thought it was 20); when one goes beyond that, only the bbox filter is sent down, and the rest of the filtering is performed in memory.
Now, without applying the very same limit, maybe a similar mechanism could be implemented? Decide whether to pre-filter or post-filter based on a threshold on the number of filters being sent down.
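The threshold idea could be sketched roughly like this. This is a toy illustration, with filters as plain Strings and an invented split() helper, assuming only the coarse bbox filter is pushed down once the rule count exceeds the limit:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pushdown threshold: below the limit, all
// rule filters go to the data source; above it, only the bbox filter is
// sent down and the rules are evaluated in memory. Names illustrative.
class FilterPushdown {
    // What goes to the database (pre) vs. what runs in memory (post).
    record Split(List<String> preFilters, List<String> postFilters) {}

    static Split split(String bboxFilter, List<String> ruleFilters, int maxPushdown) {
        if (ruleFilters.size() <= maxPushdown) {
            // Small rule set: push everything down, nothing left for memory.
            List<String> pre = new ArrayList<>(ruleFilters);
            pre.add(0, bboxFilter);
            return new Split(pre, List.of());
        }
        // Too many rules: the generated SQL would get too big, so only
        // the bbox goes down and the rules are applied in memory.
        return new Split(List.of(bboxFilter), ruleFilters);
    }
}
```

The interesting design question is picking the threshold: with 700+ rules the in-memory path is clearly cheaper than a giant SQL query, but for a handful of rules the pushdown wins.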