[Geoserver-devel] Workspace/layer home page selection problematic with large data directories

Hi all,
I’ve just got a report from a customer that they tried to upgrade to 2.22.0, but had to quickly revert back to 2.21.x, as the GeoServer home page was unreachable.

What is interesting about that deployment is the number of layers, well above 20k. Not the largest I’ve seen, but large. Also, all the layers are sourced from an Oracle database.
In their case, the home page takes several minutes to load.

Locally I have an oddball test data directory with 40k layers, though with a mitigating factor: it’s a “many times copy” of the GeoServer demo layers, meaning it’s all shapefiles.
The landing page displays quickly enough for me (a few seconds), but then the browser is on its knees, completely unresponsive, for 10+ seconds.
After that, trying to use the workspace/layer dropdown also incurs a severe slowdown, with the browser blocked for several seconds.
Chrome reports that one tab with the home page is using 776MB of memory, too.

Considering I’ve seen installations with up to 1 million layers (a case where they actually had 3 million, and split them across 3 different data directories), this is a serious problem…

I have also seen Gabriel experiment with large geoserver-cloud deployments with a lot of workspaces (tens of thousands? more?) but I cannot find the relevant branch or commit anymore (I believe it was about better parallelizing data directory loading).

How to address it though? Throwing in a couple of ideas:

  • Make the functionality opt-in or opt-out via a flag or UI configuration. The flag might be hard to discover, but the UI setting could be hard to reach if one cannot get to the home page to start with…
  • Automatically disable the dropdowns after a certain threshold of workspaces/layers is reached, with the threshold being configurable? Say 1000 for example? However, it might still cause issues for data sources that are slow to connect to (I’m guessing part of the slowness is due to some data type verification that requires an actual connection to the data source, based on the fact that Oracle seems a lot slower to just generate the page).

Any other ideas?
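
To make the two ideas above concrete, here is a minimal sketch of how an explicit flag and a configurable threshold could combine. It is not GeoServer code: the class `SelectorPolicy`, its `decide` method, and the `Mode` enum are hypothetical names, and the default of 1000 is just the example figure from this message. The sketch assumes the workspace and layer counts can be obtained cheaply, without connecting to the underlying data sources.

```java
import java.util.Optional;

/** Hypothetical policy deciding whether the home page dropdowns are shown. */
final class SelectorPolicy {

    enum Mode { DROPDOWN, DISABLED }

    /** Example threshold taken from the thread ("Say 1000"); assumed to be configurable. */
    static final int DEFAULT_THRESHOLD = 1000;

    /**
     * @param override       explicit administrator choice (the opt-in/opt-out flag), if set
     * @param workspaceCount number of workspaces, assumed to come from a cheap count
     * @param layerCount     number of layers, assumed to come from a cheap count
     * @param threshold      dropdowns are disabled once either count reaches this value
     */
    static Mode decide(Optional<Boolean> override, int workspaceCount, int layerCount, int threshold) {
        // An explicit flag always wins over the automatic behaviour.
        if (override.isPresent()) {
            return override.get() ? Mode.DROPDOWN : Mode.DISABLED;
        }
        boolean small = workspaceCount < threshold && layerCount < threshold;
        return small ? Mode.DROPDOWN : Mode.DISABLED;
    }

    public static void main(String[] args) {
        System.out.println(decide(Optional.empty(), 12, 350, DEFAULT_THRESHOLD));      // DROPDOWN
        System.out.println(decide(Optional.empty(), 40, 20_000, DEFAULT_THRESHOLD));   // DISABLED
        System.out.println(decide(Optional.of(true), 40, 20_000, DEFAULT_THRESHOLD));  // DROPDOWN
    }
}
```

Letting the flag override the threshold would also cover the "cannot reach the home page to change the UI setting" case, provided the flag can be set outside the UI.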

Cheers
Andrea

···

GeoServer Professional Services from the experts!

Visit http://bit.ly/gs-services-us for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions Group
phone: +39 0584 962313

fax: +39 0584 1660272

mob: +39 339 8844549

https://www.geosolutionsgroup.com/

http://twitter.com/geosolutions_it


With reference to the legislation on the processing of personal data (EU Reg. 2016/679 - General Data Protection Regulation “GDPR”), please note that every circumstance relating to this email (its content, any attachments, etc.) is data whose knowledge is reserved to the recipient(s) indicated by the sender. If this message has reached you by mistake, you are required to delete it; any other operation is unlawful. I would nonetheless be grateful if you would notify me.

This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail

Andrea;

I sure wish this was tested with a large number of layers during the RC; but good we are getting the feedback now.

Ideas:

  1. Over 1k layers? Switch to a mode where the workspace is selected first; then enable the layer selector with a smaller list of layers?
  2. Over 1k layers? Make the controls into simple text fields; no look-ahead (guess this is similar to your suggestion)
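
As an illustration of idea 1, here is a rough sketch of a workspace-first selection. It is not the actual home page code: `WorkspaceFirstSelector` is a hypothetical helper that takes prefixed layer names in the usual `workspace:layer` form and groups them once, so that the layer selector only ever receives the layers of the chosen workspace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper grouping prefixed layer names ("workspace:layer") by workspace. */
final class WorkspaceFirstSelector {

    // TreeMap keeps the workspace names sorted for display.
    private final Map<String, List<String>> layersByWorkspace = new TreeMap<>();

    WorkspaceFirstSelector(List<String> prefixedNames) {
        for (String name : prefixedNames) {
            int colon = name.indexOf(':');
            if (colon < 0) {
                continue; // no workspace prefix: ignored in this sketch
            }
            String workspace = name.substring(0, colon);
            String layer = name.substring(colon + 1);
            layersByWorkspace.computeIfAbsent(workspace, k -> new ArrayList<>()).add(layer);
        }
        for (List<String> layers : layersByWorkspace.values()) {
            Collections.sort(layers);
        }
    }

    /** Step 1: the (comparatively short) list of workspaces. */
    List<String> workspaces() {
        return new ArrayList<>(layersByWorkspace.keySet());
    }

    /** Step 2: only the layers of the selected workspace. */
    List<String> layersOf(String workspace) {
        List<String> layers = layersByWorkspace.getOrDefault(workspace, Collections.emptyList());
        return Collections.unmodifiableList(layers);
    }
}
```

With a few thousand workspaces the first list may still be large, so this would probably need to be combined with one of the other ideas for the workspace selector itself.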

But yeah I was just hitting the catalog api; could we check what data type verification is being hit? Most likely an enabled / available check?


Jody Garnett

On Thu, Nov 24, 2022 at 2:18 AM Andrea Aime <andrea.aime@…6887…> wrote:


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Andrea:

One more idea - does your customer have global services on or off when working with such a large catalogue?

I could see changing the workflow to select workspace → select layers when global services are off.

···


Jody Garnett

Hi Andrea,

for the record, here’s the branch with the catalog loader optimization [1]. I need to add some docs before
proposing it as a community module, but it’s working ok in a vanilla geoserver deployment with ~80k layers, ~4k workspaces,
and wms/wfs/wcs/wmts services configured for each workspace individually, which surprisingly was a big perf offender.

That said, it won’t help with the home page combos at all.

My proposal would be to use progressive loading instead of preemptive loading of all workspaces/layers. The downside is you need to
know at least a couple of letters of what you’re looking for, but IMO it’s a good compromise. On the catalog side, it’d only perform
well if there’s an actual full-text-search engine backing the search. Back in the day I had a prototype of an in-memory Lucene index
serving the UI’s full-text searches that worked like a charm, but IIRC it lacked a good way to update the index whenever
something changed. That could be something to do some research on.

[1] https://github.com/groldan/geoserver/tree/catalog/perf/data_directory_loader
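
To illustrate the progressive loading idea, here is a minimal sketch of a prefix lookup that only answers once a couple of letters have been typed and caps the number of results. It is a deliberately simpler stand-in for the Lucene prototype mentioned above, not a reconstruction of it: it uses a sorted in-memory map, matches prefixes only (no full-text search), and does not address index updates. `PrefixNameIndex`, `suggest`, `minChars`, and `limit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory index answering "names starting with ..." queries. */
final class PrefixNameIndex {

    // Lower-cased name -> original name; sorted so that all names sharing a prefix are contiguous.
    // Names differing only by case overwrite each other in this sketch.
    private final TreeMap<String, String> byLowerName = new TreeMap<>();

    PrefixNameIndex(Collection<String> names) {
        for (String name : names) {
            byLowerName.put(name.toLowerCase(Locale.ROOT), name);
        }
    }

    /**
     * @param typed    what the user has typed so far
     * @param minChars minimum number of characters before any lookup is performed
     * @param limit    maximum number of suggestions returned
     */
    List<String> suggest(String typed, int minChars, int limit) {
        List<String> hits = new ArrayList<>();
        String prefix = typed.toLowerCase(Locale.ROOT);
        if (prefix.length() < minChars) {
            return hits; // too short: nothing is loaded
        }
        // Start at the first key >= prefix and stop at the first key that no longer matches.
        for (Map.Entry<String, String> entry : byLowerName.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix) || hits.size() >= limit) {
                break;
            }
            hits.add(entry.getValue());
        }
        return hits;
    }
}
```

Each lookup costs a tree descent plus at most `limit` entries, regardless of how many names are indexed, which is the property that matters for the home page; matching in the middle of a name or in titles would need a real search engine as described above.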

···

Gabriel Roldán

> Hi Andrea,
>
> for the record, here’s the branch with the catalog loader optimization [1]. I need to add some docs before
> proposing it as a community module, but it’s working ok in a vanilla geoserver deployment with ~80k layers, ~4k workspaces,
> and wms/wfs/wcs/wmts services configured for each workspace individually, which surprisingly was a big perf offender.

Nice

> That said, it won’t help with the home page combos at all.
>
> My proposal would be to use progressive loading instead of preemptive loading of all workspaces/layers. The downside is you need to
> know at least a couple of letters about what you’re looking for, but IMO it’s a good compromise.

Seems like a match with Jody’s experiment with autocomplete text areas. Wondering if it could be again some graceful degradation,
with stricter limits… dunno use dropdowns below 100 items, and autocomplete text area above.
The reason would be to avoid the “blank input anxiety” at least for the small cases. Just thinking out loud, mind!

> Catalog-side wise, it’d only perform
> well if there’s an actual full-text-search engine backing the search. Back in the day I had a prototype for an in-memory lucene index
> running the searches for the UI’s full text searches that worked like a charm, but IIRC missed a good update of the index whenever
> something changes. That could be something to do some research on.

Indeed

Cheers
Andrea
