[Geoserver-devel] Some follow up to GSIP-155 with a larger data dir

Hi,
after GSIP-155 landed I’ve also merged https://osgeo-org.atlassian.net/browse/GEOS-7954
which avoids the datastore connection checks when capabilities generation is set up to
skip misconfigured layers.

Then I worked on a few small tweaks here and there, updated the catalog bulk load
tool to optionally perform deep copies of workspaces, and created a new configuration
that has a lot of stores and layers of most kinds.

In particular, I took the release data dir, merged everything in it into a single workspace,
and then cloned that workspace 1000 times.
This resulted in a data directory with:

  • 1001 workspaces
  • 11000 stores, a mix of shapefiles, postgis, directories of shapefiles, single tiff, arcgrid, mosaics
  • 42000 layers and 42000 associated tile layers

A few numbers on this beast with the default catalog facade:

  • Cold startup time: I’ve seen values between 230 and 290 seconds (not sure why the variability is so high)
  • Hot startup time: 59 seconds
  • Load testing a states layer stored in postgis (same as the GSIP-155 benchmark): 35ms, that is, 1ms more than the GSIP-155 test with 10k layers

I’ve also loaded it into JDBCConfig (the conversion took 37 minutes); here are the numbers for it:

  • Cold startup time: 290 seconds
  • Hot startup time: 120 seconds
  • Load testing a states layer stored in postgis (same as the GSIP-155 benchmark): 117ms (same as the timings in GSIP-155)

The JDBCConfig startup time is almost fully spent querying PostgreSQL like crazy for the internal layers associated with GWC tile layers, and finding
which layers should not be cached in the in-memory GWC cache (this last one makes it do a full scan of all layers).

This marks the end of these investigations/optimizations for the time being: I see no more obvious ways to make the default catalog facade faster, besides
trying to parallelize loading the catalog itself (which is hard enough that I won’t try to do it in the short term).

Cheers
Andrea

PS: someone told me offline that this work is killing JDBCConfig… I don’t quite agree.
It’s just showing that JDBCConfig is doing too much work to translate layer names into internal ids, and that some caches from name to id are needed to
make it competitive when serving OGC requests (along with some clustering support to drop the name → id cache
when names change).
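
To make the idea concrete, here is a minimal sketch of such a name → id cache with invalidation on rename. It is not JDBCConfig code: the class name `NameToIdCache`, the `on_rename` hook and the `layer-1` id are all made up for illustration, and a plain dict stands in for the database lookup.

```python
from typing import Callable, Dict, Optional


class NameToIdCache:
    """Hypothetical cache from layer name to internal id (not JDBCConfig code)."""

    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        self._lookup = lookup          # slow path, e.g. a database query
        self._ids: Dict[str, str] = {}
        self.misses = 0                # counts how often the slow path ran

    def get_id(self, name: str) -> Optional[str]:
        cached = self._ids.get(name)
        if cached is not None:
            return cached
        self.misses += 1
        found = self._lookup(name)
        if found is not None:          # unknown names are not cached
            self._ids[name] = found
        return found

    def on_rename(self, old_name: str) -> None:
        """Drop the stale entry; in a cluster every node would need this event."""
        self._ids.pop(old_name, None)


# A dict stands in for the database table mapping names to ids.
table = {"topp:states": "layer-1"}
cache = NameToIdCache(table.get)

first = cache.get_id("topp:states")
second = cache.get_id("topp:states")   # served from memory, no second lookup

# Simulate a rename in the backing store, then invalidate the old name.
table["topp:states_renamed"] = table.pop("topp:states")
cache.on_rename("topp:states")
after_rename = cache.get_id("topp:states")
renamed = cache.get_id("topp:states_renamed")
```

The hard part in practice is the clustering bit: the `on_rename` call has to reach every node, otherwise a node keeps resolving the old name to an id that no longer matches.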

JDBCConfig likely remains quite a bit faster in terms of startup time for configurations that have few cached layers,
and if the GWC configuration could also be moved inside the database then it would likely start up in 20-30 seconds
no matter how many layers are in the catalog, while the startup time of the default catalog facade grows linearly with the number of
layers in the catalog.
All it needs is extra effort to improve it.
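
A rough back-of-the-envelope reading of that comparison, using the 59 second hot startup measured above. It assumes the default facade startup is purely proportional to the layer count, with no fixed overhead, which is certainly a simplification, and it takes the 20-30 second figure for what it is, a guess rather than a measurement:

```python
HOT_STARTUP_S = 59.0   # default catalog facade, hot startup, measured above
LAYERS = 42_000        # layers in the test data directory

# Assumed cost per layer if startup scaled from zero (no fixed overhead).
per_layer_ms = HOT_STARTUP_S / LAYERS * 1000


def break_even_layers(constant_startup_s: float) -> float:
    """Layer count at which a constant startup time matches the linear one."""
    return constant_startup_s / (HOT_STARTUP_S / LAYERS)


low = break_even_layers(20.0)
high = break_even_layers(30.0)

print(f"{per_layer_ms:.2f} ms per layer")
print(f"break-even between {low:.0f} and {high:.0f} layers")
```

Under those assumptions the default facade costs about 1.4 ms per layer, and a constant 20-30 second startup would only win somewhere above roughly 14,000 to 21,000 layers.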

==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313

fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


On Sat, Feb 11, 2017 at 3:37 PM, Andrea Aime <andrea.aime@anonymised.com>
wrote:

A few numbers on this beast with the default catalog facade:

   - Cold startup time: I've seen values between 230 seconds and 290
   seconds (not sure why such a high variability)
   - Hot startup time: 59 seconds

Aaand down to 45 seconds using this pull request:

https://github.com/geoserver/geoserver/pull/2115

Can anyone review? :)

Cheers
Andrea
