[Geoserver-devel] GWC geometryless check on startup resulting in unbearably long startup times

Hi,
I’ve checked a few complaints on the user list claiming the startup time has
become hours long with tens of thousands of layers… At first I thought I could
not reproduce it, because I saw the catalog load 10k layers in about 30 seconds,
but I later realized the startup sequence was not finished and GWC was the one killing it.

The trace one sees while that happens is the following:

"main" #1 prio=5 os_prio=0 tid=0x00007fc304018000 nid=0x6f68 runnable [0x00007fc30d70d000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
    at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:70)
    at org.postgresql.core.PGStream.receiveChar(PGStream.java:283)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1919)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:291)
    - locked <0x00000006c7e38fa8> (a org.postgresql.core.v3.QueryExecutorImpl)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:432)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:358)
    at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:305)
    at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:291)
    at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:269)
    at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:236)
    at org.postgresql.jdbc.PgDatabaseMetaData.getPrimaryKeys(PgDatabaseMetaData.java:2381)
    at org.apache.commons.dbcp.DelegatingDatabaseMetaData.getPrimaryKeys(DelegatingDatabaseMetaData.java:456)
    at org.geotools.jdbc.HeuristicPrimaryKeyFinder.getPrimaryKey(HeuristicPrimaryKeyFinder.java:49)
    at org.geotools.jdbc.CompositePrimaryKeyFinder.getPrimaryKey(CompositePrimaryKeyFinder.java:52)
    at org.geotools.jdbc.JDBCDataStore.getPrimaryKey(JDBCDataStore.java:1095)
    - locked <0x00000006c66432b8> (a org.geotools.jdbc.JDBCDataStore)
    at org.geotools.jdbc.JDBCFeatureSource.<init>(JDBCFeatureSource.java:95)
    at org.geotools.jdbc.JDBCDataStore.createFeatureSource(JDBCDataStore.java:941)
    at org.geotools.data.store.ContentDataStore.getFeatureSource(ContentDataStore.java:395)
    at org.geotools.data.store.ContentDataStore.getFeatureSource(ContentDataStore.java:360)
    at org.geotools.data.store.ContentDataStore.getSchema(ContentDataStore.java:344)
    at org.geotools.data.store.ContentDataStore.getSchema(ContentDataStore.java:712)
    at org.geotools.data.store.ContentDataStore.getSchema(ContentDataStore.java:103)
    at org.geoserver.catalog.ResourcePool.getCacheableFeatureType(ResourcePool.java:955)
    - locked <0x00000006c68de048> (a org.geoserver.catalog.ResourcePool$FeatureTypeCache)
    at org.geoserver.catalog.ResourcePool.tryGetFeatureType(ResourcePool.java:936)
    at org.geoserver.catalog.ResourcePool.getFeatureType(ResourcePool.java:922)
    at org.geoserver.catalog.ResourcePool.getFeatureType(ResourcePool.java:917)
    at org.geoserver.catalog.impl.FeatureTypeInfoImpl.getFeatureType(FeatureTypeInfoImpl.java:120)
    at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.geoserver.catalog.impl.ModificationProxy.invoke(ModificationProxy.java:147)
    at com.sun.proxy.$Proxy10.getFeatureType(Unknown Source)
    at org.geoserver.gwc.layer.CatalogConfiguration.isLayerExposable(CatalogConfiguration.java:481)
    at org.geoserver.gwc.config.GWCInitializer.addLayersToNotCache(GWCInitializer.java:295)
    at org.geoserver.gwc.config.GWCInitializer.initialize(GWCInitializer.java:165)

Basically GWC is checking that each and every cached layer is not referring to a geometryless layer.
That in turn triggers the computation of the table structure for each layer, which of course takes a loooong time
(hours, in the case of the users reporting it).
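
To make the cost pattern concrete, here is a minimal sketch in plain Java. It is not GeoServer code: `SchemaSource`, `Layer` and `EagerStartupCheck` are hypothetical stand-ins, and each call to `loadSchemaHasGeometry` only represents the kind of database metadata round trip that the trace shows under `getPrimaryKeys`. The point is simply that an eager pass at startup pays one such round trip per layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a store that must query the database to describe a table. */
class SchemaSource {
    final AtomicInteger roundTrips = new AtomicInteger();

    /** Simulates one metadata query; returns true when the table has a geometry column. */
    boolean loadSchemaHasGeometry(String table) {
        roundTrips.incrementAndGet();
        return !table.endsWith("_nogeom");
    }
}

/** Hypothetical stand-in for a catalog layer backed by a database table. */
class Layer {
    final String table;

    Layer(String table) {
        this.table = table;
    }
}

class EagerStartupCheck {
    /** Mimics a startup pass that inspects every layer's schema up front. */
    static List<Layer> findGeometryless(List<Layer> layers, SchemaSource source) {
        List<Layer> result = new ArrayList<>();
        for (Layer layer : layers) {
            // one simulated database round trip per layer, no matter what
            if (!source.loadSchemaHasGeometry(layer.table)) {
                result.add(layer);
            }
        }
        return result;
    }

    /** Builds a made-up catalog in which every 1000th table has no geometry. */
    static List<Layer> sampleLayers(int count) {
        List<Layer> layers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            layers.add(new Layer(i % 1000 == 0 ? "t" + i + "_nogeom" : "t" + i));
        }
        return layers;
    }
}
```

With 10,000 sample layers the loop issues 10,000 simulated queries; whatever a real round trip costs on a given setup, startup pays it that many times.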

The method causing the problem was introduced in this pull request:

https://github.com/geoserver/geoserver/pull/837

I had no involvement in the development or review of it, but looking at it now, I’m wondering why there
is even a need for it.
First, the code seems to be referring specifically to the GWC in-memory caching, which might or might not be enabled,
and second, a geometryless layer is not cacheable no matter what. Indeed I’ve tried to configure one:

Inline image 1

and going to the tile caching tab I get:

Inline image 2

So the check for whether a layer is geometryless looks completely superfluous to me.
I would therefore suggest removing this check:

https://github.com/geoserver/geoserver/blob/master/src/gwc/src/main/java/org/geoserver/gwc/config/GWCInitializer.java#L295

I cannot be sure about the other checks (like, I’m not sure why the code would even have to collect on startup
the layers that do not have caching enabled, seems like a check that should be done on demand), but removing this
one at least keeps the startup time under control (10k layers, from startup to a working admin UI in less than a minute)
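
As an illustration of what "on demand" could look like, here is a minimal sketch, again in plain Java and not taken from GeoServer or GWC. `OnDemandGeometryCheck` and its `expensiveProbe` predicate are hypothetical names: the predicate stands in for whatever costly schema lookup is needed, and the answer is remembered per layer so the cost is paid only for layers that are actually asked about.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical on-demand check: a layer is inspected only when first asked about. */
class OnDemandGeometryCheck {
    private final Predicate<String> expensiveProbe;
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
    final AtomicInteger probes = new AtomicInteger();

    OnDemandGeometryCheck(Predicate<String> expensiveProbe) {
        this.expensiveProbe = expensiveProbe;
    }

    /** Returns true when the layer has a geometry; probes once per layer until invalidated. */
    boolean hasGeometry(String layerName) {
        return cache.computeIfAbsent(layerName, name -> {
            probes.incrementAndGet();
            return expensiveProbe.test(name);
        });
    }

    /** Drops the remembered answer, for example after the underlying table was altered. */
    void invalidate(String layerName) {
        cache.remove(layerName);
    }
}
```

Under this design startup does no probing at all; the first request touching a layer pays for that layer only, and later requests reuse the cached answer.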

Cheers
Andrea

==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313

fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003

Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate nel messaggio stesso. Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e-mail e di procedere alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.


Good work on tracking this down Andrea,

The only thing I can think of is that it may be possible to create a geometryless cached layer either through the REST API or by modifying the table after creation.

Ian

On 4 December 2016 at 10:13, Andrea Aime <andrea.aime@anonymised.com> wrote:






Geoserver-devel mailing list
Geoserver-devel@anonymised.com.366…sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Ian Turton

On Sun, Dec 4, 2016 at 12:08 PM, Ian Turton <ijturton@anonymised.com> wrote:

Good work on tracking this down Andrea,

The only thing I can think of is that it may be possible to create a
geometryless cached layer either through the REST API or by modifying the
table after creation.

Good point... the same goes for normal tile caching, however: no
geometry, nothing to paint, nothing to cache (besides the error you'll
get).
I'll need to understand how the in-memory caching differs, if at all,
from a normal on-disk cache, to the point of needing an
extra level of protection.

Cheers
Andrea


-------------------------------------------------------

Hi,
so I’ve made a few checks and the init-time verification does indeed seem to be useless.

I’ve prepared a pull request to speed up loading a catalog with many layers (but only one store):
https://github.com/geoserver/geoserver/pull/2008

It has two commits: the first one removes the GWC init checks, taking the startup time down from “too long for my patience, > 30m anyways” to four and
a half minutes; the second one adds extra optimizations, taking the startup time down to around one minute.

All timings were measured from a clear file system cache (“echo 3 > /proc/sys/vm/drop_caches” under Linux).

Surprisingly, the residual startup time is due not so much to IO itself as to the creation and usage of millions of
ModificationProxy instances during startup… I did not have time to investigate further and decided to take what I have now as a win ;-)
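
The trace at the top of the thread shows `ModificationProxy.invoke` being reached through a JDK dynamic proxy (`com.sun.proxy.$Proxy10`) and `Method.invoke`. As a rough illustration of why such wrappers carry a per-object and per-call cost, here is a minimal change-tracking proxy in plain Java. It is not GeoServer's `ModificationProxy`; `LayerBean`, `SimpleLayerBean` and `ChangeTrackingHandler` are hypothetical names invented for this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical catalog-like interface used only for this sketch. */
interface LayerBean {
    String getName();

    void setName(String name);
}

class SimpleLayerBean implements LayerBean {
    private String name;

    SimpleLayerBean(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

/** Buffers setter calls instead of applying them, and counts every proxied call. */
class ChangeTrackingHandler implements InvocationHandler {
    static long invocations = 0; // not thread-safe, good enough for a sketch
    private final Object target;
    private final Map<String, Object> pending = new HashMap<>();

    ChangeTrackingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        invocations++;
        String name = method.getName();
        if (name.startsWith("set") && args != null && args.length == 1) {
            pending.put(name.substring(3), args[0]); // remember, do not touch the target
            return null;
        }
        if (name.startsWith("get") && pending.containsKey(name.substring(3))) {
            return pending.get(name.substring(3)); // serve the buffered value
        }
        return method.invoke(target, args); // everything else goes through reflection
    }

    /** Allocates one handler and one proxy instance per wrapped object. */
    static LayerBean wrap(LayerBean bean) {
        return (LayerBean) Proxy.newProxyInstance(
                LayerBean.class.getClassLoader(),
                new Class<?>[] {LayerBean.class},
                new ChangeTrackingHandler(bean));
    }
}
```

Every wrapped object costs two extra allocations and every getter goes through `invoke`, which is cheap once but can add up when repeated millions of times during a catalog load.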

Cheers
Andrea

On Sun, Dec 4, 2016 at 2:50 PM, Andrea Aime <andrea.aime@anonymised.com> wrote:
