[Geoserver-devel] Slow WFS transactions when GeoServer 2.1 catalog has thousands of layers

Hello,

I’ve noticed that WFS transactions (insert, update) have become slower as the number of layers in my GeoServer 2.1 instance has increased (it’s currently at a little over 4000 layers, most of them sharing one PostGIS store; the rest are GeoTIFFs). On every WFS transaction, GeoServer seems to iterate over every layer in the catalog, I guess in order to generate XSD schema objects? Do you think it might be possible to avoid this somehow and get the information it needs to complete the transaction only from the one layer being edited?

-Matt

On 05/09/2012 11:40 AM, Bertrand, Matthew wrote:

Hello,

I've noticed that WFS transactions (insert, update) have become slower as the
number of layers in my GeoServer 2.1 instance has increased

I suppose it is the https://jira.codehaus.org/browse/GEOS-3907 issue:
setting the layer cache to a higher value alleviated it.

Regards,

Luca Morandini
Data Architect - AURIN project
Department of Computing and Information Systems
University of Melbourne

Thanks Luca, I gave that a try, but in the case where many layers share one PostGIS store, setting the feature type cache to a high value seemed to make the transactions far slower. In one of my test data directories with about 1000 layers, all on the same PostGIS store, the time needed for each WFS transaction increased from seconds to minutes after I increased the feature type cache size to 4000. If I disabled the PostGIS store and only had shapefile layers active (again, about 1000), then the WFS transactions were fast. Should I be using JNDI stores instead for that many PostGIS layers?

-Matt



Live Security Virtual Conference
Exclusive live event will cover all the ways today’s security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/


Geoserver-devel mailing list
Geoserver-devel@anonymised.comourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

On 05/09/2012 07:50 PM, Bertrand, Matthew wrote:

Thanks Luca, I gave that a try but in the case where many layers share one PostGIS
store, setting the feature type cache to a high value seemed to make the
transactions far slower. In one of my test data directories with about 1000 layers
all on the same PostGIS store, the time needed for each WFS transaction increased
from seconds to minutes after I increased the feature type cache size to 4000. If
I disabled the PostGIS store and only had shapefile layers active (again about
1000), then the WFS transactions were fast.

How curious. Anyway, that’s consistent with my observation, since I tested the workaround with shapefiles only.

Should I be using JNDI stores instead
for that many PostGIS layers?

I don't see how this could make a difference.

Regards,

Luca Morandini
Data Architect - AURIN project
Department of Computing and Information Systems
University of Melbourne

Unfortunately this is a known issue and, even more unfortunately, probably not one that will be solved soon. The way we manage schema objects for the purposes of parsing/encoding GML is a rather sad story. There are two main issues.

  1. No caching occurs, which is the main problem here and is especially painful for large catalogs.

  2. We scan the entire catalog when we really don’t have to. The only case where it’s necessary is app-schema, where the full scan occurs because you don’t know ahead of time, for any one feature type, which application schema namespaces it may reference. So a full scan is done to include all of them.
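The missing cache in issue 1 amounts to memoizing schema objects per feature type so they are built at most once. A minimal sketch of the idea; the `SchemaCache` class and its builder callback are illustrative, not actual GeoServer classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-feature-type schema cache: schema objects are keyed by
// qualified type name and built lazily on first use, instead of being
// rebuilt on every WFS request.
public class SchemaCache<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> builder;

    public SchemaCache(Function<String, S> builder) {
        this.builder = builder;
    }

    // Build the schema for one type on first request, then reuse it.
    public S get(String qualifiedTypeName) {
        return cache.computeIfAbsent(qualifiedTypeName, builder);
    }

    public int size() {
        return cache.size();
    }
}
```

With something like this in place, repeated transactions against the same feature type would pay the schema-building cost only once.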

A while back I experimented on a branch to fix these issues and had some success, but unfortunately it was pretty exploratory code and nothing suitable for commit. And I never really came up with a good solution for the app-schema issue.

Long story short: without funding or some other mandate, this one probably won’t be fixed in the short term.

-Justin




Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

Thanks Justin!

It sounds like there’s a pretty steep learning curve to get up to speed on this part of the code, but I’d be willing to take a shot at figuring out a fix for this if you think it’s potentially doable given enough time/effort.

One thing I should correct about my previous email response – it seems the excessive time it took to complete WFS transactions with PostGIS layers after changing the feature cache size was actually triggered by reloading the config & catalog from the server status page afterward. If I skip that particular step and just set the feature cache to a high value on the global settings page, then the response time is much better. I probably shouldn’t try debugging & emailing about it at 5 in the morning!

One thing I’ve noticed is that the first WFS response after GeoServer is started (or after a long enough period of inactivity) always takes a while to complete, but subsequent transactions are much faster. However, those subsequent transactions have also slowed a bit as the number of layers has grown on our production server – from about 1-2 seconds (with ~2000 layers) to about 12-15 seconds now (with ~4200). The feature type cache size had already been set to 5000 a few months back.

-Matt


On Wed, May 9, 2012 at 3:40 PM, Justin Deoliveira <jdeolive@anonymised.com> wrote:


Wondering… if all the extra work has to be done just for complex features, could we take the shortcut when
no complex features are available on the server? Have a flag, with some catalog listener updating it as
new layers are configured, that keeps the information about the presence of complex features up to date.
Just thinking out loud here.
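The flag could be as small as a counter maintained from catalog add/remove notifications. A hypothetical sketch; the tracker and its callbacks are illustrative, not the actual GeoServer catalog listener interface:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker kept up to date by catalog events: it counts
// configured complex-feature layers so the common simple-feature case can
// skip the full catalog scan entirely.
public class ComplexFeatureTracker {
    private final AtomicInteger complexLayers = new AtomicInteger();

    // Called when a layer is configured; isComplex would be derived from
    // the layer's feature type (e.g. app-schema mappings present).
    public void layerAdded(boolean isComplex) {
        if (isComplex) complexLayers.incrementAndGet();
    }

    public void layerRemoved(boolean isComplex) {
        if (isComplex) complexLayers.decrementAndGet();
    }

    // WFS consults this before deciding whether a full scan is needed.
    public boolean complexFeaturesPresent() {
        return complexLayers.get() > 0;
    }
}
```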

Cheers
Andrea


Ing. Andrea Aime
GeoSolutions S.A.S.
Tech lead

Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy

phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.youtube.com/user/GeoSolutionsIT
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

On Wed, May 9, 2012 at 8:09 AM, Bertrand, Matthew <mbertrand@anonymised.com> wrote:

Thanks Justin!

It sounds like there’s a pretty steep learning curve to get up to speed on this part of the code, but I’d be willing to take a shot at figuring out a fix for this if you think it’s potentially doable given enough time/effort.

Help would be great. I would suggest starting by familiarizing yourself with the DescribeFeatureType code and the underlying FeatureTypeSchemaBuilder; that is where the schema objects used by WFS are built.

One thing I should correct about my previous email response – it seems the excessive time it took to complete WFS transactions with PostGIS layers after changing the feature cache size was actually triggered by reloading the config & catalog from the server status page afterward. If I skip that particular step and just set the feature cache to a high value on the global settings page, then the response time is much better. I probably shouldn’t try debugging & emailing about it at 5 in the morning!

Haha no worries.

One thing I’ve noticed is that the first WFS response after geoserver is started (or after a long enough period of inactivity) always takes awhile to complete, but subsequent transactions are much faster. However, those subsequent transactions have also slowed a bit as the number of layers has grown on our production server – from an initial speed of about 1-2 seconds (with ~2000 layers) to about 12-15 seconds now (with ~4200). The feature type cache size had already been set to 5000 a few months back.

Right, this startup cost is typically the building of the internal GML schemas, which are huge and costly to load. But once loaded they are cached so you don’t pay the price for future accesses.



Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.

On Wed, May 9, 2012 at 8:19 AM, Andrea Aime <andrea.aime@anonymised.com> wrote:


Wondering… if all the extra work has to be done just for complex features, could we take the shortcut when
no complex features are available on the server? Have a flag, with some catalog listener updating it as
new layers are configured, that keeps the information about the presence of complex features up to date.
Just thinking out loud here.

Yeah, as it stands now I think a separate path/behaviour for complex features will be necessary, especially when it comes to building up the schema objects for GML encoding and parsing. Parsing (which is done for an insert transaction) is a bit tricky because we don’t know what type we are processing until we parse, although we could do a quick initial scan to find out.

Once we know the types, we can determine if any are complex and, if not, optimize when building out the schema objects for them.
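The quick initial scan could be a single streaming pass over the transaction payload that collects the element names directly under wfs:Insert, i.e. the feature type names, before full parsing begins. A hedged sketch using plain StAX; element and namespace handling is simplified compared to what GeoServer would actually need:

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Sketch of a pre-parse scan: stream over a WFS Transaction once and
// collect the local names of elements appearing directly under wfs:Insert,
// which are the feature type names being inserted.
public class InsertTypeScanner {

    public static Set<String> scan(String xml) {
        Set<String> types = new LinkedHashSet<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0;
            int insertDepth = -1; // depth of the enclosing Insert, or -1
            while (r.hasNext()) {
                int ev = r.next();
                if (ev == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if ("Insert".equals(r.getLocalName())) {
                        insertDepth = depth;
                    } else if (insertDepth > 0 && depth == insertDepth + 1) {
                        types.add(r.getLocalName()); // direct child = type name
                    }
                } else if (ev == XMLStreamConstants.END_ELEMENT) {
                    if (depth == insertDepth) {
                        insertDepth = -1; // left the Insert element
                    }
                    depth--;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return types;
    }
}
```

Once the type names are known, the complex-feature flag above could decide whether the full schema machinery is needed before committing to the expensive path.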



Justin Deoliveira
OpenGeo - http://opengeo.org
Enterprise support for open source geospatial.