[Geoserver-devel] Modifications for faster WFS transactions w/large data catalogs

Hi,

I ran into an issue with editing features via WFS in GeoServer with a catalog of thousands of layers. It seems that the schema for every layer in the workspace has to be built and loaded into memory first, which slows things down significantly (at least for the initial edit) when there are many layers. So I made some modifications to the code that build only the schema of the layer being edited, provided a new boolean global setting, "dynamicFeatureTypeSchema", is set to true.
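
A minimal sketch of the idea described above, not the code in the linked branch: when the flag is off, every layer's schema is built before an edit; when it is on, only the target layer's schema is built. The class names `LayerSchema`, `SchemaBuilder` and `SchemaBuilderDemo` and the method `schemasForEdit` are hypothetical and are not GeoServer APIs; only the setting name "dynamicFeatureTypeSchema" comes from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a layer's schema; not a GeoServer class. */
final class LayerSchema {
    final String layerName;

    LayerSchema(String layerName) {
        this.layerName = layerName;
    }
}

/** Hypothetical builder that decides how many schemas an edit has to build. */
final class SchemaBuilder {
    private final List<String> workspaceLayers;
    private final boolean dynamicFeatureTypeSchema; // setting name taken from the proposal
    int buildCount = 0; // number of schema builds so far, to make the cost visible

    SchemaBuilder(List<String> workspaceLayers, boolean dynamicFeatureTypeSchema) {
        this.workspaceLayers = new ArrayList<>(workspaceLayers);
        this.dynamicFeatureTypeSchema = dynamicFeatureTypeSchema;
    }

    /** Returns the schemas that were built in order to edit the given layer. */
    Map<String, LayerSchema> schemasForEdit(String targetLayer) {
        if (!workspaceLayers.contains(targetLayer)) {
            throw new IllegalArgumentException("Unknown layer: " + targetLayer);
        }
        Map<String, LayerSchema> built = new LinkedHashMap<>();
        if (dynamicFeatureTypeSchema) {
            // Dynamic mode: build only the schema of the layer being edited.
            built.put(targetLayer, build(targetLayer));
        } else {
            // Default mode: build the schema of every layer in the workspace.
            for (String name : workspaceLayers) {
                built.put(name, build(name));
            }
        }
        return built;
    }

    private LayerSchema build(String name) {
        buildCount++;
        return new LayerSchema(name);
    }
}

class SchemaBuilderDemo {
    public static void main(String[] args) {
        List<String> layers = Arrays.asList("roads", "rivers", "parcels");
        SchemaBuilder eager = new SchemaBuilder(layers, false);
        SchemaBuilder dynamic = new SchemaBuilder(layers, true);
        eager.schemasForEdit("roads");
        dynamic.schemasForEdit("roads");
        System.out.println("eager builds: " + eager.buildCount);     // prints "eager builds: 3"
        System.out.println("dynamic builds: " + dynamic.buildCount); // prints "dynamic builds: 1"
    }
}
```

With thousands of layers, the difference between the two branches is the difference between thousands of schema builds and one for the first edit.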

The comparison of this code to the master branch is here: https://github.com/cga-harvard/geoserver/compare/gsmaster_fastwfs?expand=1

When "dynamicFeatureTypeSchema" is set to true, there is one unit test that fails, at this line: https://github.com/geoserver/geoserver/blob/master/src/wfs/src/test/java/org/geoserver/wfs/v1_1/GetFeatureTest.java#L467

I'd love to get some input on what you think of this approach and whether there might be a better solution.

-Matt

Hi Matt,
Justin will have to chime in to give you more information on this one, but the
way I understand things, we should be building the large schema once and
caching it, because building it request by request is expensive.

However, it seems that in your case the schema is not cached, but rebuilt
for every request, which is new to me.

Cheers
Andrea


On Mon, Jun 9, 2014 at 3:42 PM, Matt Bertrand <mbertrand@anonymised.com> wrote:

Hi,

I was running into an issue with editing features via WFS in GeoServer
with a catalog of thousands of layers. It seems as though the schema
for every layer in the workspace has to be built and loaded into memory
first, which slows things down significantly (at least for the initial
edit) if there are many layers. So I tried to make some modifications
to the code which would build only the schema of the layer to be edited,
assuming a new boolean global setting “dynamicFeatureTypeSchema” is set
to true.

The comparison of this code to the master branch is here:
https://github.com/cga-harvard/geoserver/compare/gsmaster_fastwfs?expand=1

When “dynamicFeatureTypeSchema” is set to true, there is one unit test
that fails, at this line:
https://github.com/geoserver/geoserver/blob/master/src/wfs/src/test/java/org/geoserver/wfs/v1_1/GetFeatureTest.java#L467

I’d love to get some input on what you think of this approach and
whether there might be a better solution.

-Matt


HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://www.hpccsystems.com


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

==

GeoServer Professional Services from the experts! Visit
http://goo.gl/NWWaa2 for more information.

==

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


The schema is being built once and then cached, at least for a while. But the cache doesn't seem to last very long, maybe because new layers are being added on a pretty regular basis? I'm using GeoServer in conjunction with GeoNode, so users are constantly uploading new data (there are over 12K layers now). When a user tries to edit or add features to a layer and the schema has to be rebuilt, the rebuild usually takes at least a few minutes, and the system appears broken because it doesn't respond promptly. So I wrote these modifications to address the problem by loading, on request, only the schema of the layer being edited.
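
A rough sketch of the behaviour suspected above, under the assumption that adding any layer drops the whole cached schema, so the next edit pays for rebuilding every layer. `CatalogSchemaCache`, its methods and the string stand-in for a schema are hypothetical and only illustrate the cost pattern; they are not GeoServer's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical whole-catalog schema cache; a String stands in for a built schema. */
final class CatalogSchemaCache {
    private final List<String> layers = new ArrayList<>();
    private Map<String, String> cached; // null means "dropped, rebuild on next use"
    int layerBuilds = 0; // total number of per-layer schema builds so far

    /** Registers a new layer. Assumption: this drops the whole cache. */
    void addLayer(String name) {
        layers.add(name);
        cached = null;
    }

    /** Returns one layer's schema, rebuilding all layers first if the cache was dropped. */
    String schemaFor(String name) {
        if (cached == null) {
            Map<String, String> rebuilt = new HashMap<>();
            for (String layer : layers) {
                rebuilt.put(layer, "schema:" + layer);
                layerBuilds++;
            }
            cached = rebuilt;
        }
        String schema = cached.get(name);
        if (schema == null) {
            throw new IllegalArgumentException("Unknown layer: " + name);
        }
        return schema;
    }
}

class CatalogSchemaCacheDemo {
    public static void main(String[] args) {
        CatalogSchemaCache cache = new CatalogSchemaCache();
        cache.addLayer("roads");
        cache.addLayer("rivers");
        cache.addLayer("parcels");

        cache.schemaFor("roads");              // first edit: builds all 3 layers
        cache.schemaFor("roads");              // cache hit: no further builds
        System.out.println(cache.layerBuilds); // prints 3

        cache.addLayer("lakes");               // an upload drops the cache
        cache.schemaFor("roads");              // next edit rebuilds all 4 layers
        System.out.println(cache.layerBuilds); // prints 7
    }
}
```

In this model the cost of the first edit after each upload grows with the total number of layers, which would match the multi-minute delays on a catalog of over 12K layers.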

-Matt

On 06/10/2014 04:37 AM, Andrea Aime wrote:

Hi Matt,
Justin will have to chime in to give you more information on this one, but the
way I understand things, we should be building the large schema once and
caching it, because building it request by request is expensive.

However, it seems that in your case the schema is not cached, but rebuilt
for every request, which is new to me.

Cheers
Andrea

-------------------------------------------------------

On Tue, Jun 10, 2014 at 1:21 PM, Matt Bertrand <mbertrand@anonymised.com>
wrote:

    The schema is being built once and then cached, at least for a while.
    But the cache doesn't seem to last very long, maybe because new layers
    are being added on a pretty regular basis?

Right, it's getting dropped every time we add a new vector layer.

    I'm using GeoServer in conjunction with GeoNode, so users are constantly
    uploading new data (there are over 12K layers now). When a user tries to
    edit/add features to a layer, if the schema needs to be rebuilt again, it
    usually takes at least a few minutes for that to happen, and the system
    appears broken because it doesn't respond promptly. So I wrote these
    modifications to address this problem by loading on request only the
    schema of the layer being edited.

Yes, understandable. I wonder how much it will affect the "normal"
GetFeature cycle, though.

Anyway, I'm waiting for Justin to chime in; he's the expert on this.

Cheers
Andrea

-------------------------------------------------------