[Geoserver-devel] interesting performance pattern between udig trunk and 1.6.4

I have recently been building up graphs from WFS data; and have been experiencing some interesting performance characteristics - and I am wondering if they are expected ...

In order to build up a graph ...
1. I request the features in a region using a normal WFS 1.0.0 GetFeature request.
2. For the first 30 seconds the server performance logs show that GeoServer is using a full core, and PostGIS is using a small percentage of another core. During this time no data is transmitted to the client.
3. At this point GeoServer starts sending data to the client; GeoServer continues to use a full core, and PostGIS a small percentage of another core.
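
The silent period in step 2 can be measured from the client side as time to first byte. A minimal sketch, assuming a GeoServer WFS endpoint, layer name and bounding box that are all placeholders (none of them come from this thread):

```python
import time
import urllib.request
from urllib.parse import urlencode


def build_getfeature_url(base_url, type_name, bbox):
    """Build a WFS 1.0.0 GetFeature KVP URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typeName": type_name,
        "bbox": ",".join(str(v) for v in bbox),
    }
    return base_url + "?" + urlencode(params)


def time_to_first_byte(url, opener=urllib.request.urlopen):
    """Return (seconds until the first body byte, total seconds, bytes read)."""
    start = time.monotonic()
    with opener(url) as response:
        first = response.read(1)              # blocks until the server sends data
        first_byte_at = time.monotonic() - start
        total = len(first)
        while True:
            chunk = response.read(65536)      # drain the rest of the response
            if not chunk:
                break
            total += len(chunk)
    return first_byte_at, time.monotonic() - start, total
```

Calling `time_to_first_byte(build_getfeature_url("http://localhost:8080/geoserver/wfs", "topp:roads", (0, 0, 1, 1)))` against a running server (hypothetical URL and layer) would separate the wait before streaming starts from the time spent transferring data.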

I went into the web.xml and confirmed that SPEED was set as the policy ... Is this a case of needing to issue the PostGIS request twice: once to calculate the bounds and again to grab and transmit the data? If so, can we avoid this step by assuming the bounds provided in the request "contain" the resulting data?
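
Checking which strategy web.xml selects can be scripted. A minimal sketch, assuming the setting is the `serviceStrategy` context parameter as in GeoServer's stock web.xml (verify the name against your own file; the sample XML below is illustrative, not copied from any deployment):

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_context_param(web_xml_text, name):
    """Return the value of the named <context-param>, or None if absent."""
    root = ET.fromstring(web_xml_text)
    for element in root.iter():
        if local(element.tag) != "context-param":
            continue
        # Collect the child elements (param-name, param-value) by local name.
        fields = {local(child.tag): (child.text or "").strip() for child in element}
        if fields.get("param-name") == name:
            return fields.get("param-value")
    return None


# Illustrative fragment only; a real web.xml contains much more.
SAMPLE = """<web-app>
  <context-param>
    <param-name>serviceStrategy</param-name>
    <param-value>SPEED</param-value>
  </context-param>
</web-app>"""

print(read_context_param(SAMPLE, "serviceStrategy"))  # prints SPEED
```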

Jody

Jody Garnett wrote:

I have recently been building up graphs from WFS data; and have been experiencing some interesting performance characteristics - and I am wondering if they are expected ...

In order to build up a graph ...
1. I request the features in a region using a normal WFS 1.0.0 GetFeature request.
2. For the first 30 seconds the server performance logs show that GeoServer is using a full core, and PostGIS is using a small percentage of another core. During this time no data is transmitted to the client.
3. At this point GeoServer starts sending data to the client; GeoServer continues to use a full core, and PostGIS a small percentage of another core.

I went into the web.xml and confirmed that SPEED was set as the policy ... Is this a case of needing to issue the PostGIS request twice: once to calculate the bounds and again to grab and transmit the data? If so, can we avoid this step by assuming the bounds provided in the request "contain" the resulting data?

Uh, this is complex.
First off, in 1.6.x we actually do 3 queries. The first one is a count,
to make sure we don't return too many features. It's only needed
in case you have multiple Query elements (to respect the global max
features request) and I've optimized it out in 1.7.0-RC3. No luck
for 1.6.x; that series is dead.
The second one is about the bounds, and no, we cannot return the
request bounds: the bounds we return are supposed to be the
bounds of the feature collection. But we have a flag in WFS/content
that allows the result to omit the bounds element altogether,
and it seems we're not honouring it anymore (can you open a Jira issue about it?).
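
To illustrate why the request bounds cannot stand in for the collection bounds, here is a toy sketch of the count, bounds and data passes. It is not GeoServer code: it uses an in-memory SQLite table of points instead of PostGIS, and the table, column and function names are invented for the example.

```python
import sqlite3


def three_pass_fetch(conn, bbox, max_features):
    """Run count, bounds and data queries for the points inside bbox."""
    minx, miny, maxx, maxy = bbox
    where = "x BETWEEN ? AND ? AND y BETWEEN ? AND ?"
    args = (minx, maxx, miny, maxy)
    # Pass 1: count the matches (used to enforce a feature limit).
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM points WHERE {where}", args
    ).fetchone()
    # Pass 2: bounds of the matching features, not of the request.
    bounds = conn.execute(
        f"SELECT MIN(x), MIN(y), MAX(x), MAX(y) FROM points WHERE {where}", args
    ).fetchone()
    # Pass 3: the data itself.
    rows = conn.execute(
        f"SELECT id, x, y FROM points WHERE {where} ORDER BY id LIMIT ?",
        args + (max_features,),
    ).fetchall()
    return count, bounds, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany("INSERT INTO points (x, y) VALUES (?, ?)", [(1, 1), (2, 3), (9, 9)])

count, bounds, rows = three_pass_fetch(conn, (0, 0, 5, 5), max_features=100)
print(count, bounds)  # prints: 2 (1.0, 1.0, 2.0, 3.0)
```

The request asked for the box (0, 0, 5, 5), but the two matching features only span (1, 1, 2, 3), which is why echoing the request bounds would misreport the collection's extent.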

On 1.7.x I double-checked that SPEED is used for the benchmarks, but
on 1.6.x this is not the first report I've heard that makes me think
the file strategy is being used instead for some reason. Yet I've just tried
with my 1.6.x checkout, and with SPEED configured my debugger stops only in SpeedStrategy...

It's also quite strange that you see GeoServer peg the CPU;
count and bounds should use lots of PostGIS time instead.
Can you enable GEOTOOLS_DEVELOPER_LOGGING and see what's taking
so long as it goes?

Stupid question but... you do have a spatial index on that data,
right?
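
For anyone checking the same thing, a minimal sketch of building the PostGIS index statement and inspecting existing indexes. The table name `roads` and geometry column `the_geom` are placeholders, the `%s` placeholder assumes a psycopg-style driver, and the `has_gist_index` check is a simple text heuristic over `pg_indexes.indexdef`, not an official API:

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Query to run against PostgreSQL to list a table's indexes
# (pass the table name as the parameter).
LIST_INDEXES_SQL = "SELECT indexname, indexdef FROM pg_indexes WHERE tablename = %s;"


def gist_index_sql(table, geom_column="the_geom"):
    """Return a CREATE INDEX statement for a GiST index on the geometry column."""
    for name in (table, geom_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"CREATE INDEX {table}_{geom_column}_gist "
        f"ON {table} USING GIST ({geom_column});"
    )


def has_gist_index(index_rows, geom_column="the_geom"):
    """Heuristic: does any (indexname, indexdef) row describe a GiST index on the column?"""
    needle = f"({geom_column.lower()})"
    return any(
        "using gist" in indexdef.lower() and needle in indexdef.lower()
        for _name, indexdef in index_rows
    )


print(gist_index_sql("roads"))
# prints: CREATE INDEX roads_the_geom_gist ON roads USING GIST (the_geom);
```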

Cheers

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

Andrea Aime wrote:

Uh, this is complex.
First off, in 1.6.x we actually do 3 queries. The first one is a count,
to make sure we don't return too many features. It's only needed
in case you have multiple Query elements (to respect the global max
features request) and I've optimized it out in 1.7.0-RC3. No luck
for 1.6.x; that series is dead.

:)

The second one is about the bounds, and no, we cannot return the
request bounds: the bounds we return are supposed to be the
bounds of the feature collection. But we have a flag in WFS/content
that allows the result to omit the bounds element altogether,
and it seems we're not honouring it anymore (can you open a Jira issue about it?).

I am just leaving for the day; so I missed my chance to open a Jira.

On 1.7.x I double-checked that SPEED is used for the benchmarks, but
on 1.6.x this is not the first report I've heard that makes me think
the file strategy is being used instead for some reason. Yet I've just tried
with my 1.6.x checkout, and with SPEED configured my debugger stops only in SpeedStrategy...

It's also quite strange that you see GeoServer peg the CPU; count and bounds should use lots of PostGIS time instead.

It is strange, but the other developer here confirms it: GeoServer was using a full core.

Can you enable GEOTOOLS_DEVELOPER_LOGGING and see what's taking so long as it goes?

Stupid question but... you do have a spatial index on that data, right?

Not a stupid question; that is what was wrong yesterday :)

Jody