[Geoserver-devel] Output size limitation in WFS queries? Experience returns in testing Geoserver with large datasets?

First, thanks for the answers about the WSDL stuff.
I'm currently testing Geoserver 1.2RC1 for production needs, and I have a size limitation problem when I query my server. Here is what I've done. I set up some test datasets with different amounts of data. The server is globally OK and most of the requests I sent passed successfully. I had some problems, but they were mostly caused by the quality of the data (shapefiles) that I imported into my PostGIS database (with shp2pgsql). After some data cleaning, everything seemed to work fine...

...until I tried to download a huge GML output (from my http://…:8080/geoserver/wfs/…)
Via the Geoserver web interface, I set up WFS access for a really big dataset (219136 polygons). It represents the worldwide coastline database I found here (http://www.ngdc.noaa.gov/mgg/shorelines/gshhs.html). I used "wget" to download the GML output of the Geoserver WFS and made 2 tries. In both cases, I obtained the same size.

-rw-r--r-- 1 nicolas geotraceagri 21123072 mai 19 15:06 gshhs2.gml
-rw-r--r-- 1 nicolas geotraceagri 21123072 mai 19 14:37 gshhs.gml

Both files end with

[...]
      <gshhs_postgis:lpoly_>0</gshhs_postgis:lpoly_>
      <gshhs_postgis:rpoly_>0</gshhs_postgis:rpoly_>
      <gshhs_postgis:length>0.007854</gshhs_postgis:length>
      <gshhs_postgis:gshhs_>115521</gshhs_postgis:gshhs_>
      <gshhs_postgis:gshhs_id>113328</gshhs_postgis:gshhs_id>
      <gshhs_postgis:the_geom>
        <gml:MultiLineString srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
          <gml:lineStringMember>
            <gml:LineString>
              <gml:coordina

As you can see the dump is incomplete and I only got exactly 3600 polygons. No way to download more data from my Geoserver.
Is it a Geoserver limitation?

It'd be cool to have some feedback from other users who have run tests with large datasets.
Dat's all folks for the moment :)

                       Nicolas Vila

Nicolas Vila wrote:

> As you can see the dump is incomplete and I only got exactly 3600 polygons. No way to download more data from my Geoserver.
> Is it a Geoserver limitation?

It looks really strange that it is cut off mid stream. How long did it take to reach this point (10 minutes?) When Chris was playing with large data sets he talked about one test taking 15 minutes. It really looks like something is cutting you off.

Quoting Jody Garnett <jgarnett@anonymised.com>:

Nicolas Vila wrote:

> As you can see the dump is incomplete and I only got exactly 3600
> polygons. No way to download more data from my Geoserver.
> Is it a Geoserver limitation?

It looks really strange that it is cut off mid stream. How long did it
take to reach this point (10 minutes?) When Chris was playing with
large data sets he talked about one test taking 15 minutes. It really
looks like something is cutting you off.

Actually I've just been playing with wget too, and it looks to perform a
lot better than my tests, I did an 11 meg download in about 9 seconds
(locally of course). Unfortunately that's the largest data set I've
got at the moment, and it's a bitch for me to download a larger one,
though I'll try to grab that same one you did, looks like a good one to
test with.

I too find it very odd that it's cut off mid stream. And how do you
know that it's 3600 polygons both times? I would say that you could
adjust the maxFeatures (in Config-Server or in the services.xml file)
to a greater value, but it doesn't sound like that's the problem.
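
One way to check the feature count in a truncated download, without relying on an XML parser (which would reject the incomplete file), is to count the feature wrapper tags as raw bytes. This is only a minimal sketch: it assumes each feature is wrapped in a `<gml:featureMember>` element, and the function name `count_tag` is made up for illustration.

```python
def count_tag(path, tag, chunk_size=1 << 20):
    """Count occurrences of the byte string `tag` in a file, reading it in chunks.

    No XML parsing is done, so this also works on a dump that was cut off
    mid stream.
    """
    keep = len(tag) - 1          # bytes to carry over so a tag split across chunks is still found
    count = 0
    tail = b""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            data = tail + chunk
            count += data.count(tag)
            # The carried-over tail is shorter than the tag, so it can never
            # contain a complete match that was already counted.
            tail = data[-keep:] if keep else b""
    return count


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        opened = count_tag(sys.argv[1], b"<gml:featureMember>")
        closed = count_tag(sys.argv[1], b"</gml:featureMember>")
        print("features started:", opened, "features completed:", closed)
```

Run against both downloaded files, the two counts would show whether the dumps stop at the same feature, and a difference between started and completed features would confirm the last one is incomplete.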

Is it possible for you to test with a utility other than wget (sorry, I
don't know of any other good ones)? To make sure it's not a limit with
wget? The errors with size I'm used to seeing with GeoServer are out
of memory ones, and this doesn't look like that, since it's producing
output. Is there anything in the logs to indicate a problem? And
perhaps try another very large dataset, see if you get cut off at the
same point.
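
If no other download tool is at hand, a few lines of Python can stand in for wget. The sketch below streams the response to disk and then checks whether the file ends with the closing root tag. It assumes the response root element is `wfs:FeatureCollection` (the prefix may differ on a given server), and the helper names are made up for illustration.

```python
import os
import shutil
import urllib.request

# Assumption: a complete GetFeature response ends with this closing tag.
CLOSING_TAG = b"</wfs:FeatureCollection>"


def download(url, dest):
    """Stream the response body to `dest` and return the number of bytes written."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return os.path.getsize(dest)


def ends_with_closing_tag(path, closing_tag=CLOSING_TAG):
    """Return True if the file ends with `closing_tag`, ignoring trailing whitespace."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - 4096))   # only the end of the file matters
        return f.read().rstrip().endswith(closing_tag)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        n = download(sys.argv[1], sys.argv[2])
        print(n, "bytes, complete:", ends_with_closing_tag(sys.argv[2]))
```

If this client stops at the same byte count as wget, the cut-off is unlikely to be a wget limit.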

I'm quite interested in this, and I'll do what I can to test it as well.
But I think GeoServer should be able to handle it. I can't imagine
what is limiting it, unless it's something like J2EE output streams or
SAX production, but none of the answers that come to mind make sense to
me, especially with the output being cut off mid stream like yours is.
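
One small observation on the output stream idea: the reported file size, 21123072 bytes, is an exact multiple of 4096. The sketch below shows the arithmetic. It is only a hint, not a diagnosis: a size that lands on a 4 KB boundary would be consistent with output written in fixed-size blocks where the last partial block never got flushed, but it could also be coincidence.

```python
def largest_power_of_two_divisor(n):
    """Return the largest power of two that divides the positive integer n."""
    return n & -n   # isolates the lowest set bit


size = 21123072                              # size of both gshhs dumps, from the listing
block = largest_power_of_two_divisor(size)
print(block, size // block)                  # prints "4096 5157"
```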

best regards,

Chris

_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

----------------------------------------------------------
This mail sent through IMP: https://webmail.limegroup.com/

Here is what I get when I try to download the data. It took 1 min 22 s for 21.1 MB. Keep in mind that my test dataset is composed of 219136 polygons. It represents the worldwide coastline database, and if you want to test it, you can get it at http://www.ngdc.noaa.gov/mgg/shorelines/gshhs.html
You can download the shapefile directly at this location.
I'll run some other tests this week with different configurations and post the results. 11 MB in 9 s... wow, it seems you have a faster computer than me :)
Regards

Nicolas

[nicolas@anonymised.com nicolas]$ wget http://localhost:8080/geoserver/wfs/GetFeature?typeName=gshhs_postgis:gshhs
--17:46:17-- http://localhost:8080/geoserver/wfs/GetFeature?typeName=gshhs_postgis:gshhs
           => `GetFeature?typeName=gshhs_postgis:gshhs'
Resolving localhost... 127.0.0.1
Connecting to localhost[127.0.0.1]:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]

    [ <=> ] 21,123,072   531.94K/s

17:47:39 (477.65 KB/s) - `GetFeature?typeName=gshhs_postgis:gshhs' saved [21123072]
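
The numbers in this wget output can be cross-checked with a little arithmetic. The sketch below assumes wget's "KB" means 1024 bytes and that its average rate covers only the time spent receiving the body; both are assumptions, not something the output states.

```python
from datetime import datetime

size = 21123072                                   # bytes saved, from the wget output
start = datetime.strptime("17:46:17", "%H:%M:%S")
end = datetime.strptime("17:47:39", "%H:%M:%S")

elapsed = (end - start).total_seconds()           # wall-clock time between the two timestamps
overall_kib_s = size / elapsed / 1024             # rate over the whole wall-clock interval
transfer_s = size / (477.65 * 1024)               # seconds implied by wget's reported average rate

print(f"elapsed: {elapsed:.0f} s")
print(f"overall rate: {overall_kib_s:.1f} KiB/s")
print(f"time implied by wget's average: {transfer_s:.1f} s")
```

The wall-clock interval matches the 1 min 22 s reported above, while wget's own average implies a noticeably shorter transfer. Under the stated assumptions, the difference would be time spent waiting before the first bytes arrived.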

cholmes@anonymised.com wrote:

> Here is what I get when I try to download the data. It took 1 min 22 s
> for 21.1 MB. Keep in mind that my test dataset is composed of 219136
> polygons.

Ok, turns out my testing was getting limited by exactly what I would
tell others to change. I was limiting my results to 10,000 features.
Am now testing my full database, and just downloaded about 300
megabytes (about 230,000 features of the roads of Los Angeles)
successfully. Took 10 minutes 31 seconds, and wasn't cut off at the
end. I'm downloading your mega shapefile at the moment, and will try
testing it later today, but I seem to not have the 21 meg limit (though
that could be proved wrong testing with the same dataset).
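
To see whether a feature cap is what limits a result, one option is to request increasing values of maxFeatures and compare the sizes returned. The sketch below only builds the request URLs. It assumes a standard WFS 1.0.0 key-value GetFeature request against the `/geoserver/wfs` endpoint (the thread uses a `/wfs/GetFeature?...` path form instead), and the helper name is made up for illustration. A server-side maxFeatures setting may still cap whatever is requested here.

```python
from urllib.parse import urlencode

# Assumption: endpoint of the local test server used in this thread.
BASE = "http://localhost:8080/geoserver/wfs"


def getfeature_url(type_name, max_features=None, base=BASE):
    """Build a WFS GetFeature GET URL, optionally limited with maxFeatures."""
    params = {
        "service": "WFS",
        "version": "1.0.0",        # assumption: the server speaks WFS 1.0.0
        "request": "GetFeature",
        "typeName": type_name,
    }
    if max_features is not None:
        params["maxFeatures"] = str(max_features)
    # Keep the namespace colon in the type name readable.
    return base + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    for n in (1000, 3600, 10000, None):
        print(getfeature_url("gshhs_postgis:gshhs", n))
```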

> It represents the worldwide coastline database, and if you
> want to test it, you can get it at
> http://www.ngdc.noaa.gov/mgg/shorelines/gshhs.html
> You can download the shapefile directly at this location.

That is one large shapefile. And you just used shp2pgsql to get it in
the database?

> I'll run some other tests this week with different configurations
> and post the results.

Excellent. If you could, just post them directly to the wiki at
http://docs.codehaus.org/display/GEOS/Performance+and+Scalability+Testing
You'll need to sign up for an account and then you can edit it directly.

> 11 MB in 9 s... wow, it seems you have a faster computer than me :)

It may just be a faster servlet container; if you use Jetty or Resin you
get quite a bit more speed than Tomcat. Jetty is built into the
GeoServer source download, see
http://docs.codehaus.org/display/GEOS/Running+Embedded+Jetty

best regards,

Chris

