[Geoserver-devel] Output size limitation in WFS queries? Experience returns in testing Geoserver with large datasets?

Ok, I've been playing with this dataset this afternoon. For a second I
thought I had reproduced your results, as I had it cutting off right at
about 21 megabytes, and when I did a tail I got half a line string.
But then I realized that I was doing the tail on a result that I had
cancelled mid-stream, so it was just my mistake. The runs that capped
at 21 megs both had exactly 10,000 features, which is the default
maximum number of features set in the server. So check to make sure
you don't have that set, as 10,000 features of the gshhs dataset come
to right about 21 megs.
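If the cap is the culprit, you should also be able to push past it for
a single request with the standard WFS maxFeatures parameter, something
like this (untested, and the 250000 is arbitrary, just bigger than your
dataset):

    wget "http://localhost:8080/geoserver/wfs/GetFeature?typeName=gshhs_postgis:gshhs&maxFeatures=250000"

(Note the quotes around the URL, otherwise the shell will eat the &.)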

That said, I did have another problem with the dataset, an old error
that I wrote to the JTS and PostGIS lists about a year ago, and I guess
it hasn't been fixed. If a very small value is in the WKT it causes
GeoServer to mess up, as the JTS parser can't handle scientific
notation, which PostGIS returns for values smaller than about 1e-4 or
1e-5. The gshhs_ of the row that messed me up is 109233 (the gid for me
was 109232), and 109234 also has a really small value.

The symptom is output that is simply cut off mid-stream, because the
default strategy for returning features is 'speed', which is
technically wrong from a spec perspective, as it does not always
successfully report errors. The error should show up in the logs,
though, if your level is set to FINE. So try adjusting your logging
levels and see if you get any errors there. You may also try the FILE
response strategy, which is changed in WEB-INF/web.xml. It first writes
the response to a temp file and thus properly reports any error, but it
is obviously slower, since the file needs to be written before being
sent to the client; that is why 'speed', which generally does not mess
up, is the default.
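To show what I mean about the parser, here's a minimal sketch that
should reproduce it against a JTS of that vintage (the coordinates are
made up, just small enough to force the scientific notation PostGIS
uses; a current JTS may well parse this fine):

    import com.vividsolutions.jts.io.ParseException;
    import com.vividsolutions.jts.io.WKTReader;

    public class SciNotationTest {
        public static void main(String[] args) {
            WKTReader reader = new WKTReader();
            try {
                // Plain decimal notation parses fine:
                reader.read("LINESTRING (0.00001 0.00001, 1 1)");
                // The same values the way PostGIS writes them -- the
                // scientific notation is what trips the old parser:
                reader.read("LINESTRING (1e-05 1e-05, 1 1)");
                System.out.println("both parsed OK");
            } catch (ParseException e) {
                System.out.println("parse error: " + e.getMessage());
            }
        }
    }

And for switching strategies, the relevant bit of WEB-INF/web.xml
should look roughly like this (I'm writing the parameter name from
memory, so check it against the comments in your own web.xml):

    <context-param>
      <param-name>serviceStrategy</param-name>
      <!-- SPEED is the default; FILE writes to a temp file first -->
      <param-value>FILE</param-value>
    </context-param>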

Actually, it looks like the FILE strategy isn't working quite right
either, though it will return the proper error to you (it just won't
respond correctly when there is no error). But check your logs, set to
FINE, and you'll see whether your error is a parsing one. That dataset
seems to have a decent number of really small WKT values; I've been
going through and removing them. So far I've got:

gshhs_ = 111785, 111786, 115522, 117560, 129701, 130901

There are more, I just haven't gotten to them yet.
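If you want to hunt the rest down on your side instead of waiting on
me, a query along these lines against the PostGIS table should flag
every row whose WKT comes out in scientific notation (table and column
names as in your setup; the LIKE pattern is just a rough heuristic):

    SELECT gid, gshhs_
      FROM gshhs
     WHERE AsText(the_geom) LIKE '%e-0%';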

I just emailed the JTS list about the parser error, and we could also
look into using WKB instead of WKT, which would fix the problem.
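For reference, the JTS side of the WKB route would just be something
like this (assuming a JTS build that ships WKBReader, and that PostGIS
is asked for AsBinary(the_geom) with the bytes arriving hex-encoded;
the names here are illustrative):

    import com.vividsolutions.jts.geom.Geometry;
    import com.vividsolutions.jts.io.ParseException;
    import com.vividsolutions.jts.io.WKBReader;

    public class WkbSketch {
        // WKB is a binary encoding, so tiny ordinates never pass
        // through a text parser and scientific notation never
        // comes into play.
        public static Geometry parse(String hexWkb) throws ParseException {
            return new WKBReader().read(WKBReader.hexToBytes(hexWkb));
        }
    }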

best regards,

Chris

----- Original Message -----
From: "Nicolas Vila" <n.vila@anonymised.com>
To: <cholmes@anonymised.com>
Cc: "Jody Garnett" <jgarnett@anonymised.com>;
<geoserver-devel@lists.sourceforge.net>
Sent: Wednesday, May 19, 2004 11:41 PM
Subject: Re: [Geoserver-devel] Output size limitation in WFS queries?
Experience returns in testing Geoserver with large datasets?

Here is what I get when I try to download the data. It took 1 min 22
sec for 21.1 MB. Bear in mind that my test dataset is composed of
219,136 polygons. It is the worldwide coastline database, and if you
want to test it, you can get it at
http://www.ngdc.noaa.gov/mgg/shorelines/gshhs.html
You can download the shapefile directly at that location.
I'll make some other tests this week, with different configurations,
and I'll post you the results. 11 MB in 9 sec... wow, it seems you have
a faster computer than me :-)
Regards
Regards

                     Nicolas

[nicolas@anonymised.com nicolas]$ wget
http://localhost:8080/geoserver/wfs/GetFeature?typeName=gshhs_postgis:gshhs
--17:46:17--
http://localhost:8080/geoserver/wfs/GetFeature?typeName=gshhs_postgis:gshhs
          => `GetFeature?typeName=gshhs_postgis:gshhs'
Resolving localhost... 127.0.0.1
Connecting to localhost[127.0.0.1]:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]

    [ <=> ] 21,123,072  531.94K/s

17:47:39 (477.65 KB/s) - `GetFeature?typeName=gshhs_postgis:gshhs'
saved [21123072]

cholmes@anonymised.com wrote:

Quoting Jody Garnett <jgarnett@anonymised.com>:

Nicolas Vila wrote:

Both files end with

[...]
    <gshhs_postgis:lpoly_>0</gshhs_postgis:lpoly_>
    <gshhs_postgis:rpoly_>0</gshhs_postgis:rpoly_>
    <gshhs_postgis:length>0.007854</gshhs_postgis:length>
    <gshhs_postgis:gshhs_>115521</gshhs_postgis:gshhs_>
    <gshhs_postgis:gshhs_id>113328</gshhs_postgis:gshhs_id>
    <gshhs_postgis:the_geom>
      <gml:MultiLineString
srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
        <gml:lineStringMember>
          <gml:LineString>
            <gml:coordina

As you can see, the dump is incomplete: I got exactly 3600 polygons,
and there is no way to download more data from my GeoServer.
Is this a GeoServer limitation?

It looks really strange that it is cut off mid-stream. How long did it
take to reach this point (10 minutes)? When Chris was playing with
large datasets he talked about one test taking 15 minutes. That really
looks like something is cutting you off.

Actually, I've just been playing with wget too, and it looks to perform
a lot better than my earlier tests: I did an 11 MB download in about 9
seconds (locally, of course). Unfortunately that's the largest dataset
I've got at the moment, and it's a bitch for me to download a larger
one, though I'll try to grab that same one you did; it looks like a
good one to test with.

I too find it very odd that it's cut off mid-stream. And how do you
know that it's 3600 polygons both times? I would say that you could
adjust maxFeatures (in Config-Server or in the services.xml file) to a
greater value, but it doesn't sound like that's the problem.

Is it possible for you to test with a utility other than wget, to make
sure it's not a limit in wget itself (sorry, I don't know of any other
good ones)? The size-related errors I'm used to seeing with GeoServer
are out-of-memory ones, and this doesn't look like that, since it's
producing output. Is there anything in the logs to indicate a problem?
And perhaps try another very large dataset and see if you get cut off
at the same point.

I'm quite interested in this, and I'll do what I can to test it as
well.

But I think GeoServer should be able to handle it; I can't imagine
what is limiting it, unless it's something like J2EE output streams or
SAX production, but none of the answers that come to mind make sense
to me, especially with the output being cut off mid-stream like yours
is.

best regards,

Chris


You got interesting results with the test I proposed. Actually, I'm
moving my GeoServer to another computer, so I haven't set up the
config files yet. I'll modify the max number of features that
GeoServer can handle, and when I finish I'll tell you whether I get
the same results. Thank you for spending a bit of your time on this
dataset; apparently it wasn't useless.
Regards.

Nicolas
