Hi all,
Here is a long message with some thoughts about the usability of my WFS/WMS server; sorry in advance, it is a bit long.
For months now I have been wondering how a big database should be served as a WFS. I mean a database with more than 50 million features spread all around the world.
In GeoServer you can configure the maximum number of features that the server is allowed to return at once (1000 in my case), but:
-This number is not advertised anywhere in the capabilities document, so there is no way a client can find out the maximum number of features the WFS will serve.
-Looking at the GeoServer logs, it seems GeoServer is not doing a LIMIT 1000 in the SQL, so I suppose it is discarding the extra features itself. Is this OK? Shouldn't GeoServer limit the number of features in the SQL statement?
-In WMS there is no such limitation, so in my case someone requesting my featureType in a world view will make GeoServer fetch all 50 million records and render a map from them. That, of course, crashes my server.
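On the LIMIT point: this is roughly the behaviour I would expect from the server, sketched in Python. The table and column names, the SRID, and the PostGIS envelope filter are all made up for illustration; the point is only that the cap should travel down into the SQL instead of being applied after the rows come back:

```python
# Sketch: push the feature cap into the SQL query instead of
# fetching everything and discarding rows in memory.
# Table/column names and SRID are hypothetical.
MAX_FEATURES = 1000

def build_query(type_name, bbox=None, limit=MAX_FEATURES):
    sql = f"SELECT * FROM {type_name}"
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        # Spatial filter first, so the database can use its index
        sql += (f" WHERE geom && "
                f"ST_MakeEnvelope({minx}, {miny}, {maxx}, {maxy}, 4326)")
    # The LIMIT is what keeps a world-view request from dragging
    # 50 million rows out of the database.
    sql += f" LIMIT {limit}"
    return sql
```

With this, even a request with no bounding box at all costs the database at most `limit` rows, rather than a full table scan whose result is thrown away server-side.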
On the other hand, I am sure no client actually wants 50 million features back. The problem is that most WFS client implementations I have seen (uDig, for example) will, as soon as you add this featureType to your map, start downloading everything they can, and unfortunately most of the time people are looking at the whole world. But even if clients wanted to be smarter, they have serious problems with the current WFS protocol: there is no way to ask for a count first, to be prepared for how many records are coming, and there is no way to page through results.
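To make the complaint concrete, this is the client behaviour I would like the protocol to support: ask for a count first, then fetch in pages. The `resultType` and `startIndex` parameters here are hypothetical from my point of view, i.e. not something I can rely on clients and servers agreeing about today:

```python
# Sketch of the count-then-page pattern I wish WFS clients could use.
# The resultType/startIndex parameters are illustrative, not something
# every server will honour.
from urllib.parse import urlencode

def count_request(base_url, type_name):
    # Ask only "how many features match?", returning no geometry
    params = {"service": "WFS", "request": "GetFeature",
              "typeName": type_name, "resultType": "hits"}
    return f"{base_url}?{urlencode(params)}"

def page_request(base_url, type_name, start, count):
    # Fetch one page of results instead of the whole set
    params = {"service": "WFS", "request": "GetFeature",
              "typeName": type_name,
              "startIndex": start, "maxFeatures": count}
    return f"{base_url}?{urlencode(params)}"
```

A client with these two calls could warn the user ("this layer has 50 million features, really download it?") and then stream it page by page, instead of blindly pulling everything at world extent.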
This is actually making me think that I am going to have to write a WFS "reflector" (like the one Brent did for KML) that takes care of things like:
-If I get the typical WFS request against my big FeatureType without any filter, I redirect it to another, much smaller FeatureType with some points distributed around the world, so that people can get an idea of what is behind this service. If I just let GeoServer take the first 1000 (assuming GeoServer passed the LIMIT to the database), all my points would end up at the bottom of the map (that is just how the indexes happen to work) and the user would not see much.
-I can be more drastic and return an error saying that I do not accept requests without a filter on a specific attribute with at least 3 letters.
-I have different tables that I call spatial caches. They group, or cluster, the data into cells and give an overview of how much data lies behind each cell. In a web interface I have built, I use these caches or go straight to the 50-million-feature featureType depending on the zoom level. I could maybe do the same based on an estimate of how many features a query will return.
-The other possibility is to deny a query based on the bounding box of the request: if I consider it too big, I return an error.
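The dispatch logic of such a reflector could look something like the sketch below. Every threshold, the overview layer name, and the cache-table naming scheme are invented for illustration; the real values would depend on the data:

```python
# Sketch of the "reflector" dispatch described above.
# All names and thresholds are hypothetical.
OVERVIEW_TYPE = "overview_points"   # small, world-wide sample layer
MIN_FILTER_LEN = 3                  # minimum attribute-filter length
MAX_BBOX_AREA = 100.0               # square degrees; arbitrary cutoff
CACHE_MAX_ZOOM = 8                  # below this, serve clustered cells

def dispatch(type_name, attr_filter=None, bbox=None, zoom=CACHE_MAX_ZOOM):
    if attr_filter is None and bbox is None:
        # No filter at all: redirect to the small overview FeatureType
        return ("redirect", OVERVIEW_TYPE)
    if attr_filter is not None and len(attr_filter) < MIN_FILTER_LEN:
        return ("error", "attribute filter needs at least 3 letters")
    if bbox is not None:
        minx, miny, maxx, maxy = bbox
        if (maxx - minx) * (maxy - miny) > MAX_BBOX_AREA:
            return ("error", "bounding box too large")
    if zoom < CACHE_MAX_ZOOM:
        # Low zoom: serve the clustered "spatial cache" table instead
        return ("redirect", f"{type_name}_cache_z{zoom}")
    # Reasonable request: pass it through to the real featureType
    return ("pass", type_name)
```

So the reflector either passes a reasonable request through untouched, rewrites it against an overview or cache layer, or rejects it outright, and the big table is only ever hit by queries it can survive.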
I believe most of these issues should be tackled in the WFS protocol itself and/or in catalog services, where metadata about the service can be found and can help you decide how to use it.
I have heard of the idea of FeatureType catalogs where these things could be considered (you could register the queries you are willing to accept), but in the end the WFS/WMS services will have to implement the mechanisms that make them real, and I think they should also advertise them.
Somehow it seems to me that the current WFS protocol and the client/server implementations are not prepared for big databases, where these issues appear. Does anyone know of a WFS service with a lot of features behind it (millions)?
Best regards,
Javier.