[Geoserver-users] Big databases behind a WFS/WMS server

Hi all,

Here goes a big message with some thoughts about the usability of my WFS/WMS server; sorry it is a little bit long.

I have been wondering for months now how a big database should be served as a WFS. I mean a database with more than 50 million features spread all around the world.

In GeoServer you can configure the maximum number of features that you allow to be returned at once (in my case 1000), but:

-This number is not advertised anywhere in the capabilities document, so there is no way to figure out the maximum number of features the WFS server will return.

-I checked the GeoServer logs and it seems that GeoServer is not doing a LIMIT 1000, so I suppose it is GeoServer itself that is discarding the extra features. Is this OK? Shouldn't GeoServer put the limit on the number of features in the SQL statement?

-In WMS there is no such limitation, so in my case someone requesting my featureType at a world view will make GeoServer fetch all 50 million records and generate a map from them. That, of course, crashes my server.

On the other hand, I am sure a client does not actually want to get 50 million features back. The problem is that most WFS client implementations I have seen, like uDig, will start downloading everything they can as soon as you add this featureType to your map, and unfortunately most of the time people are looking at the whole world. But even if clients wanted to be smarter, they have serious problems with the current WFS protocol: there is no way to ask for a count so they know how many records are going to come, and there is no way to page through the results.
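
Just to illustrate: as far as I can see, about the only control a client has today is the maxFeatures attribute on a GetFeature request; there is nothing for asking "how many features would this return?" or "give me the next page". (The typeName and namespace below are just placeholders, not my real layer.)

  <wfs:GetFeature service="WFS" version="1.0.0" maxFeatures="100"
      xmlns:wfs="http://www.opengis.net/wfs"
      xmlns:myns="http://example.org/myns">
    <!-- caps the response at 100 features, but gives no idea of the total -->
    <wfs:Query typeName="myns:occurrences"/>
  </wfs:GetFeature>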

This is actually making me think that I am going to have to write a WFS "reflector" (like Brent did for KML) that takes care of things like:
  -If I get the typical WFS request against my big FeatureType without any filter, I redirect it to another, much smaller FeatureType with some points distributed around the world, so that people can get an idea of what is behind this service. If I just let GeoServer take the first 1000 (assuming GeoServer passes the LIMIT to the database), all my points would end up at the bottom (the indexes work in a certain way :wink: ) and the user would not see much.

  -I could be more dramatic and return an error saying that I do not allow requests without a filter on a specific attribute with at least 3 characters (see the example filter after this list).

  -I have different tables that I call spatial caches. They group, or cluster, the data into cells and give an overview of how much data sits behind them. In a web interface I have created, depending on the zoom level, I either use these caches or go straight to the 50 million row featureType. Maybe I can do the same here, based on an estimate of how many features a query will return.

  -The other possibility is to deny a query depending on the bounding box of the request: if I consider it too big, I return an error.
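
As an example of the mandatory filter I mention in the second point above (the attribute name and value here are only placeholders for illustration), I would only accept requests whose query carries something like:

  <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
    <!-- require a text filter on one attribute, with at least 3 characters before the wildcard -->
    <ogc:PropertyIsLike wildCard="*" singleChar="." escape="\">
      <ogc:PropertyName>scientific_name</ogc:PropertyName>
      <ogc:Literal>Pin*</ogc:Literal>
    </ogc:PropertyIsLike>
  </ogc:Filter>

and reject anything where the literal before the wildcard is shorter than 3 characters.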

I believe most of these issues should be tackled in the WFS protocol itself and/or in Catalog services, where metadata about the service can be found to help you decide how to use it.

I have heard of the idea of FeatureType Catalogs where these things could be considered (you could register the queries you are willing to accept), but the WFS/WMS services will in the end have to implement the mechanisms that make them real, and I think they should also advertise them.

Somehow it seems to me that the current WFS protocol and the client/server implementations are not prepared for big databases where these issues appear. Do you know of any WFS service with a lot of features (millions) behind it?

Best regards,

Javier.

I will take a stab at one of the issues first, and answer more later:

Rendering the 50 million features over WMS:
This is an issue and something we all have to deal with. You can do various things, such as selecting which features are rendered using rules and filters in the SLD document. You can also generalize the datasets (lower the point count and feature count) so they can be rendered more easily, especially when zoomed out.
For the sigma site TOPP did (sigma.openplans.org) we had to generalize the roads dataset about 4 times, for different zoom levels. So the farther out you zoomed, the more generalized (simplified) the roads were. We also had it render only the interstate highways when zoomed out. We did all of this in the SLD document.
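
Just as a rough sketch (the layer, attribute and scale values below are made up for illustration, they are not the actual sigma styles), a scale-dependent rule combined with a filter in an SLD document looks something like this:

  <StyledLayerDescriptor version="1.0.0"
      xmlns="http://www.opengis.net/sld"
      xmlns:ogc="http://www.opengis.net/ogc">
    <NamedLayer>
      <Name>roads</Name>
      <UserStyle>
        <FeatureTypeStyle>
          <!-- zoomed far out: only draw the interstates -->
          <Rule>
            <ogc:Filter>
              <ogc:PropertyIsEqualTo>
                <ogc:PropertyName>road_class</ogc:PropertyName>
                <ogc:Literal>interstate</ogc:Literal>
              </ogc:PropertyIsEqualTo>
            </ogc:Filter>
            <MinScaleDenominator>1000000</MinScaleDenominator>
            <LineSymbolizer>
              <Stroke>
                <CssParameter name="stroke">#333333</CssParameter>
              </Stroke>
            </LineSymbolizer>
          </Rule>
          <!-- zoomed in: draw everything -->
          <Rule>
            <MaxScaleDenominator>1000000</MaxScaleDenominator>
            <LineSymbolizer>
              <Stroke>
                <CssParameter name="stroke">#999999</CssParameter>
              </Stroke>
            </LineSymbolizer>
          </Rule>
        </FeatureTypeStyle>
      </UserStyle>
    </NamedLayer>
  </StyledLayerDescriptor>

You would do something similar for each of the generalized road tables, each with its own scale range.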

So you will have to take steps to render a minimal amount of data.

Brent Owens
(The Open Planning Project)



Right Brent,

This is what I am doing in my interface. I suppose it is not that common for people to send their own SLD to a WMS server, but if they do, there is no way to prevent GeoServer from crashing the server, right?

Javier.



Brent,

Any chance someone could write up what you guys did with the SLD and
provide the source?

One of the most frustrating bits in the OGC space is a lack of some
"cookbook" style examples for the "edge specs" (WCS, SLD and Filter).
For example I had no idea you could limit features in a WM* request
except by sending some SQL down the wire.

If there are some decent example sites out there for the edge specs, I would love to know about them.

Adam Hill
(WorldWind Community Developer)

Adam Hill wrote:

Brent,

Any chance someone could write up what you guys did with the SLD and
provide the source?

We're hoping to do a more official release of sigma relatively soon, with some write-ups. But until then, there are a bunch of SLD files that we use at:

http://sigma.openplans.org:8080/geoserver/data/styles/

This page: http://docs.codehaus.org/display/GEOSDOC/Loading+TIGER+roads+major shows what we did with the generalization.

One of the most frustrating bits in the OGC space is a lack of some
"cookbook" style examples for the "edge specs" (WCS, SLD and Filter).
For example I had no idea you could limit features in a WM* request
except by sending some SQL down the wire.

If there are some decent example sites out there for edge specs I
would love to know about it.

We're slowly putting up more generic OGC content. Ideally we could port some of it to a more generic site. We have pretty extensive SLD info: http://docs.codehaus.org/display/GEOSDOC/SLD+Explanations+and+Samples (though it could be arranged a bit better).

best regards,

Chris



--
Chris Holmes
The Open Planning Project
http://topp.openplans.org

Hi Adam,

As Chris pointed out, you can see the SLD's we are using for Sigma here:
http://sigma.openplans.org:8080/geoserver/data/styles/

Some of the ones to point out are
gnis_pop
gnis_pop_ol
major_roads
roads

You can also read how Sigma was built with MapBuilder and WFS here:
http://docs.codehaus.org/display/MAP/Building+A+Web+App

Adding more examples is definitely something we want to do, because examples are more fun than reading the spec =)

Brent Owens
(The Open Planning Project)

