[Geoserver-devel] Regionating algorithm / KML hierarchies

Hi,

I have been encouraged to write this email to anyone who is interested in the topic or who wants to stand up large KML hierarchies. It's written as an explanation / tips for how to get okay performance, but really I am hoping to get some fresh ideas and input on how we can do things better. So please share any thoughts you have on this topic.

Basic introduction: Regionating is about dividing information into tiles. The idea is that instead of bombarding the client with all the information at once we display the most important features first, then provide more as the client zooms in. We only return data that is relevant for the region where the client is looking. The effect is easily observed in Google Earth.

The regionating algorithm we are using is this one: http://geoserver.org/display/GEOS/Regionating+Improvement

But this can be pretty hard to grasp, so I'll make the assumption that people here are more familiar with relational databases, in which case the queries used to build the H2 database look something like this ("native sorting" strategy):

SELECT * FROM featuretype WHERE intersects( the_geom, <bounding box of tile> ) AND fid NOT IN (<ids of all features in parent tiles>) ORDER BY <chosen regionating attribute> LIMIT <features per tile>
This is not the complete picture. We have to run this query at least (size of dataset) / (features per tile) times, in reality a bit more because we only stop when the query returns an empty set. Also, in order to find <ids of all features in parent tiles> , we have to go to our H2 database and figure out everything that has already been accounted for in tiles further up.

But basically there are three really important factors:
1) We need to have a fast spatial index to limit the number of features to consider.
2) More placemarks per tile means the query is run less often
3) In cases where the data is heavily clustered (for example, where coordinates have been rounded) the spatial index becomes useless, so you want one on the regionating attribute as well.

If Postgis' spatial index is O(n) = Log (n) , then ideally I think this thing runs like n Log (n), and without the right indexes probably closer to n^3 ( n^2 to do the spatial query, which has to be done for n / x tiles). If you have more than a few hundred placemarks, there's obviously *huge* difference.

If you don't use "native sorting", perhaps because the backend can't sort on the attribute that you like, things also end up being n^3 and possibly a lot slower (by a constant). In those cases the sorting is done in Java, so for each tile much of the datastore is streamed to GeoServer and the results filtered.

All in all, here are the current recommendations for KML hierarchies, whether they are for GeoSearch or Google Earth:
1) Use a backend that supports queries, such as Postgis. You can use shp2psql to convert from a Shapefile to a SQL format supported by Postgis. Be sure to specify that you want a GIST (geospatial index) to be created, and provide the SRS. (-I and -s)
2) Make sure your database has a primary index (an auto-incrementing integer is fine) and a spatial index on the geometry column
3) Put an index on the column that you are going to sort the feature by. If you are using the size of the geometry, consider making an auxilliary column that contains the precalculated value and put an index on that. Note that GeoServer always sorts in descending order, so features you consider important should have a high value.
4) In GeoServer's feature type configuration, be sure to use "native-sorting" for the regionating strategy, and your chosen column as the regionating attribute.
5) KML Feature Limit should generally be set to 50. It's a balancing act between too much information per tile (Googlebot prefers document that are less than 1 megabyte) and a big hierarchy that takes long to build.

( Previously written here: http://geoserver.org/display/GEOS/GeoSearch+Module )

Thanks for reading, please do respond if you have any thoughts or questions.

--
Arne Kepp
OpenGeo - http://opengeo.org
Expert service straight from the developers