[Geoserver-users] large datastore extents

Great, glad to see that what I thought should be the DEFAULT is also others' take on setting/getting extents within GeoServer.

Now, there’s the issue of changing extents. In my opinion this should be opt-in.
In the short term could let users set the bounding box as ‘dynamic’, meaning it’d
be recomputed each time.
Agree, although opting in to ‘dynamic’ bbox creation mode should really only be offered if the feature count (table row count) is a sensible size and running the extent() function on the table wouldn’t lead to long wait times for the end user. The user could easily be guided at the ‘opt-in’ stage by showing an ‘estimated’ table row count somewhere in the FeatureType editor. For example:

SELECT reltuples FROM pg_class r WHERE relkind = 'r' AND relname = 'table_name';

This is an estimate, as of the last vacuum task run on the DB, and would give the user a good enough feel for the table row count to make a decision.
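
A minimal sketch of how that guidance could be wired up, written in Python for brevity (GeoServer itself is Java). The function names and the 100,000-row threshold are assumptions made for illustration; they are not GeoServer settings, and the threshold would need tuning against real tables.

```python
def estimated_row_count_sql(table_name):
    """Return (sql, params) for PostgreSQL's planner row estimate of a table.

    The table name is passed as a bound parameter because relname is
    compared as a plain string, exactly as stored in pg_class.
    """
    sql = (
        "SELECT reltuples FROM pg_class "
        "WHERE relkind = 'r' AND relname = %s"
    )
    return sql, (table_name,)


def dynamic_bounds_advisable(estimated_rows, threshold=100_000):
    """True when the table looks small enough to recompute extent() per request.

    `threshold` is a hypothetical default, not a recommended value.
    """
    # Zero or a negative value is treated as "no usable estimate"
    # (for example, the table has never been vacuumed or analysed).
    if estimated_rows <= 0:
        return False
    return estimated_rows <= threshold
```

The FeatureType editor could show the estimate next to the ‘dynamic’ option and leave the final choice to the user.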

In the longer term I’d like to see it hooked up to our transactions, so that
inserts/updates/deletes done through GeoServer will cause an automatic
update to the bounds.
Agree.
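
One way the transaction hook could behave, as a rough Python sketch rather than anything taken from GeoServer's code. The class and method names are hypothetical. The idea is that an insert can only grow the box, so the cache can be expanded cheaply, while a delete may shrink the true extent, so the cache is only flagged for a later recompute.

```python
from dataclasses import dataclass


@dataclass
class CachedBounds:
    """Cached bounding box for one FeatureType (hypothetical structure)."""
    minx: float
    miny: float
    maxx: float
    maxy: float
    stale: bool = False  # True when the cached box may be larger than the data

    def on_insert(self, minx, miny, maxx, maxy):
        """Grow the cached box to cover a newly inserted feature's envelope."""
        self.minx = min(self.minx, minx)
        self.miny = min(self.miny, miny)
        self.maxx = max(self.maxx, maxx)
        self.maxy = max(self.maxy, maxy)

    def on_delete(self):
        """A delete can only shrink the true extent, so just flag the cache."""
        self.stale = True

    def on_update(self, minx, miny, maxx, maxy):
        """Treat an update as an insert of the new envelope plus a delete."""
        self.on_insert(minx, miny, maxx, maxy)
        self.on_delete()
```

A stale box still contains every feature, so it stays safe to publish until an exact recompute is convenient.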

Cheers,
Simon

PS - thanks for setting up the GeoServer blog, good idea. The rss feed has been firmly added to my reading list (not that i find anything wrong with the blog’s design, it’s just easier to consume that way).

On 06/03/07, Chris Holmes < cholmes@anonymised.com> wrote:

Ok, my take is this:

The reason we added the ‘generate’ button was for exactly this - so that
users who had super big tables that would take a long time to compute
could just enter it themselves.

But after that generate call I believe we should just be using that for
the bounds. It strikes me as extremely bad that we’re calling getBounds
on the featureType for any WMS capabilities document, since the whole
point of the featureType page is to set things up for the capability
documents.

Now, there’s the issue of changing extents. In my opinion this should
be opt-in. In the short term could let users set the bounding box as
‘dynamic’, meaning it’d be recomputed each time.

In the longer term I’d like to see it hooked up to our transactions, so
that inserts/updates/deletes done through GeoServer will cause an
automatic update to the bounds.

But for right now I think we should fix the WMS capabilities document to
just use the generated value, instead of recomputing each time.

Chris

Simon Abele wrote:

Hi all,

Ok, i’ll try and address each of the suggestions made in the order i
received them; with one exception, and apologies in advance for length;

– Andrea, Brent

Do you still want me to submit a JIRA issue, or is this
(http://jira.codehaus.org/browse/GEOS-955, an opened and closed issue
reported by Brent yesterday) the same thing?

The comment for this issue suggests this is already done/fixed: "It
grabs the latLon bounding box that is cached and loaded from the DTO.".
So is this code/behaviour just not in, or not working in, the GeoServer
version we’re using, or should i download a new build?

– Andrea, Justin

The alternative is to use estimated_extent, which is not going to work
unless you did run a vacuum analyze recently.

Seems like a sensible fallback option/alternative to using the more time
consuming - extent() - function. It is recommended PostgreSQL/PostGIS
users re-optimise ( e.g. reindex, vacuum analyse) database tables when
significant changes have been made to the data in them. The ‘Autovacuum
daemon’ provided with PostgreSQL 8.1+, can be set to automatically run
the vacuum tasks as little or as much as the db admin would like, and if
the db admin is working with large databases they should already be
adopting some kind of maintenance/optimisation strategy anyways.

I played around with adding this function to some postgis code but
there was a bit of kickback from others due to the fact that the
resulting answer was not exact.

Mostly agree. i say mostly because we have some project requirements where
knowing the exact extents of a FeatureType is important. In the current
case, though, it’s not such a high priority and near-enough (i.e. 95%)
extents would be fine. I probably fall more into the ‘kickback’ camp on
this, but it depends on the user’s requirements and intended use of the
extents.

So one might argue the point “Wouldn’t a 95% accurate answer be
better than null?”

Depends on the user’s requirements. Defaulting to the -
estimated_extent() - function would, i think, still result in the same
kind of resistance Justin mentioned was encountered when this approach
was adopted in the past. There are at least 3 groups of users who might
need to be catered for when it comes to deciding how and where FeatureType
extents are obtained: those that 1) have no preference (exact or 95%,
either will do), 2) require nothing less than exact extents (at a
minimum, as defined at some point in time), and 3) require a mixture of
both 1) and 2), depending on the intended use of each FeatureType
configured within GeoServer.

As Jukka suggested, and as is also the case with the datasets we’re
using which raised the problem and initiated this thread, the extents of
the datasets very rarely change, and when they do we’re happy to manually
update GeoServer to get the exact extents, via FeatureType editor >
Generate. It’s at this point in configuring a FeatureType that the user
should/could be notified and/or given options about which extents
generation approach GeoServer should use for each FeatureType in future
in order to, for example, generate capabilities docs. Here i see 2
(maybe 3) things that could happen:

  1. USE DEFAULT. As Brent suggests, use the cached extents generated the
    last time the FeatureType was configured, with no new call to the
    extent() or estimated_extent() functions except when the user updates
    the FeatureType configuration and clicks the ‘Generate’ button. - THEN -
    DO NOTHING.
  2. For PostGIS FeatureTypes with a low (??insert some threshold here??)
    feature count (derived at the same time as the extents are first
    generated, or known by the user when they enter extents manually in the
    FeatureType editor) and a high update frequency (assessment required
    here) - THEN - provide a check box to enable use of the - extent() -
    function.
  3. (?) NOT SURE ABOUT THIS ONE, as it would require more regular running
    of the ‘vacuum analyze’ task to make it worthwhile. If the user knows in
    advance that a) the FeatureType has a medium to large feature count,
    which would result in calls to the - extent() - function taking some
    time to run, b) the FeatureType is likely to see frequent updates to its
    features’ geometry that would change the extents, more regularly than
    the admin user is willing to spend time each day updating the
    FeatureType’s config, and c) they don’t mind less accurate, but updated
    and faster, extents creation as generated by the - estimated_extent() -
    function - THEN - provide a check box to enable use of the -
    estimated_extent() - function.
For now 1), the DEFAULT - use the known cached FeatureType extents -
would solve the problem.

Of course these are only suggestions on how i and other users would like
to see this element of GeoServer working.

NOTE: i tested the - estimated_extent() - function directly on our
PostgreSQL/PostGIS instance and it only seems to work for tables
which have lowercase names - so “topographicarea” works but
“TopographicArea” doesn’t! *** Users need to be aware of this *** -
could this be why the GeoServer instance we’re using defaults to using
the - extent() - function instead? DB table names are deliberately
written in Camel/Pascal case and we’d rather not have to change this.
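
For background on the case issue: PostgreSQL folds unquoted identifiers to lower case, so a mixed-case table has to be double-quoted wherever its name appears as an identifier in SQL. The Python sketch below shows that quoting for an exact extent() query. The helper names and the "the_geom" column in the usage note are assumptions. Whether the same fix applies to estimated_extent(), which takes the table name as a string argument, is not established in this thread.

```python
def quote_ident(name):
    """Double-quote an SQL identifier so PostgreSQL keeps its case.

    Embedded double quotes are escaped by doubling them.
    """
    return '"' + name.replace('"', '""') + '"'


def exact_extent_sql(table, geom_column):
    """Build an exact-extent query that works for mixed-case names."""
    return "SELECT extent({}) FROM {}".format(
        quote_ident(geom_column), quote_ident(table)
    )
```

For example, exact_extent_sql("TopographicArea", "the_geom") builds a query against "TopographicArea" rather than the lower-cased topographicarea.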

Cheers,
Simon



Geoserver-users mailing list
Geoserver-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-users



Chris Holmes
The Open Planning Project
http://topp.openplans.org