Hi,
this week I'm going to try to implement some WCS limits
for people who want to serve large amounts of raster data
without the headache of a user hitting the server hard with
a very large request.
Generally speaking, we want to avoid situations where:
- the request ends up reading too much data (e.g., we don't
want a user to make a request that forces the server to
read 10TB of data)
- the request ends up using too much memory
- the request ends up generating an overly large response (e.g.,
again, we don't want the server to generate a 10GB response)
Providing a general solution to the problem can be very complex:
* a tiled data source may allow streaming reads by tile,
keeping memory usage small while still reading a
truckload of data
* a tiled output format can ensure that nowhere in the chain
is the whole image composed in memory
* however, a non-tiled input or a non-tiled output will at
some point force the full image to be built in memory
* where and when is difficult to say, e.g., we might read
a small amount of data from the input and then build a
huge raster in memory because the user is supersampling
(asking for a higher than native resolution) and the output
format is not tile enabled
* the megabytes read during input and output are again difficult
to control because of format and compression differences
* the WCS right now does not use overviews, but it might in
the future
Long story short, trying to control the actual amount of data
read, kept in memory and generated in output is beyond our
reach.
What I'm going to propose is a simplified compromise based
on a worst case scenario.
We allow the administrator to set up a maximum number of MB
to be read, and a maximum number of MB to be generated in
output.
The measure in MB is computed as an equivalent single-tile,
uncompressed size (assuming everything has to be
read or generated in one shot):
width * height * bands * band_size
This simplifying assumption ensures we are never going to
have more than the limits read, kept in memory or
generated: normally (hopefully) we'll actually have
less.
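As a sketch, the check could look something like this (hypothetical names, just illustrating the width * height * bands * band_size estimate, with limits expressed in MB):

```python
def worst_case_bytes(width, height, bands, band_size):
    # Equivalent single-tile, uncompressed size in bytes:
    # assumes every pixel of every band is materialized at once.
    return width * height * bands * band_size

def within_limit(width, height, bands, band_size, limit_mb):
    # Compare the worst-case estimate against the configured limit.
    return worst_case_bytes(width, height, bands, band_size) <= limit_mb * 1024 * 1024

# A 10000x10000, 3-band, 8-bit read is ~286MB worst case
print(worst_case_bytes(10000, 10000, 3, 1) / (1024 * 1024))
```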
The distinction between input and output is there because
WCS requests normally perform some resampling, reading
inputs at a resolution different from the outputs, and
in some setups the admin can play on that difference to
relax at least the input limits.
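For instance (made-up numbers), a downsampling request can have a large input footprint but a tiny output one, which is exactly where separate limits help:

```python
MB = 1024 * 1024

# A request downsampling a 10000x10000 RGB source to a 1000x1000 output:
input_mb = 10000 * 10000 * 3 * 1 / MB   # ~286.1 MB read (worst case)
output_mb = 1000 * 1000 * 3 * 1 / MB    # ~2.9 MB generated
print(input_mb, output_mb)
```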
In particular, if the admin can ensure all raster sources
are tiled, it's possible to relax the input limits, as
the tiled sources will never load the full data in memory,
and the WCS processing chain ensures that at worst the
data is recomposed into a single tile if the output format
cannot deal with inner tiling.
Even in a setup where all the sources are tiled and the output
format list has been somehow modified to allow only tiling formats
(atm, only GeoTIFF), it would still be good to set up some
(large) limits to avoid disk or network flooding for long
periods of time.
Let's look at an example. If we set a 200MB limit on input
and a 20MB limit on output, then the following will be considered
valid:
* a request that makes GS read a 14481x14481 portion of an
8-bit single band raster
* a request that makes GS read a 7240x7240 portion of a RGBA
(or other 4 band) image
* a request that makes GS generate a 4579x4579 8-bit raster in
output
* a request that makes GS generate a 2290x2290 raster with 4 bytes
per pixel in output (RGBA, or a single band, single precision
floating point one).
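For the record, checking the numbers above against the same width * height * bands * band_size estimate:

```python
MB = 1024 * 1024

def equivalent_mb(width, height, bands, band_size):
    # Worst-case uncompressed, single-tile size in MB
    return width * height * bands * band_size / MB

# Input limit: 200MB
print(equivalent_mb(14481, 14481, 1, 1))  # ~199.98, 8-bit single band
print(equivalent_mb(7240, 7240, 4, 1))    # ~199.96, RGBA
# Output limit: 20MB
print(equivalent_mb(4579, 4579, 1, 1))    # ~20.00, 8-bit single band
print(equivalent_mb(2290, 2290, 4, 1))    # ~20.00, 4 bytes per pixel
```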
This should give the administrator some control and safety without
forcing us to consider all the possibilities of formats, compressions,
and tiling arrangements. Using equivalent MB also condenses the
many possible combinations of width, height, bands and band sizes
into a single number that can be set up WCS-wide, providing peace
of mind to the administrator (and stability to the server).
Opinions?
Cheers
Andrea
--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.