[Geoserver-users] Multidimensional NetCDF coverage optimization

Hello all,

I noticed an inefficiency in the way GeoServer serves multi-dimensional coverage data and was curious whether there is a way to optimize it without completely ripping out the existing “plumbing.” The inefficiency occurs when requesting multiple coverage dimensions from an ImageMosaic data store where two or more of the requested dimension slices live in the same source file. For example: requesting all elevations of a Temperature coverage, all of which are housed in the same NetCDF file on disk.
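For illustration, the kind of request I mean looks roughly like this (host, coverage name, and elevation range are placeholders; I believe this is the WCS 2.0.1 KVP subset syntax, but the exact form isn't the point):

    curl "http://localhost:8080/geoserver/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=myws__Temperature&subset=elevation(0,25000)"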

GeoServer fields the request by querying the DB for granules that meet the criteria. Each granule record contains the location of the source file holding that granule’s data, so in the case above the result is a list of granules that all point to the same file on disk. GeoServer then iterates over the granules, opening the file, reading the contents of one specific dimension slice, and closing the file again. Herein lies the problem: the IO required to repeatedly open and close the same source file can dramatically increase the system’s response time.
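As I understand it, the per-granule loop behaves roughly like the sketch below. The Granule record, the "Temperature" variable name, and the [elevation, y, x] layout are illustrative stand-ins, not the actual GeoTools classes; the point is the open/read/close cycle inside the loop:

    import java.util.List;
    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    public class PerGranuleRead {
        // Illustrative stand-in for a mosaic granule: source file location
        // plus the index of the elevation slice this granule represents.
        record Granule(String location, int elevationIndex) {}

        static void readSlices(List<Granule> granules) throws Exception {
            for (Granule g : granules) {
                // The source file is reopened and closed for every slice,
                // even when all granules point at the same file on disk.
                try (NetcdfFile nc = NetcdfFile.open(g.location())) {
                    Variable t = nc.findVariable("Temperature");
                    Array slice = t.read(
                            new int[]{g.elevationIndex(), 0, 0},
                            new int[]{1, t.getShape(1), t.getShape(2)});
                    // ... slice is handed off to build the response ...
                }
            }
        }
    }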

As an example, a request for a single elevation on my test system takes 0.1 seconds to complete, while a request for all 25 elevations from a single file balloons the response time to 3.3 seconds. All times were measured with a CLI client (time curl …), so they include the cost of transferring the larger response in the latter test; however, the client ran directly on the host system, so network IO should have been minimal.

Has anyone else seen this? Is it possible to optimize without completely reworking the GetCoverage and GeoTools implementations?
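The only workaround I could think of, sketched here with the same illustrative types as above, would be to group the granules by their backing file so each file is opened once per request rather than once per slice:

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;
    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.Variable;

    public class GroupedRead {
        record Granule(String location, int elevationIndex) {}

        static void readSlices(List<Granule> granules) throws Exception {
            // Group granules by backing file so each file is opened only once.
            Map<String, List<Granule>> byFile = granules.stream()
                    .collect(Collectors.groupingBy(Granule::location));
            for (Map.Entry<String, List<Granule>> e : byFile.entrySet()) {
                try (NetcdfFile nc = NetcdfFile.open(e.getKey())) {
                    Variable t = nc.findVariable("Temperature");
                    for (Granule g : e.getValue()) {
                        // Every slice from this file shares one open handle.
                        Array slice = t.read(
                                new int[]{g.elevationIndex(), 0, 0},
                                new int[]{1, t.getShape(1), t.getShape(2)});
                    }
                }
            }
        }
    }

I don't know the internals well enough to say whether that restructuring is feasible inside the ImageMosaic/GetCoverage path, hence the question.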

Any advice would be helpful.

Thanks,

Kevin M. Weiss

Software Engineer

Critical Networks / HARRIS CORPORATION
