[Geoserver-devel] NetCDF output memory footprint

I’ve run into a problem with the memory footprint using the netcdf-output plugin with n-dimensional datasets. Consider a WCS 2.0.1 request that wants multiple times and elevations in netcdf format. The WCS GetCoverage operation slices this request into 2D slices, and loads these slices into a GranuleStack, which allows the netcdf output module (and presumably other n-dimensional output formats in the future) to stitch them back together in the desired format.

The problem arises when the number of 2D slices is large, as they are all held in memory at once. I’m trying to find a solution to this, to substantially increase the size of datasets that can be requested in netcdf format.
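To put rough numbers on the problem, here is a minimal sketch of the arithmetic, assuming one 2D slice per (time, elevation) pair and uncompressed samples. The class name and the request dimensions are made-up illustration values, not figures from an actual request.

```java
// Rough estimate of the heap needed when every 2D slice of a request is held at once.
// All dimensions used in main() are hypothetical illustration values.
final class SliceFootprint {

    /** Bytes needed to hold every slice of the request in memory simultaneously. */
    static long bytesForAllSlices(int times, int elevations, int width, int height,
                                  int bands, int bytesPerSample) {
        long slices = (long) times * elevations; // one 2D slice per (time, elevation) pair
        long bytesPerSlice = (long) width * height * bands * bytesPerSample;
        return slices * bytesPerSlice;
    }

    public static void main(String[] args) {
        // Hypothetical request: 48 times x 20 elevations of a 2000x1000 single-band float grid.
        long total = bytesForAllSlices(48, 20, 2000, 1000, 1, Float.BYTES);
        System.out.printf("%d slices, about %.1f GiB%n",
                48 * 20, total / (1024.0 * 1024 * 1024));
    }
}
```

The total grows with the product of the time and elevation counts, so even modest per-slice sizes add up quickly when nothing is released until the encoder finishes.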

My initial stab was setting up a deferred read by implementing a quick GridCoverage2D wrapper that encapsulates all of the GetCoverage work and only performs the read when the image data is accessed, while modifying netcdf-out to work with only one slice at a time. This isn’t working so well, because it is hard to separate the image from the rest of the non-pixel-data calls in GridCoverage2D.

Before going further down this route, I was curious whether there is a more obvious solution than extending and heavily modifying GridCoverage2D in addition to changing the netcdf output encoder itself, or using some type of file-buffered Image implementation.

Thanks.

Clifford M. Harms

Hi Clifford,
I have a couple of questions about your use case before giving feedback:

  1. Which GeoServer version are you using? (Have you already tried the latest RC?)

  2. Which store are you using to configure your input data? Is it a NetCDF store or an ImageMosaic one?

Please let us know.

On Tue, Jan 29, 2019 at 7:22 AM Clifford Harms <clifford.harms@anonymised.com> wrote:

[...]


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Regards,
Daniele Romagnoli

GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information.

Ing. Daniele Romagnoli
Senior Software Engineer

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail.

I haven’t tried the latest RC. I believe we are using 2.14 at the moment, so I will check it out (if memory usage is improved I will do backflips).

The store is NetCDF, and in particular NetCDF aggregations using the NCML/FMRC XML formats (this prevents making use of the Direct Download feature in CSW, unfortunately).

Clifford M. Harms

Hi Clifford,
ImageMosaic lets you choose between immediate read (in memory) and deferred read, whereas, as far as I remember, the plain NetCDF store uses immediate mode internally.
However, I think ImageMosaic has never been tested against NetCDF aggregations (although it has been used with NCML), so you may need to double-check.
You may also want to take a look at this section of the GeoServer docs, which gives some instructions on how to create the ImageMosaic ancillary files with the CreateIndexer.jar tool:
https://docs.geoserver.org/stable/en/user/extensions/netcdf/netcdf.html#migrating-mosaics-with-h2-netcdf-index-files-to-a-centralized-index

Note that:

  • That page covers a specific topic, so you may only be interested in the CreateIndexer usage, in order to prepare the files needed to serve NetCDFs through ImageMosaic.
  • I’m not sure that tool has ever been used or tested against aggregations, so you may have to check.

We could also think about supporting deferred execution in the NetCDF reader.
(It was developed 10 years ago, so it may be that at the time immediate read was working around some issues that occurred with deferred loading.)
Since you have already started playing with the code, you may want to do a quick test by changing the ReadType to JAI_IMAGEREAD in the NetCDFRequest class.

Please, let us know.
Regards,
Daniele

Just a small note to show why this might be better than the deferred loading approach you already tried.
If I understand correctly, what you did is normal lazy loading: once a coverage is loaded, it stays in memory. You’re just avoiding up-front loading,
meaning that during encoding all coverages end up being loaded in memory.
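
The effect can be illustrated with a minimal Java sketch. LazySlice is a hypothetical stand-in for the deferred wrapper, not a GeoTools class, and the "encoder" is just a loop that touches each slice once. Nothing is read up front, but after the loop every slice still holds its pixels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class LazySliceDemo {

    /** Hypothetical deferred slice: reads on first access, then keeps the pixels for good. */
    static final class LazySlice {
        private final Supplier<float[]> reader;
        private float[] pixels; // strong reference, never cleared once set

        LazySlice(Supplier<float[]> reader) {
            this.reader = reader;
        }

        float[] pixels() {
            if (pixels == null) {
                pixels = reader.get(); // the deferred read happens here
            }
            return pixels;
        }

        boolean isLoaded() {
            return pixels != null;
        }
    }

    /** Counts how many slices in the stack currently hold their pixels. */
    static int countLoaded(List<LazySlice> stack) {
        int loaded = 0;
        for (LazySlice slice : stack) {
            if (slice.isLoaded()) {
                loaded++;
            }
        }
        return loaded;
    }

    /** Builds a stack of deferred slices; nothing is read yet. */
    static List<LazySlice> buildStack(int sliceCount, int samplesPerSlice) {
        List<LazySlice> stack = new ArrayList<>();
        for (int i = 0; i < sliceCount; i++) {
            stack.add(new LazySlice(() -> new float[samplesPerSlice]));
        }
        return stack;
    }

    public static void main(String[] args) {
        List<LazySlice> stack = buildStack(100, 1024);
        System.out.println("loaded before encoding: " + countLoaded(stack));
        for (LazySlice slice : stack) {
            slice.pixels(); // the encoder touches each slice once
        }
        System.out.println("loaded after encoding: " + countLoaded(stack));
    }
}
```

Because the stack keeps a reference to every slice and every slice keeps its array, peak memory ends up the same as with up-front loading; only the timing of the reads changes.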

ImageIO deferred loading is instead a JAI operation that uses the JAI tile cache: it loads tiles there on an as-needed basis,
and if there are too many, the cache throws some away and they will be reloaded if needed. So you get good memory control,
at the expense of potentially more I/O.
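
The trade-off can be sketched with a small bounded cache. This is only an analogy written with the Java standard library: BoundedTileCache is a hypothetical class, it limits the number of tiles and evicts the least recently used one, and it does not reproduce the actual JAI TileCache API or its eviction policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class TileCacheDemo {

    /** Hypothetical tile cache: at most maxTiles resident, least recently used evicted first. */
    static final class BoundedTileCache {
        private final IntFunction<float[]> loader;
        private final Map<Integer, float[]> tiles;
        private int loads;

        BoundedTileCache(int maxTiles, IntFunction<float[]> loader) {
            this.loader = loader;
            // accessOrder = true makes iteration order follow recency of use
            this.tiles = new LinkedHashMap<Integer, float[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, float[]> eldest) {
                    return size() > maxTiles;
                }
            };
        }

        float[] getTile(int index) {
            float[] tile = tiles.get(index); // a hit also marks the tile as recently used
            if (tile == null) {
                tile = loader.apply(index);  // miss: go back to the source (extra I/O)
                loads++;
                tiles.put(index, tile);      // may evict the least recently used tile
            }
            return tile;
        }

        int resident() {
            return tiles.size();
        }

        int loads() {
            return loads;
        }
    }

    public static void main(String[] args) {
        BoundedTileCache cache = new BoundedTileCache(4, i -> new float[256 * 256]);
        // Two sequential passes over 10 tiles with room for only 4.
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < 10; i++) {
                cache.getTile(i);
            }
        }
        System.out.println("resident=" + cache.resident() + " loads=" + cache.loads());
    }
}
```

In this sketch memory stays capped at four tiles no matter how many are requested, while the second pass has to reload every tile, which is the extra I/O mentioned above.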

Cheers
Andrea

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

On Thu, Jan 31, 2019 at 8:32 AM Andrea Aime <andrea.aime@anonymised.com> wrote:

If I understand correctly, what you did is normal lazy loading, once a coverage is loaded, it stays in memory, you’re just avoiding up front loading…
meaning that during encoding all coverages end up being loaded in memory.

This is exactly the problem I’m running into with the quick and dirty lazy loading, and it appears this is the wrong approach entirely, as it seems difficult, if not impossible, to maintain a hard reference to only one 2D slice at a time. I was unaware of JAI deferred reading. I will run the JAI_IMAGEREAD test recommended by Daniele with the existing netcdf plugin and see what happens. I’m also probably going to accelerate my plans to evaluate ImageMosaic as an alternative to aggregation by NCML.

Clifford M. Harms