[GRASS-dev] Cloud optimized GeoTIFF support

Hi,

some of you will be familiar with "Cloud optimized GeoTIFF"
(http://www.cogeo.org/) which can be created and read through GDAL,
see

https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF

I was wondering how to register a cloud GeoTIFF source through
r.external. Any ideas (or how much would need to be developed)?

thanks,
Markus

On Fri, Oct 13, 2017 at 2:21 PM, Markus Neteler <neteler@osgeo.org> wrote:

Hi,

some of you will be familiar with “Cloud optimized GeoTIFF”
(http://www.cogeo.org/) which can be created and read through GDAL,
see

https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF

I was wondering how to register a cloud GeoTIFF source through
r.external. Any ideas (or how much would need to be developed)?

Have you tried GDAL’s virtual network based file systems [0]?

[0] http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_network

Markus M

thanks,
Markus


grass-dev mailing list
grass-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/grass-dev
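The virtual file system suggestion above can be sketched as follows. This is a minimal illustration, not a tested workflow: the helper names, the example URL and the scheme-to-prefix mapping are assumptions made for this sketch, and whether r.external accepts such a path is exactly the open question of this thread. Only the GDAL prefixes /vsicurl/, /vsis3/ and /vsigs/ and the r.external parameters input= and output= are taken as given.

```python
from urllib.parse import urlparse

# Prefixes of GDAL's network virtual file systems; the mapping from URL
# scheme to prefix is a convenience invented for this sketch.
VSI_PREFIXES = {"s3": "/vsis3/", "gs": "/vsigs/"}


def to_vsi_path(url):
    """Translate a URL into a GDAL virtual file system path."""
    parts = urlparse(url)
    if parts.scheme in ("http", "https", "ftp"):
        # /vsicurl/ takes the complete URL, scheme included.
        return "/vsicurl/" + url
    if parts.scheme in VSI_PREFIXES:
        # Bucket file systems take "bucket/key" without the scheme.
        return VSI_PREFIXES[parts.scheme] + parts.netloc + parts.path
    raise ValueError("unsupported URL scheme: %r" % parts.scheme)


def r_external_command(url, output):
    """Build (but do not run) an r.external call linking a remote raster."""
    return ["r.external", "input=" + to_vsi_path(url), "output=" + output]


# Hypothetical COG location, for illustration only.
cmd = r_external_command("https://example.com/data/scene_cog.tif", "scene")
print(" ".join(cmd))
```

The command is only assembled as a list, so it can be inspected first and then passed to subprocess inside a GRASS session if it looks right.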

Hi,

I assume you have seen this:

https://lists.osgeo.org/pipermail/gdal-dev/2017-October/047349.html

and subsequently this:

https://erouault.blogspot.no/2017/10/gdal-and-cloud-storage.html

Cheers

Stefan

On Fri, Oct 13, 2017 at 2:45 PM, Stefan Blumentrath
<Stefan.Blumentrath@nina.no> wrote:

Hi,

I assume you have seen this:
https://lists.osgeo.org/pipermail/gdal-dev/2017-October/047349.html

and subsequently this:
https://erouault.blogspot.no/2017/10/gdal-and-cloud-storage.html

Sure, that's where I got it from and got interested in a more direct
GRASS GIS integration.

Markus

On Sun, Oct 15, 2017 at 6:25 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Fri, Oct 13, 2017 at 2:45 PM, Stefan Blumentrath
<Stefan.Blumentrath@nina.no> wrote:

Hi,

I assume you have seen this:
https://lists.osgeo.org/pipermail/gdal-dev/2017-October/047349.html

and subsequently this:
https://erouault.blogspot.no/2017/10/gdal-and-cloud-storage.html

Sure, that’s where I got it from and got interested in a more direct
GRASS GIS integration.

What do you mean with “a more direct GRASS GIS integration” regarding cloud storage and/or Cloud Optimized GeoTIFF?

Markus M

On Sun, Oct 15, 2017 at 7:47 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

What do you mean with "a more direct GRASS GIS integration" regarding cloud
storage and/or Cloud Optimized GeoTIFF?

Well, I thought of r.external and r.external.out.

Markus Metz wrote:

Have you tried GDAL's virtual network based file systems [0]?

... Not yet, since I am traveling all the time. It may be simple and already
solved, but I have no stable connection at the moment to try it out.

Will study it soon :-)

markusN

On Sunday, 15 October 2017 21:28:24 CEST Markus Neteler wrote:

On Sun, Oct 15, 2017 at 7:47 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:
> What do you mean with "a more direct GRASS GIS integration" regarding cloud
> storage and/or Cloud Optimized GeoTIFF?

Well, I thought of r.external and r.external.out.

Markus Metz wrote:
> Have you tried GDAL's virtual network based file systems [0]?

... Not yet, since I am traveling all the time. It may be simple and already
solved, but I have no stable connection at the moment to try it out.

Also note that the performance of network access depends a lot on how close
the machine is to the server. For example, if you use the VMs of a cloud
provider in the same region as the bucket you access, the timing measurements
will be 10 times or more better than if you try from a consumer ADSL
connection.

For Linux & Mac, there are also various projects that use FUSE to mount
buckets in the file system:
https://github.com/kahing/goofys
https://github.com/googlecloudplatform/gcsfuse

I quickly tried goofys and it worked pretty well. It tends to have a more
aggressive read-ahead strategy than GDAL /vsis3/, which makes reading a lot of
sequential data faster than with /vsis3/. In the latest GDAL trunk, the GeoTIFF
driver is aware of the network nature of the files and can thus better
optimize windowed access, which makes it a bit faster in cases where you
only access part of the imagery (for example with MapServer, which must satisfy
a WMS request on a BBOX and was the use case for this recent round of
optimization, or if you have processing spread over several VMs with a
spatial subdivision).

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
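To illustrate why windowed access matters for a tiled GeoTIFF read over the network, here is a small sketch that counts the tiles a pixel window touches. It is not how the GDAL GeoTIFF driver is implemented; the function name, the 512 x 512 tile size and the raster dimensions are assumptions chosen for this example.

```python
def tiles_for_window(xoff, yoff, xsize, ysize, tile_w=512, tile_h=512):
    """Return (col, row) indices of the tiles a pixel window touches."""
    first_col = xoff // tile_w
    last_col = (xoff + xsize - 1) // tile_w
    first_row = yoff // tile_h
    last_row = (yoff + ysize - 1) // tile_h
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# Hypothetical raster of 10980 x 10980 pixels with 512 x 512 tiles.
raster_w = raster_h = 10980
all_tiles = tiles_for_window(0, 0, raster_w, raster_h)

# A 1024 x 1024 pixel window, e.g. one worker's share of a spatial subdivision.
window_tiles = tiles_for_window(2000, 3000, 1024, 1024)

print(len(window_tiles), "of", len(all_tiles), "tiles needed")
```

Only the tiles returned for the window would have to be fetched, which is where a reader that knows it is working over the network can save requests compared with reading the file sequentially.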

On Sun, Oct 15, 2017 at 9:28 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Sun, Oct 15, 2017 at 7:47 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

What do you mean with “a more direct GRASS GIS integration” regarding cloud
storage and/or Cloud Optimized GeoTIFF?

Well, I thought of r.external and r.external.out.

OK, rephrasing my question:

What do you mean with “a more direct GRASS GIS integration” regarding cloud storage and/or Cloud Optimized GeoTIFF, and regarding reading and writing such data?

Markus M

Markus Metz wrote:

Have you tried GDAL’s virtual network based file systems [0]?

… Not yet, since I am traveling all the time. It may be simple and already
solved, but I have no stable connection at the moment to try it out.

Will study it soon :-)

markusN

On Oct 15, 2017 9:50 PM, “Markus Metz” <markus.metz.giswork@gmail.com> wrote:

On Sun, Oct 15, 2017 at 9:28 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Sun, Oct 15, 2017 at 7:47 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

What do you mean with “a more direct GRASS GIS integration” regarding cloud
storage and/or Cloud Optimized GeoTIFF?

Well, I thought of r.external and r.external.out.

OK, rephrasing my question:

What do you mean with “a more direct GRASS GIS integration” regarding cloud storage and/or Cloud Optimized GeoTIFF, and regarding reading and writing such data?

So, my need is to process EO data as close as possible to where they are stored.
Imagine a huge object storage or bucket or the like which is surrounded by cloud compute instances. Given data sizes ranging from a few terabytes up to petabytes (e.g. Sentinel), I need to minimize any data transfer or extra (format) conversion.

Even’s suggestion to use FUSE to mount
buckets in the file system will be something to try.

However, the location concept is also causing some friction here: Sentinel data are stored in many different zones, i.e. they would result in many different locations. I have no solution to that yet, but this issue, plus the need to minimize bandwidth usage, is the key to successful cloud-storage-based processing.

So I started with thinking about how to connect to current EO storages.

Best.
markusN

On Mon, Oct 16, 2017 at 9:10 AM, Markus Neteler <neteler@osgeo.org> wrote:

On Oct 15, 2017 9:50 PM, “Markus Metz” <markus.metz.giswork@gmail.com> wrote:

On Sun, Oct 15, 2017 at 9:28 PM, Markus Neteler <neteler@osgeo.org> wrote:

On Sun, Oct 15, 2017 at 7:47 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:

What do you mean with “a more direct GRASS GIS integration” regarding cloud
storage and/or Cloud Optimized GeoTIFF?

Well, I thought of r.external and r.external.out.

OK, rephrasing my question:

What do you mean with “a more direct GRASS GIS integration” regarding cloud storage and/or Cloud Optimized GeoTIFF, and regarding reading and writing such data?

So, my need is to process EO data as close as possible to where they are stored.

That sounds like you don’t need a more direct GRASS GIS integration of cloud storage, but instead a more direct integration of a GRASS installation into cloud storage.

Imagine a huge object storage or bucket or the like which is surrounded by cloud compute instances. Given data sizes ranging from a few terabytes up to petabytes (e.g. Sentinel), I need to minimize any data transfer or extra (format) conversion.

Assuming a local GRASS db, data transfer increases with remote input to r.external or a remote directory as output for r.external.out.
If you want to reduce data transfer, use r.in.gdal/r.out.gdal.
If you want to reduce format conversions, use r.external/r.external.out.
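The trade-off described above can be put into rough numbers. The following is a deliberately simplified model, not a measurement: it assumes that a raster linked with r.external is fetched in full from the remote source on every read, that a raster imported with r.in.gdal is transferred exactly once, and it ignores caching, partial reads and compression. The function name and the example figures are made up for illustration.

```python
def transfer_bytes(raster_bytes, full_reads, linked):
    """Estimate bytes moved over the network for one remote raster.

    linked=True  models r.external: every full read hits the remote source.
    linked=False models r.in.gdal: one transfer at import, later reads local.
    """
    if raster_bytes < 0 or full_reads < 0:
        raise ValueError("sizes and read counts must be non-negative")
    if linked:
        return raster_bytes * full_reads
    return raster_bytes


# Hypothetical 800 MB scene that five processing steps each read once.
size = 800 * 1000**2
linked_total = transfer_bytes(size, full_reads=5, linked=True)
imported_total = transfer_bytes(size, full_reads=5, linked=False)
print("linked:", linked_total // 1000**2, "MB")
print("imported:", imported_total // 1000**2, "MB")
```

Under these assumptions linking only wins on transfer when the data are read at most once; what it saves in every case is the format conversion and the local copy.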

Even’s suggestion to use FUSE to mount
buckets in the file system will be something to try.

However, the location concept is also causing some friction here: Sentinel data are stored in many different zones, i.e. they would result in many different locations.

Would gdalwarp help?

Markus M
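As a sketch of what the gdalwarp idea could look like, assuming that the "zones" mentioned above are UTM zones: scenes that are not already in a chosen target zone get a gdalwarp call that writes a VRT, so that all data could end up in one location. The helper names and file names are hypothetical; the EPSG numbering for WGS 84 / UTM (326xx north, 327xx south) and the gdalwarp options -of and -t_srs are standard.

```python
def utm_epsg(zone, northern=True):
    """EPSG code of the WGS 84 / UTM CRS for a zone number (1-60)."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zone must be in 1..60")
    return (32600 if northern else 32700) + zone


def gdalwarp_command(src, dst, target_srs):
    """Build (but do not run) a gdalwarp call that writes a VRT."""
    return ["gdalwarp", "-of", "VRT", "-t_srs", target_srs, src, dst]


# Hypothetical scenes and their UTM zones (northern hemisphere).
scenes = {"scene_a.tif": 32, "scene_b.tif": 33}
target_zone = 32
target_srs = "EPSG:%d" % utm_epsg(target_zone)

# Only scenes outside the target zone need warping.
commands = [
    gdalwarp_command(name, name.replace(".tif", "_warped.vrt"), target_srs)
    for name, zone in scenes.items()
    if zone != target_zone
]
for command in commands:
    print(" ".join(command))
```

Writing a VRT keeps the warp lazy, so no reprojected copy is stored, but resampling then happens on every read; whether that is acceptable depends on the processing.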

I have no solution to that yet, but this issue, plus the need to minimize bandwidth usage, is the key to successful cloud-storage-based processing.

So I started with thinking about how to connect to current EO storages.

Best.
markusN