On Sunday, 15 October 2017 21:28:24 CEST, Markus Neteler wrote:
On Sun, Oct 15, 2017 at 7:47 PM, Markus Metz
<markus.metz.giswork@gmail.com> wrote:
> What do you mean with "a more direct GRASS GIS integration" regarding
> cloud
> storage and/or Cloud Optimized GeoTIFF?
Well, I thought of r.external and r.external.out.
Markus Metz wrote:
> Have you tried GDAL's virtual network based file systems [0]?
... Not yet, since I have been traveling all the time. It may be simple and
already solved, but I currently have no stable connection to try it out.
Also note that the performance of network access depends a lot on how close
the machine is to the server. For example, if you use the VMs of a cloud
provider in the same region as the bucket you access, the timings will be
10 times better or more than if you try from a consumer ADSL connection.
I quickly tried goofys and it worked pretty well. It tends to have a more
aggressive read-ahead strategy than GDAL /vsis3/, which makes reading a lot of
sequential data faster than with /vsis3/. In the latest GDAL trunk, the GeoTIFF
driver is aware of the network nature of the files and can therefore better
optimize windowed access. This makes it a bit faster in cases where you
access only part of the imagery (for example with MapServer, which must satisfy
a WMS request on a BBOX, the use case for this recent round of
optimization, or if you have processing spread over several VMs with a
spatial subdivision).
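To make the windowed-access case concrete, here is a minimal sketch that only assembles a /vsis3/ path and a gdal_translate call restricted to a pixel window (-srcwin). The bucket name, object key and output file name are hypothetical placeholders, and the sketch assumes that S3 credentials are already configured for GDAL in the environment. It builds the command line without running it.

```python
import shlex


def vsis3_path(bucket, key):
    """Build a GDAL /vsis3/ path for an object stored in an S3 bucket."""
    return "/vsis3/" + bucket.strip("/") + "/" + key.lstrip("/")


def windowed_read_command(bucket, key, xoff, yoff, xsize, ysize, out_path):
    """Return a gdal_translate argument list that copies one pixel window.

    -srcwin takes the window as: x offset, y offset, width, height (pixels).
    """
    return [
        "gdal_translate",
        "-srcwin", str(xoff), str(yoff), str(xsize), str(ysize),
        vsis3_path(bucket, key),
        out_path,
    ]


# Hypothetical bucket, key and output name, for illustration only.
cmd = windowed_read_command(
    "example-bucket", "scenes/tile.tif", 0, 0, 512, 512, "window.tif"
)
print(shlex.join(cmd))
```

The argument list can be handed to subprocess.run() on a machine where GDAL is installed. With goofys, the same call would instead take an ordinary path below the FUSE mount point.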
What do you mean with “a more direct GRASS GIS integration” regarding cloud
storage and/or Cloud Optimized GeoTIFF?
Well, I thought of r.external and r.external.out.
OK, rephrasing my question:
What do you mean with “a more direct GRASS GIS integration” regarding cloud storage and/or Cloud Optimized GeoTIFF, and regarding reading and writing such data?
Markus M
So, my need is to process EO data as close as possible to where they are stored.
Imagine a huge object storage, bucket or the like, surrounded by cloud compute instances. Given data sizes ranging from a few terabytes up to petabytes (e.g. Sentinel), I need to minimize any data transfer or extra (format) conversion.
Even’s suggestion to use FUSE to mount buckets in the file system will be something to try.
However, the location concept is also causing some friction here: Sentinel data are stored in many different zones, i.e. they would result in many different locations. I have no solution to that yet, but this issue plus the need to minimize bandwidth usage are key to successful cloud-storage-based processing.
So I started by thinking about how to connect to current EO storage systems.
> So, my need is to process EO data as close as possible to where they are stored.
That sounds like you don’t need a more direct GRASS GIS integration of cloud storage, but instead a more direct integration of a GRASS installation into cloud storage.
> Imagine a huge object storage, bucket or the like, surrounded by cloud compute instances. Given data sizes ranging from a few terabytes up to petabytes (e.g. Sentinel), I need to minimize any data transfer or extra (format) conversion.
Assuming a local GRASS db, data transfer increases with remote input to r.external or a remote directory as output for r.external.out.
If you want to reduce data transfer, use r.in.gdal/r.out.gdal.
If you want to reduce format conversions, use r.external/r.external.out.
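As a rough sketch of that trade-off: r.in.gdal copies the raster into the GRASS database once, so later reads are local, while r.external only links to the source, so no format conversion happens but every read goes back to the remote file. The helper below is hypothetical and only builds the argument list for one module or the other. The /vsis3/ path and map name are placeholders, and whether a given GRASS/GDAL build accepts such a path as input is an assumption to verify.

```python
def raster_input_command(source, map_name, minimize):
    """Pick a GRASS raster input module according to what should be minimized.

    minimize="transfer"   -> r.in.gdal (import once, later reads are local)
    minimize="conversion" -> r.external (link only, no format conversion)
    """
    modules = {
        "transfer": "r.in.gdal",
        "conversion": "r.external",
    }
    if minimize not in modules:
        raise ValueError("minimize must be 'transfer' or 'conversion'")
    # Both modules take the GDAL data source as input= and the map name as output=.
    return [modules[minimize], "input=" + source, "output=" + map_name]


# Hypothetical source path and map name, for illustration only.
source = "/vsis3/example-bucket/scenes/tile.tif"
print(raster_input_command(source, "tile", "transfer"))
print(raster_input_command(source, "tile", "conversion"))
```

The same reasoning applies on the output side, with r.out.gdal in place of r.in.gdal and r.external.out in place of r.external.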
> Even’s suggestion to use FUSE to mount buckets in the file system will be something to try.
> However, the location concept is also causing some friction here: Sentinel data are stored in many different zones, i.e. they would result in many different locations.
Would gdalwarp help?
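One way this could look, as a sketch only: gdalwarp can write a VRT that describes the reprojection into a common CRS, so that scenes from different zones could be linked into a single location without first writing reprojected copies. The helper name, the file names and the target CRS (EPSG:32632, WGS 84 / UTM zone 32N) are assumptions chosen for illustration, and "zones" is read here as UTM zones.

```python
def warp_to_common_crs_command(source, target_crs, vrt_path):
    """Return a gdalwarp argument list that writes a reprojecting VRT.

    -of VRT   : write a virtual dataset instead of a reprojected copy
    -t_srs    : target spatial reference system
    """
    return ["gdalwarp", "-of", "VRT", "-t_srs", target_crs, source, vrt_path]


# Hypothetical input scene, target CRS and VRT name, for illustration only.
warp_cmd = warp_to_common_crs_command(
    "/vsis3/example-bucket/scenes/tile_zone33.tif",
    "EPSG:32632",
    "tile_zone33_utm32.vrt",
)
print(" ".join(warp_cmd))
```

The reprojection is then done on the fly at read time, which costs CPU and still transfers the pixels that are actually read.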
Markus M