[Geoserver-devel] Reading GeoTiffs from HDFS

Hi all,

I want to report on my success with registering and displaying GeoTiffs stored on HDFS. There are some limitations with this approach; in particular, I am unsure if there's any way to cache or memory-map the data. As such, I believe each request is re-downloading the entire file.

Generally, I hope to document my approach well enough that others could follow it (if needed) and to solicit feedback. In terms of feedback, I'd love to hear 1) whether there are improvements to be made, and 2) whether the changes are reasonable enough to be considered for a proposal/merge request.

That out of the way, here's the rough outline:

1. Register additional URL handlers.
2. Convince validation layers in GeoServer that 'hdfs' is an ok URL scheme.
3. Get bytes out of the HDFS file.

For step 1, note that Java's URL scheme handling is pluggable via java.net.URLStreamHandler. The docs (1) point out that one can call URL.setURLStreamHandlerFactory to set up a factory providing such a handler. This method can only be called once, and folks on the internet (2) resort to contortions since Tomcat already registers a factory. They seem to have missed that the Tomcat factory actually lets you add your own. I provide a gist (3) showing a little bean which instantiates a Hadoop URL handler and tries to install it using both of those methods.
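
A minimal sketch of such a bean, assuming Hadoop's FsUrlStreamHandlerFactory and Tomcat 8's TomcatURLStreamHandlerFactory.addUserFactory (the class name here is invented; the gist (3) is the real thing):

import java.net.URL;
import org.apache.catalina.webresources.TomcatURLStreamHandlerFactory;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

// Hypothetical bean; instantiate once at startup (e.g., as a Spring bean).
public class HdfsUrlHandlerRegistrar {

    public HdfsUrlHandlerRegistrar() {
        FsUrlStreamHandlerFactory hdfsFactory = new FsUrlStreamHandlerFactory();
        try {
            // Works only if nothing has claimed the JVM-wide factory slot yet.
            URL.setURLStreamHandlerFactory(hdfsFactory);
        } catch (Error alreadySet) {
            // Tomcat has already registered its factory, but it accepts
            // user factories, so hang the Hadoop one off of it instead.
            TomcatURLStreamHandlerFactory.getInstance().addUserFactory(hdfsFactory);
        }
    }
}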

There are two places I found in GeoServer which validate the URL given on the page for adding a GeoTiff. The first is the GeoServer FileExistValidator, which calls out to a Wicket UrlValidator; telling the Wicket class to allow_all_schemes knocks out that issue. For the second, in the FileModel, one needs to provide a happy path for URLs which are not local to the filesystem. Those two small changes are here (4).
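
To give a flavor of the first change (the actual diff is in (4)), the Wicket validator takes an options flag; this is a paraphrase rather than the literal commit:

import org.apache.wicket.validation.validator.UrlValidator;

// The default constructor accepts only a fixed set of schemes (http,
// https, ftp), so an hdfs:// URL fails validation. Passing the
// ALLOW_ALL_SCHEMES option lets any scheme through:
UrlValidator validator = new UrlValidator(UrlValidator.ALLOW_ALL_SCHEMES);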

Once GeoServer can register a GeoTiff coverage with a non-'file://' URL, we need to read the bytes. Java's ImageIO provides javax.imageio.spi.ImageInputStreamSpi, a service provider class which adapts instances of a particular input class to an ImageInputStream.

For my prototype, I wrote an implementation of this SPI which takes a string, checks if it starts with "hdfs", creates a URL, and returns new MemoryCacheImageInputStream(url.openStream()). The only problem with this approach is that there is already an implementation which handles Strings, and GeoTools's ImageIOExt tries the first one and skips any others. One can update that handling (5) slightly to try all the handlers. It'd probably be better to update (6) to try url.openStream as a fallback.
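
For reference, the prototype SPI looks roughly like the following (a reconstructed sketch; the class name and the vendor/version strings are placeholders):

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Locale;
import javax.imageio.spi.ImageInputStreamSpi;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

public class HdfsImageInputStreamSpi extends ImageInputStreamSpi {

    public HdfsImageInputStreamSpi() {
        super("prototype", "0.1", String.class); // accepts String inputs
    }

    @Override
    public ImageInputStream createInputStreamInstance(Object input,
            boolean useCache, File cacheDir) throws IOException {
        String spec = (String) input;
        if (!spec.startsWith("hdfs")) {
            return null; // not ours; let another SPI claim it
        }
        // Relies on the hdfs:// URL handler registered in step 1.
        URL url = new URL(spec);
        return new MemoryCacheImageInputStream(url.openStream());
    }

    @Override
    public String getDescription(Locale locale) {
        return "Prototype stream SPI for hdfs:// URLs";
    }
}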

During testing, I worked with the sfdem.tif which ships with GeoServer. The HDFS layer was a little slower than the local filesystem layer, but it wasn't unusable. To crank things up, I tried out a 600+ megabyte GeoTiff from Natural Earth, and it was downright slow. Using a network monitor, I was able to observe network traffic consistent with the entire file being re-read for most requests. I think this approach may be slightly useful for layers which are infrequently accessed, and then only by a few users.

Thanks to everyone who had suggestions and encouragement for the original thread!

Cheers,

Jim

Step 1: Register additional URL handlers:

1. http://download.java.net/jdk7/archive/b123/docs/api/java/net/URL.html#URL(java.lang.String,%20java.lang.String,%20int,%20java.lang.String)

2. http://skife.org/java/url/library/2012/05/14/java_url_handlers.html

3. Gist for a bean to register the Hadoop URL handlers:
https://gist.github.com/jnh5y/1739baa42466d66e383fa26ffd7235ca

Step 2: GeoServer changes:
4. https://github.com/jnh5y/geoserver/commit/5320f26a0574f034433aa96097054ec1ec782d45
The FileModel change could be a little more robust.

Step 3: GeoTools changes:
5. https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195babc1abb6f49

Or one could modify the URL handling here:
6. https://github.com/geosolutions-it/imageio-ext/blob/master/library/streams/src/main/java/it/geosolutions/imageio/stream/input/spi/URLImageInputStreamSpi.java#L88-L97

Dear Jim,
quick feedback.

First of all, congratulations on making this work. As I suspected, the
bottleneck is getting the data out of HDFS.
I can think of two things (which are not mutually exclusive):

-1- Maybe complex: put smaller bits into HDFS and use the image mosaic
to serve them, or even develop a light(er)weight layer that can pull
the granules.

This would help with WMS requests over large files, as you'll end up
using smaller chunks to satisfy them most of the time.

-2- We could build a more complex ImageInputStream that:

- has an internal cache (file and/or memory) that does not get thrown
away upon each request but tends to live longer for each single file
in HDFS
- would let different streams reuse the same cache. Multiple
requests might read data from the cache concurrently, but when data is
not there, we would block the thread for the request, go back to HDFS,
pull the data, write to the cache, and so on (a rough sketch follows).
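
To make -2- concrete, a hypothetical sketch (names invented; whole-file
caching only, no eviction, no file-backed tier):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.concurrent.ConcurrentHashMap;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// One shared byte cache per HDFS path; computeIfAbsent blocks concurrent
// requests for the same file while the first request pulls the data.
public class SharedHdfsCache {

    private static final ConcurrentHashMap<String, byte[]> CACHE =
            new ConcurrentHashMap<>();

    public static ImageInputStream open(String hdfsUrl) {
        byte[] bytes = CACHE.computeIfAbsent(hdfsUrl, SharedHdfsCache::fetch);
        // Every caller gets an independent stream over the shared bytes.
        return new MemoryCacheImageInputStream(new ByteArrayInputStream(bytes));
    }

    private static byte[] fetch(String spec) {
        try (InputStream in = new URL(spec).openStream()) {
            return in.readAllBytes(); // Java 9+; use a copy loop on older JVMs
        } catch (IOException e) {
            throw new RuntimeException("Failed to pull " + spec, e);
        }
    }
}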

We could put together 1 and 2 to make things faster.

Hope this helps. Anyway, I am in favour of exploring this in order to
allow the GeoServer stack to support data from HDFS.

Regards,
Simone Giannecchini

GeoServer Professional Services from the experts!
Visit http://goo.gl/it488V for more information.

Ing. Simone Giannecchini
@simogeo
Founder/Director

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


Hi Simone,

Thanks for the feedback!

As a quick response: for #1, I agree that using mosaicking / an image pyramid would be a great option. I was mainly working at the prototype phase, and I wanted to have a discussion on the mailing lists (especially since changes are required in ImageIO-Ext or GeoTools, and in GeoServer).

For #2, I do like the idea of having a cache in the ImageInputStream. From that suggestion, I take it that you'd be willing to entertain changes to the current ImageInputStreams and the addition of some way to cache data.

In terms of caching, do you have any suggestions? Also, I'd be interested in any advice on how we can configure that cache and make those options available to a GeoServer admin appropriately.

Further, at a high-level, should the goal for this work be a community module?

Cheers,

Jim


Hi,

I don't know that much about HDFS, but is there something that can be set up like a map/reduce function directly on the HDFS servers that can do some of the restriction of the byte-level data returned? YARN/Spark, some other acronym? I assume it would have to be an administrator's responsibility to add said process to the server stack, if it is even possible.

Chris Snider
Senior Software Engineer
Intelligent Software Solutions, Inc.


I did find this reference (helpful?):
https://github.com/openreserach/bin2seq/blob/master/src/main/java/com/openresearchinc/hadoop/sequencefile/GeoTiff.java

" /@formatter:off
/**
*
* A program to demo retrive attributes from Geotiff images as Hadoop SequenceFile stored on hdfs:// or s3://
*
*
* @author heq
*/
// @formatter:on"

Chris Snider
Senior Software Engineer
Intelligent Software Solutions, Inc.


Hi Chris,

Nice! That's a fun find.

Generally, I do like the idea of using Map/Reduce or Spark to pre-generate tiles or an image pyramid. We've kicked around the idea of GWC + M/R a few times in passing. If one has Hadoop infrastructure hanging around, it might make sense to use GeoTrellis, SpatialHadoop (GeoJinni), etc. for some of that processing.

Either way, being able to read the odd raster file straight from hdfs:// or s3:// and have it cached in memory seems like an amusing/useful project. I'm hopeful we can nail down the details.

Cheers,

Jim


On Friday, April 22, 2016 at 22:06:39, Jim Hughes wrote:

> Either way, being able to read the odd raster file straight from hdfs://
> or s3:// and have it cached in memory seems like an amusing/useful
> project. I'm hopeful we can nail down the details.

Probably a bit off topic, but in case it might be useful: GDAL, for
example, can read remote HTTP files with most of its drivers through its
/vsicurl/ virtual file system (and, in the fresh new 2.1.0, /vsis3/).
Perhaps that could be used through the imageio-ext GDAL bridge.

http://www.gdal.org/cpl__vsi_8h.html#a4f791960f2d86713d16e99e9c0c36258
http://www.gdal.org/cpl__vsi_8h.html#a5b4754999acd06444bfda172ff2aaa16
http://download.osgeo.org/gdal/workshop/foss4ge2015/workshop_gdal.html#__RefHeading__5995_1333016408
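
As an illustration, a sketch using GDAL's Java bindings directly (the URL
is hypothetical, and this bypasses the imageio-ext bridge):

import org.gdal.gdal.Dataset;
import org.gdal.gdal.gdal;

public class VsiCurlDemo {
    public static void main(String[] args) {
        gdal.AllRegister();
        // The /vsicurl/ prefix turns a plain http(s) location into a
        // GDAL dataset read via HTTP range requests.
        Dataset ds = gdal.Open("/vsicurl/http://example.com/data/raster.tif");
        if (ds != null) {
            System.out.println("Size: " + ds.getRasterXSize()
                    + " x " + ds.getRasterYSize());
            ds.delete(); // closes the dataset
        }
    }
}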

Cheers,

Jim

On 04/22/2016 02:27 PM, Chris Snider wrote:
> I did find this reference (helpful ?):
> https://github.com/openreserach/bin2seq/blob/master/src/main/java/com/ope
> nresearchinc/hadoop/sequencefile/GeoTiff.java
>
> " /@formatter:off
> /**
>
> *
> * A program to demo retrive attributes from Geotiff images as Hadoop
> SequenceFile stored on hdfs:// or s3:// *
> *
> * @author heq
> */
>
> // @formatter:on"
>
> Chris Snider
> Senior Software Engineer
> Intelligent Software Solutions, Inc.
>
>
>
> -----Original Message-----
> From: Chris Snider [mailto:chris.snider@anonymised.com]
> Sent: Friday, April 22, 2016 12:11 PM
> To: Jim Hughes <jnh5y@anonymised.com>; Simone Giannecchini
> <simone.giannecchini@anonymised.com> Cc:
> geoserver-devel@lists.sourceforge.net; GeoTools Developers list
> <geotools-devel@lists.sourceforge.net> Subject: Re: [Geoserver-devel]
> [Geotools-devel] Reading GeoTiffs from HDFS
>
> Hi,
>
> I don't know that much about HDFS, but is there something that can be
> setup like a map/reduce function directly in the HDFS servers that can
> do some of the restriction of byte level data returned? Yarn/Sparql,
> some other acronym? I assume it would have to be administrator
> responsibility to add said process to the server stack if it is even
> possible.
>
> Chris Snider
> Senior Software Engineer
> Intelligent Software Solutions, Inc.
>
>
> -----Original Message-----
> From: Jim Hughes [mailto:jnh5y@anonymised.com]
> Sent: Friday, April 22, 2016 12:05 PM
> To: Simone Giannecchini <simone.giannecchini@anonymised.com>
> Cc: geoserver-devel@lists.sourceforge.net; GeoTools Developers list
> <geotools-devel@lists.sourceforge.net> Subject: Re: [Geoserver-devel]
> [Geotools-devel] Reading GeoTiffs from HDFS
>
> Hi Simone,
>
> Thanks for the feedback!
>
> As a quick response, for #1, I agree that using mosaicing / an image
> pyramid would be a great option. I was mainly working at the prototype
> phase, and I wanted to have a discussion on the mailing lists
> (especially since changes are required in ImageIO-Ext or GeoTools and
> GeoServer).
>
> For #2, I do like the idea of having a cache in the ImageInputStream.
> From that suggestion, I take it that you'd be willing to entertain
> changes to the current ImageInputStreams and the addition of some way
> to cache data.
>
> In terms of caching, do you have any suggestions? Also, I'd be
> interested in any advice for how we can configure that cache and make
> those options available to a GeoServer admin appropriately.
>
> Further, at a high-level, should the goal for this work be a community
> module?
>
> Cheers,
>
> Jim
>
> On 04/22/2016 01:49 PM, Simone Giannecchini wrote:
>> Dear Jim,
>> quick feedback.
>>
>> First of all, congratulations on making this work. As I suspected, the
>> bottleneck is getting the data out of HDFS.
>> I can think about two things (which are not mutually exclusive):
>>
>> -1- Maybe complex: put smaller bits into HDFS and use the mosaic to
>> serve them, or even develop a light(er)weight layer that can pull the
>> granules.
>>
>> This would help with WMS requests over large files, as you'd end up
>> using smaller chunks to satisfy them most of the time.
>>
>> -2- We could build a more complex ImageInputStream (rough sketch below) that:
>>
>> - has an internal cache (file and/or memory) that does not get thrown
>> away upon each request but tends to live longer for each single file
>> in HDFS
>> - has different streams reuse the same cache. Multiple requests might
>> read data from the cache concurrently, but when data is not there, we
>> would block the thread for the request, go back to HDFS, pull the
>> data, write to the cache, and so on
>>
>> We could put together 1 and 2 to make things faster.
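
To make idea 2 concrete, here is a rough, untested sketch (mine, not
Simone's) of an ImageInputStream that reads fixed-size blocks through a
cache shared across stream instances for the same remote file; the
fetchBlock hook, the class name, and the block size are all placeholders:

    import java.io.IOException;
    import java.util.concurrent.ConcurrentMap;
    import javax.imageio.stream.ImageInputStreamImpl;

    public abstract class BlockCachedImageInputStream extends ImageInputStreamImpl {

        private static final int BLOCK_SIZE = 512 * 1024; // 512 KiB per cached block

        private final ConcurrentMap<Long, byte[]> blockCache; // shared per remote file
        private final long length;

        protected BlockCachedImageInputStream(long length,
                ConcurrentMap<Long, byte[]> sharedBlockCache) {
            this.length = length;
            this.blockCache = sharedBlockCache;
        }

        /** Pulls one block from the remote store (e.g. HDFS); blocks the caller. */
        protected abstract byte[] fetchBlock(long blockStart, int size) throws IOException;

        private byte[] block(long pos) throws IOException {
            long blockStart = (pos / BLOCK_SIZE) * BLOCK_SIZE;
            // computeIfAbsent makes concurrent readers of a missing block wait
            // while exactly one thread goes back to the remote store to fetch it
            return blockCache.computeIfAbsent(blockStart, start -> {
                try {
                    int size = (int) Math.min(BLOCK_SIZE, length - start);
                    return fetchBlock(start, size);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }

        @Override
        public int read() throws IOException {
            if (streamPos >= length) {
                return -1;
            }
            bitOffset = 0;
            byte[] buf = block(streamPos);
            int inBlock = (int) (streamPos % BLOCK_SIZE);
            streamPos++;
            return buf[inBlock] & 0xFF;
        }

        @Override
        public int read(byte[] dst, int off, int len) throws IOException {
            if (streamPos >= length) {
                return -1;
            }
            bitOffset = 0;
            byte[] buf = block(streamPos);
            int inBlock = (int) (streamPos % BLOCK_SIZE);
            int n = Math.min(len, buf.length - inBlock);
            System.arraycopy(buf, inBlock, dst, off, n);
            streamPos += n;
            return n;
        }

        @Override
        public long length() {
            return length;
        }
    }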
>>
>> Hope this helps, anyway, I am in favour of exploring this in order to
>> allow the GeoServer stack to support data from HDFS.
>>
>> Regards,
>> Simone Giannecchini
>> ==
>> GeoServer Professional Services from the experts!
>> Visit http://goo.gl/it488V for more information.
>> ==
>> Ing. Simone Giannecchini
>> @simogeo
>> Founder/Director
>>
>> GeoSolutions S.A.S.
>> Via di Montramito 3/A
>> 55054 Massarosa (LU)
>> Italy
>> phone: +39 0584 962313
>> fax: +39 0584 1660272
>> mob: +39 333 8128928
>>
>> http://www.geo-solutions.it
>> http://twitter.com/geosolutions_it
>>
>>
>> On Sun, Apr 17, 2016 at 9:49 PM, Jim Hughes <jnh5y@anonymised.com> wrote:
>>> Hi all,
>>>
>>> I want to report on my success with registering and displaying GeoTiffs
>>> stored on HDFS. There are some limitations with this approach;
>>> particularly, I am unsure if there's any way to cache / memory-map the
>>> data. As such, I believe each request is re-downloading the entire
>>> file.
>>>
>>> Generally, I hope to document my approach well enough so that others
>>> could follow it (if needed) and to solicit feedback. In terms of
>>> feedback, I'd love to hear 1) if there are improvements, and 2) if the
>>> changes are reasonable enough to be considered for a proposal/merge
>>> request.
>>>
>>> That out of the way, here's the rough outline:
>>>
>>> 1. Register additional URL handlers.
>>> 2. Convince validation layers in GeoServer that 'hdfs' is an ok URL scheme.
>>> 3. Get bytes out of the HDFS file.
>>>
>>> For step 1, note that Java's URL scheme is pluggable via
>>> java.net.URLStreamHandler. The docs(1) point out that one can call
>>> URL.setURLStreamHandlerFactory to setup a Factory to provide such a
>>> handler. This method can only be called once, and folks from the
>>> internet (2) do yoga since Tomcat already registers a factory. They
>>> seem to have missed the fact that the Tomcat factory actually lets you
>>> add your own. I provide a gist (3) to show a little bean which will
>>> instantiate a Hadoop URL handler and try to install it using both of
>>> those methods.
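
As a sketch of what that bean boils down to (the class name is mine; the
gist (3) is the authoritative version), assuming Hadoop's
FsUrlStreamHandlerFactory and Tomcat 8's TomcatURLStreamHandlerFactory are
on the classpath:

    import java.net.URL;
    import org.apache.catalina.webresources.TomcatURLStreamHandlerFactory;
    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

    public class HdfsUrlHandlerRegistrar {
        public void register() {
            FsUrlStreamHandlerFactory hdfsFactory = new FsUrlStreamHandlerFactory();
            try {
                // The plain JVM hook: only works if nothing has claimed it yet.
                URL.setURLStreamHandlerFactory(hdfsFactory);
            } catch (Error factoryAlreadySet) {
                // Tomcat registers its own factory first, but it allows
                // chaining additional user factories behind it.
                TomcatURLStreamHandlerFactory.getInstance().addUserFactory(hdfsFactory);
            }
        }
    }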
>>>
>>> There are two places I found in GeoServer which validate the URL given
>>> in the page for adding a GeoTiff. The first is the GeoServer
>>> FileExistValidator which calls out to a Wicket UrlValidator. Telling
>>> the Wicket class to allow_all_schemes knocks out that issue. For the
>>> second, in the FileModel, one needs to provide a happy path for URLs
>>> which are not local to the filesystem. Those two small changes are
>>> here (4).
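
The validator half of that change is tiny; a sketch of the relevant
construction (see commit (4) for the actual diff):

    import org.apache.wicket.validation.validator.UrlValidator;

    public class AllowAllSchemesExample {
        // In FileExistValidator: construct the Wicket validator so that any
        // URL scheme (hdfs://, s3://, ...) passes, instead of only the defaults.
        static UrlValidator permissiveValidator() {
            return new UrlValidator(UrlValidator.ALLOW_ALL_SCHEMES);
        }
    }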
>>>
>>> Once GeoServer will register a GeoTiff coverage with a non-'file://'
>>> URL, we need to read the bytes. Javax has an interface
>>> javax.imageio.spi.ImageInputStreamSpi which adapts between instances of
>>> a particular class and an ImageInputStream.
>>>
>>> For my prototype, I wrote an instance of this interface which takes a
>>> string, checks if it starts with "hdfs", creates a URL, and returns new
>>> MemoryCacheImageInputStream(url.openStream()). The only problem with
>>> this approach is that there is already an implementation which handles
>>> Strings, and GeoTools's ImageIOExt tries the first one and skips any
>>> others. One can update that handling (5) slightly to try all the
>>> handlers. It'd probably be better to update (6) to try url.openStream
>>> as a fallback.
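
A sketch of what such an SPI might look like (class name, vendor string,
and description are placeholders, not Jim's actual code):

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.util.Locale;
    import javax.imageio.spi.ImageInputStreamSpi;
    import javax.imageio.stream.ImageInputStream;
    import javax.imageio.stream.MemoryCacheImageInputStream;

    public class HdfsImageInputStreamSpi extends ImageInputStreamSpi {

        public HdfsImageInputStreamSpi() {
            // vendor, version, and the input class this SPI adapts from
            super("example", "1.0", String.class);
        }

        @Override
        public ImageInputStream createInputStreamInstance(Object input,
                boolean useCache, File cacheDir) throws IOException {
            String spec = (String) input;
            if (!spec.startsWith("hdfs")) {
                return null; // not ours; let another SPI handle it
            }
            // Relies on the hdfs:// URL handler registered in step 1.
            InputStream in = new URL(spec).openStream();
            return new MemoryCacheImageInputStream(in);
        }

        @Override
        public String getDescription(Locale locale) {
            return "ImageInputStream over hdfs:// URLs (prototype)";
        }
    }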
>>>
>>> During testing, I worked with the sfdem.tif which ships with GeoServer.
>>> The hdfs layer was a little slower than the local filesystem layer, but
>>> it wasn't unusable. To crank things up, I tried out a 600+ megabyte
>>> GeoTiff from Natural Earth, and it was downright slow. Using a network
>>> monitor, I was able to observe network traffic consistent with the
>>> entire file being re-read for most requests. I think this approach may
>>> be slightly useful for layers which are infrequently accessed, and then
>>> only by a few users.
>>>
>>> Thanks to everyone who had suggestions and encouragement for the
>>> original thread!
>>>
>>> Cheers,
>>>
>>> Jim
>>>
>>> Step 1: Register additional URL handlers:
>>>
>>> 1. http://download.java.net/jdk7/archive/b123/docs/api/java/net/URL.html#URL%28java.lang.String,%20java.lang.String,%20int,%20java.lang.String%29
>>>
>>> 2. http://skife.org/java/url/library/2012/05/14/java_url_handlers.html
>>>
>>> 3. Gist for a bean to register the Hadoop URL handlers:
>>> https://gist.github.com/jnh5y/1739baa42466d66e383fa26ffd7235ca
>>>
>>> Step 2: GeoServer changes:
>>> 4. https://github.com/jnh5y/geoserver/commit/5320f26a0574f034433aa96097054ec1ec782d45
>>> The FileModel change could be a little more robust.
>>>
>>> Step 3: GeoTools changes:
>>> 5. https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195babc1abb6f49
>>>
>>> Or one could modify the URL handling here:
>>> 6. https://github.com/geosolutions-it/imageio-ext/blob/master/library/streams/src/main/java/it/geosolutions/imageio/stream/input/spi/URLImageInputStreamSpi.java#L88-L97
>>>

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Hey everyone,

I chatted with Jim about this a couple of weeks ago and wanted to revisit it, since we'd like to do something similar with S3 instead of Hadoop; many of the changes would be much the same.

I’m interested in whether anyone has any objections to some of these changes. In particular the changes to GeoServer to change the File validation to allow URLs of any protocol and especially this change to the code which searches for an appropriate ImageInputStreamSPI as detailed here:

https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195babc1abb6f49

IMO it’s a pretty sensible change, I think we do similar things elsewhere (catching exceptions from SPIs that we try and moving on even if they throw an exception.

Thoughts anyone? Any reason not to do this? I can do a PR pretty quickly for it.

Cheers


On Tue, Nov 29, 2016 at 7:39 PM, Devon Tucker <devonrtucker@anonymised.com>
wrote:

> Hey everyone,
>
> I chatted with Jim about this a couple of weeks ago and wanted to revisit
> it, since we'd like to do something similar with S3 instead of Hadoop;
> many of the changes would be much the same.
>
> I'm interested in whether anyone has any objections to some of these
> changes. In particular the changes to GeoServer to change the File
> validation to allow URLs of any protocol and especially this change to the
> code which searches for an appropriate ImageInputStreamSPI as detailed here:
>
> https://github.com/jnh5y/geotools/commit/f2db29339c7f7e43d0c52ab93195babc1abb6f49
>
> IMO it's a pretty sensible change; I think we do similar things elsewhere
> (catching exceptions from SPIs that we try and moving on, even if they
> throw an exception).
>
> Thoughts anyone? Any reason not to do this? I can do a PR pretty quickly
> for it.

I don't have a clear view of the consequences, but I'm indeed skeptical, as
the check is done in a utility class that's used in places other than the
GeoTiff case you're focusing on.

I would rather see an approach that registers an ImageInputStreamSpi that
knows how to deal with a certain protocol instead, possibly placing it at a
lower priority to make sure the ones optimized for random-access files are
tried first.
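
A sketch of that registration, reusing the hypothetical
HdfsImageInputStreamSpi sketched earlier in the thread; setOrdering tells
the registry to prefer every existing SPI over the protocol-specific one:

    import java.util.Iterator;
    import javax.imageio.spi.IIORegistry;
    import javax.imageio.spi.ImageInputStreamSpi;

    public class RegisterAtLowPriority {
        static void register() {
            IIORegistry registry = IIORegistry.getDefaultInstance();
            HdfsImageInputStreamSpi hdfsSpi = new HdfsImageInputStreamSpi();
            registry.registerServiceProvider(hdfsSpi, ImageInputStreamSpi.class);

            Iterator<ImageInputStreamSpi> it =
                    registry.getServiceProviders(ImageInputStreamSpi.class, false);
            while (it.hasNext()) {
                ImageInputStreamSpi other = it.next();
                if (other != hdfsSpi) {
                    // "other" comes before the hdfs SPI in ordered lookups, so
                    // SPIs optimized for random-access files are tried first
                    registry.setOrdering(ImageInputStreamSpi.class, other, hdfsSpi);
                }
            }
        }
    }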

Cheers
Andrea

--

GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

