[Geoserver-devel] Postgresql Datastore for raster

Dear list,

For a project that I will work on, with Martin Desruisseaux, I have to get grid coverages from a postgresql database.

I want to integrate the WCS specification into a web service that handles the connection to the database through a dedicated datastore and returns a grid coverage matching what the user has requested.
But GeoServer currently has no datastore for managing this kind of connection to a PostgreSQL database for rasters.
So we are hesitant to use the GridCoverageExchange interface for this, because Martin told me that this interface is somewhat deprecated and needs refactoring.

Finally, my question is: should I implement the GridCoverageExchange interface, or should I implement the DataStore interface for a WCS?

Cheers,
Cédric B.

Ciao Cedric,
what you are trying to do is actually quite interesting. I have been
thinking about doing something like it for a while now, but I have
never had enough time to do it.

Here are my thoughts.

<Rasters in postgresql-postgis>
I am not sure PostGIS/PostgreSQL is the best solution for this,
at least as far as performance is concerned. Well, to be honest, I am
not really sure that using a db is a good solution in general when
performance matters (is that true, Andrea?).

Anyway, I think it is worth giving it a try, testing with small and
big coverages to see how it goes.

<GridCoverageExchange and the right approach>
It is true that the current interfaces behind the plugins are
deprecated and will be removed, so I understand your concerns about
using them.

I have two things to say:
1>You could give a hand with designing the new ones (I always ask, you
never know... :) )

2>I can suggest an approach that could be easily reused if in the
future things change.

Let me briefly introduce this approach.

You know that a GridCoverage2D is more or less a wrapper around a
PlanarImage (actually a RenderedImage, but let me put it that way).
Well, the right approach to feeding data into a PlanarImage with
deferred loading (multithreaded asynchronous loading), deferred
execution (multithreaded deferred operations execution) and tiling is
to create the coverage using the JAI ImageRead operation, which is
basically the bridge between JAI (processing) and ImageIO (data I/O).

That said, my suggestion is to do most of the work at the ImageIO
level. This would involve translating tile requests into SQL queries
and then BLOBs, and you would get multithreading and all the other
good things I mentioned above for free.
Then you wrap all this inside a GCE plugin and the job is done. This
way you decouple the complexity of I/O from the management of the
geolocation.
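
As a rough illustration of the "tiles to SQL queries" idea, here is a minimal sketch of the arithmetic an ImageIO-level reader would need: mapping a tile index to the pixel rectangle it covers, and pairing it with a parameterized query. The class name `TileQuery`, the table `raster_tiles` and its columns are hypothetical and are not taken from the attached design or from any existing plugin.

```java
import java.awt.Rectangle;

/** Hypothetical helper: maps a tile index to pixel bounds and to a lookup query. */
final class TileQuery {
    // Assumed schema, for illustration only: raster_tiles(coverage_id, tile_x, tile_y, data)
    static final String SQL =
        "SELECT data FROM raster_tiles WHERE coverage_id = ? AND tile_x = ? AND tile_y = ?";

    /** Pixel rectangle covered by tile (tileX, tileY), clipped to the image size. */
    static Rectangle tileBounds(int tileX, int tileY, int tileWidth, int tileHeight,
                                int imageWidth, int imageHeight) {
        Rectangle tile = new Rectangle(tileX * tileWidth, tileY * tileHeight,
                                       tileWidth, tileHeight);
        return tile.intersection(new Rectangle(0, 0, imageWidth, imageHeight));
    }

    /** Number of tiles needed along one axis (ceiling division). */
    static int tileCount(int imageSize, int tileSize) {
        return (imageSize + tileSize - 1) / tileSize;
    }
}
```

A reader built this way would bind the coverage identifier and the tile indices to the three `?` placeholders through a JDBC `PreparedStatement` and decode the returned bytes into the tile's raster; that part is left out because it depends on the storage format.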

To illustrate, I have attached a very simple architectural design.

In case you want an idea of how to write an I/O plugin for ImageIO,
there is one about to be released that reads ESRI ASCII grids (ask
Martin, I showed it to him last week at FOSS) at incredible speed,
using pure Java! It is here
http://svn.geotools.org/geotools/trunk/spike/imageio/asciigrid/ and
the related GeoTools plugin is here
http://svn.geotools.org/geotools/trunk/spike/arcGrid/. If you check
the code, you will notice that the GeoTools plugin is really thin,
while the magic happens inside the ImageIO plugin: tiling,
multithreading, metadata management, etc.

I am interested in giving a hand with this, sharing some ideas, and
getting involved somehow. Let me know what your plans are; we could
even set up a breakout IRC session to discuss this with Martin and
debate our respective ideas. I am pretty sure we will find a way to
implement this plugin without too many worries.

Simone.

_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

--
-------------------------------------------------------
Eng. Simone Giannecchini
President /CEO GeoSolutions

http://www.geo-solutions.it

-------------------------------------------------------

(attachments)

Presentation1.png

Ciao Simone,

In our case, PostgreSQL doesn't store images in blobs. As Sandro explained last week during his "Future directions of PostGIS" presentation, that carries a lot of overhead.
The method we're using consists of storing GridCoverage metadata in the database and the images on the filesystem (this is another approach that Sandro wanted to investigate).
During FOSS4G we had a discussion with Sandro, and he will probably have a look at the code in the near future.

Cheers

Vincent

Ciao Andrea,

I'm looking at all the code that Martin has written around PostgreSQL raster storage. It doesn't only focus on the database part; he has done a lot of Java work on top of it.
I don't know whether Martin has subscribed to the list, so I am forwarding him this mail; he will be better placed to talk about it.

Cheers
Vincent
On 23 Sept. 06, at 13:14, Andrea Aime wrote:

Vincent Heurteaux wrote:

> Ciao Simone,
> In our case, postgresql doesn't store images in blobs. As Sandro explained last week during "Future directions of PostGIS <http://www.foss4g2006.org/contributionDisplay.py?contribId=109&sessionId=42&confId=1>" presentation, there is a lot of overhead with that. The method we're using consist in storing GridCoverage metadata in the database and images in the filesystem (this is an other way that Sandro wanted to investigate).

I agree this is basically the way to go if you want good performance, and
it's also relatively easy to implement if your application is the only one using
that data (so you keep the rasters on the same filesystem as your app, and use
PostGIS as an indexed metadata store).
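
To make the "metadata in the database, images on the filesystem" split concrete, here is a minimal sketch in which an in-memory list stands in for the metadata table. The class `CoverageIndex`, its `Entry` fields and the file names are hypothetical; a real setup would run the bounding-box filter as an indexed query in PostGIS rather than in a Java loop.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for a metadata table; a real system would query the database. */
final class CoverageIndex {
    /** One metadata row: relative file name plus the geographic box the image covers. */
    static final class Entry {
        final String file;
        final double minX, minY, maxX, maxY;

        Entry(String file, double minX, double minY, double maxX, double maxY) {
            this.file = file;
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** True when this entry's box overlaps the query box (touching edges count). */
        boolean intersects(double x0, double y0, double x1, double y1) {
            return minX <= x1 && maxX >= x0 && minY <= y1 && maxY >= y0;
        }
    }

    private final Path root;                        // directory that holds the image files
    private final List<Entry> entries = new ArrayList<>();

    CoverageIndex(String rootDirectory) {
        this.root = Paths.get(rootDirectory);
    }

    void add(Entry entry) {
        entries.add(entry);
    }

    /** Returns the files whose footprint overlaps the requested box. */
    List<Path> find(double x0, double y0, double x1, double y1) {
        List<Path> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.intersects(x0, y0, x1, y1)) {
                hits.add(root.resolve(e.file));
            }
        }
        return hits;
    }
}
```

Only the small metadata rows travel through the database connection in this arrangement; the pixel data is read directly from disk by whichever image reader handles the file format.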

At the same time it would be nice to have the database handle the details and provide
access to the images through the standard JDBC channels, so that you can share the
same data with other applications. That is, have the database server store the data on
its filesystem instead of using normal table storage, and allow people to access
it the same way as blobs, or by whatever other means does not require too much
overhead (I don't know, for example something outside the JDBC standard
that only PostgreSQL supports, in order to have minimal overhead).

Cheers
Andrea Aime

PS: this post probably won't be seen on the ml because my SMTP server is blacklisted.
Can one of you resend it to the ml, acting as a router?

Ciao guys,
I think this approach of storing metadata inside a db is one of
the best ones. I hope to hear a bit more from Martin. To be honest, it
would be really cool to have a look at the code he wrote, because that
could be a great starting point (shame it has not been shared!).

At FOSS I heard from Andrea about this talk by Sandro Santilli,
but unfortunately I missed it. Is there a way to find some slides or
docs about it?

Anyway, I hope to hear something from Martin soon, since I have already
started to design a base for the next generation of interfaces, and one
of the problems to face is coming up with at least a standard set of
metadata as well as a means to store them. I hope that on this new
feature we can take a more cooperative approach than in the past.

Simone.

Simone Giannecchini wrote:

> I think this approach of storing metadata inside a db is one of
> the best ones. I hope to hear a bit more from Martin. To be honest, it
> would be really cool to have a look at the code he wrote, because that
> could be a great starting point (shame it has not been shared!).

Actually the code is on SourceForge:

     http://seagis.sourceforge.net/observations

But it is documented in French for now. It is called "observations" because it is actually two things: a raster database and a set of observations (fishery data in my case) to correlate statistically to pixel values. It was created for an oceanographic study.

We have been using this raster database for 4 years now, and it works quite well. This is why I never felt personal pressure for a GridCoverageExchange implementation: I was ignoring it and using the above project for loading images instead.

I would like to post an overview of this database schema on the Geotools mailing list (or maybe on the wiki). I have been silent so far because I would first like to keep my promise of a "java logging" - "log4j" bridge, and a fix of the factory stuff. I will try to write an explanation of this database after that, hopefully next week.

  Regards,

    Martin

Ciao,

> At FOSS I have heard from andrea about this talk from Sandro Santilli
> but I unfortunately missed it. Is there a way to find some slides or
> docs about it?

Here it is :

http://www.foss4g2006.org/contributionDisplay.py?contribId=109&sessionId=42&confId=1

Cheers

Vincent

Thanks Vincent,
I found that link myself; I was hoping you had access to something
like PPT slides or similar.

Simone.

Thanks for the link Martin, I had a very quick look at it and a
couple of questions arose:

1>What kind of rasters do you usually store in terms of number of
dimensions (2d, 3d, time...), size (MB, hundreds of MB, GB?), number of
bands, type of source (remote sensing, models), data types, etc.?

2>Since you are very much interested in observations (if I am right of
course), did you have a look at what the Sensor Web Enablement group is
doing in OGC? If you did, what is your impression?

3>Why the heck did you not publicize this raster+db thing more? :)

4>I am quite excited about reusing this work of yours somehow,
hopefully combining it with the work I have been carrying out with
ImageIO, JAI and (J)GDAL. I think I am going to bother you a lot next
week trying to integrate all these approaches into one and possibly
putting all this behind some new and more stable interfaces, as we
planned during our quick talk at FOSS. Please send as much as you can
to the list. I do not mind if it is in French; we could try to
translate part of it.

Ciao,
Simone.

Sorry Simone, I didn't check the link.

You can find the slides here:
http://foo.keybit.net/~strk/tmp/presentation.sxi

Vincent Heurteaux
Société Geomatys
mob. 06 23 26 77 94
vincent.heurteaux@anonymised.com
http://www.geomatys.fr

Thanks a lot Vincent,
tomorrow I will take a look at them.

Ciao,
Simone.

> 2>Since you are very much interested in observations (if I am right of
> course), did you have a look at what the Sensor Web Enablement group is
> doing in OGC? If you did, what is your impression?

This task is on my to-do list.

I haven't actually found the time to take a serious look at it (we have our web site to finish first), but this will be my main job in a couple of weeks.

cheers

Hello Simone

Simone Giannecchini wrote:

> 1>What kind of rasters do you usually store in terms of number of
> dimensions (2d, 3d, time...), size (MB, hundreds of MB, GB?), number of
> bands, type of source (remote sensing, models), data types, etc.?

The sources are remote sensing images: Sea Surface Temperature (SST), Chlorophyll-a (CHL), Sea Level Anomaly (SLA), Ekman Pumping (EKP), etc. We could also take numerical model outputs; anything works as long as it is a grid coverage.

The format is anything supported by Image I/O. I use mostly PNG, RAW and ASCII formats.

The data types that work best are unsigned byte, unsigned short and float.

Any number of bands is possible, as long as the image format supports them. I use images with 1 band and images with 3 bands.

Each image, when taken individually, is two-dimensional. Images are grouped in Series. "Sea Surface Temperature" is a Series, "Sea Level Anomaly" is another Series, etc. A Series is usually a single directory on disk, but this is not mandatory. The Series provides the third dimension. In the current version the third axis is time, but we could try to generalize that. This means that each individual image in a Series represents a feature like temperature at a specific time (actually a time range, usually a few days). The whole Series covers a time range of a few years.
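
As an illustration of how a Series can act as a time axis, here is a minimal sketch that keeps the images of one Series sorted by the start of their time range and looks up the image valid at a given instant. This is not the Seagis code: the class name `Series`, the half-open `[start, end)` convention and the file names are assumptions made for the example.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical model of a Series: images ordered by the start of their time range. */
final class Series {
    /** An image file together with the exclusive end of its validity range. */
    private static final class Image {
        final String file;
        final Instant end;

        Image(String file, Instant end) {
            this.file = file;
            this.end = end;
        }
    }

    // Keyed by the inclusive start of each image's time range.
    private final TreeMap<Instant, Image> byStart = new TreeMap<>();

    void add(String file, Instant start, Instant end) {
        byStart.put(start, new Image(file, end));
    }

    /** Finds the image whose range [start, end) contains t, if there is one. */
    Optional<String> imageAt(Instant t) {
        Map.Entry<Instant, Image> e = byStart.floorEntry(t);
        if (e == null || !t.isBefore(e.getValue().end)) {
            return Optional.empty();
        }
        return Optional.of(e.getValue().file);
    }
}
```

With the ranges sorted this way, each lookup is a logarithmic-time search, which matters little for a handful of images but helps once a Series holds thousands of them.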

One individual image is a few MB. A full Series can be a few GB.

From the programmer's point of view, a Series is a big three-dimensional coverage. The Java code on seagis.sourceforge.net automatically constructs an org.geotools.coverage.CoverageStack from a Series name. Nothing more is needed on the user side. Images are loaded only when needed and cached as much as possible, conversions from "pixel" to "geophysics" values are applied, interpolations are performed as needed, etc.
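
Two of the steps named above can be sketched in a few lines: converting a stored pixel value to a geophysical value, and interpolating between two images along the time axis. The thread does not say which transfer function or interpolation Seagis applies, so the linear scale/offset form, the no-data handling and the class name `Geophysics` are all assumptions for illustration.

```java
/** Hypothetical helpers for value conversion and interpolation along the time axis. */
final class Geophysics {
    /**
     * Assumed linear transfer function: geophysics = scale * pixel + offset.
     * The pixel value reserved for "no data" maps to NaN.
     */
    static double toGeophysics(int pixel, double scale, double offset, int noData) {
        return pixel == noData ? Double.NaN : scale * pixel + offset;
    }

    /** Linear interpolation at time t between value v0 at time t0 and v1 at time t1. */
    static double interpolate(double t0, double v0, double t1, double v1, double t) {
        if (t1 == t0) {
            return v0;                      // degenerate interval: nothing to interpolate
        }
        double weight = (t - t0) / (t1 - t0);
        return v0 + weight * (v1 - v0);
    }
}
```

For example, with a scale of 0.5 and an offset of -2, a stored byte value of 100 would stand for 48 in geophysical units, and a NaN from either neighbouring image propagates through the interpolation.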

There is no tiling at this time for individual images. We need to merge in your work on image mosaics and image pyramids, so the two efforts are really complementary :). The Seagis work did not attempt to solve the "loading one big image" problem. It was targeting the "those 6000 images represent dynamic phenomena changing with time over the same geographic area" problem.

Seagis can, however, restrict the portion of the image to load, using the ImageReadParam source region. The source region is automatically converted from geographic coordinates to pixel coordinates.
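
The conversion from a geographic box to a pixel rectangle could look roughly like the sketch below, which then hands the rectangle to `ImageReadParam.setSourceRegion`. It assumes a north-up image with no rotation whose full extent is known; the class `SourceRegion` and its parameter names are hypothetical and are not the Seagis implementation.

```java
import java.awt.Rectangle;
import javax.imageio.ImageReadParam;

/** Hypothetical helper: geographic box to pixel rectangle for a north-up image. */
final class SourceRegion {
    /**
     * The image spans [west, east] x [south, north] and is width x height pixels.
     * Row 0 is the northern edge, so the y axis is flipped.
     */
    static Rectangle toPixels(double west, double south, double east, double north,
                              int width, int height,
                              double xmin, double ymin, double xmax, double ymax) {
        double pixelsPerUnitX = width / (east - west);
        double pixelsPerUnitY = height / (north - south);
        int x0 = (int) Math.floor((xmin - west) * pixelsPerUnitX);
        int x1 = (int) Math.ceil((xmax - west) * pixelsPerUnitX);
        int y0 = (int) Math.floor((north - ymax) * pixelsPerUnitY);
        int y1 = (int) Math.ceil((north - ymin) * pixelsPerUnitY);
        Rectangle requested = new Rectangle(x0, y0, x1 - x0, y1 - y0);
        // Clip to the image so the reader is never asked for pixels that do not exist.
        return requested.intersection(new Rectangle(0, 0, width, height));
    }

    /** Wraps the region in an ImageReadParam; the region must be non-empty. */
    static ImageReadParam paramFor(Rectangle region) {
        ImageReadParam param = new ImageReadParam();
        param.setSourceRegion(region);
        return param;
    }
}
```

The resulting parameter object can be passed to `ImageReader.read(int, ImageReadParam)`. Note that `setSourceRegion` rejects rectangles with a non-positive width or height, so a request that falls entirely outside the image has to be handled before calling `paramFor`.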

An OpenOffice plugin exists for that too :). User can enter the following formula in an OpenOffice cell:

     =EVALUATE("SST", -12.6, -8.5, 20/11/1998)

and get the sea surface temperature in °C interpolated at this specific geographic coordinate and date. Quite convenient for an oceanographer wanting to do some analysis in his favorite tool; most scientists work mostly with spreadsheets.
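
Evaluating a coverage at an arbitrary coordinate involves interpolating between neighbouring pixels. The thread does not state which interpolation the plugin uses, so the sketch below assumes bilinear interpolation on a plain array; the class name `Bilinear` and the grid layout are illustrative only.

```java
/** Hypothetical bilinear interpolation on a grid indexed as grid[row][column]. */
final class Bilinear {
    /**
     * Interpolates at fractional position (x = column, y = row).
     * Assumes 0 <= x <= columns - 1 and 0 <= y <= rows - 1.
     */
    static double at(double[][] grid, double x, double y) {
        int c0 = (int) Math.floor(x);
        int r0 = (int) Math.floor(y);
        // Clamp the far neighbours so positions on the last row or column stay in bounds.
        int c1 = Math.min(c0 + 1, grid[0].length - 1);
        int r1 = Math.min(r0 + 1, grid.length - 1);
        double fx = x - c0;
        double fy = y - r0;
        double top = grid[r0][c0] * (1 - fx) + grid[r0][c1] * fx;
        double bottom = grid[r1][c0] * (1 - fx) + grid[r1][c1] * fx;
        return top * (1 - fy) + bottom * fy;
    }
}
```

A full evaluation at a date would apply this in each of the two images bracketing that date and then interpolate between the two results along the time axis.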

> 2>Since you are very much interested in observations (if I am right of
> course), did you have a look at what the Sensor Web Enablement group is
> doing in OGC? If you did, what is your impression?

I'm very interested in observations, and tried to design Seagis around the time of the OGC "Observation and Measurements" discussion paper. At that time, it was only a discussion paper. It is now at a more advanced stage, but I haven't had the time to catch up yet, nor to read the Sensor Web Enablement work. I'm very interested in that, but there are too many things to do right now :(.

> 3>Why the heck did you not publicize this raster+db thing more? :)

It was initially quite specific to a particular project. I progressively tried to make it more general (it is now much more generic than it was 2 years ago), but more work would probably be good. I was not expecting strong interest, and I'm actually pretty glad to find someone who may find it useful :).

> 4>I am quite excited about reusing this work of yours somehow,
> hopefully combining it with the work I have been carrying out with
> ImageIO, JAI and (J)GDAL. I think I am going to bother you a lot next
> week trying to integrate all these approaches into one and possibly
> putting all this behind some new and more stable interfaces, as we
> planned during our quick talk at FOSS. Please send as much as you can
> to the list. I do not mind if it is in French; we could try to
> translate part of it.

As said previously, I think that our work is very complementary :). We created a small copy of this database for experimental work on our server. I think that we can give you access to the PostgreSQL database on port 5432 so you can take a look at it, and we can try to modify it together if needed. If this is of interest to you, we would need the IP address of your machine to open access (Vincent, are you okay with that?).

     Martin

> As said previously, I think that our work is very complementary :). We created a small copy of this database for experimental work on our server. I think that we can give you access to the PostgreSQL database on port 5432 so you can take a look at it, and we can try to modify it together if needed. If this is of interest to you, we would need the IP address of your machine to open access (Vincent, are you okay with that?).

Of course I am.