[Geoserver-users] How to serve large spatio-temporal dataset with GeoServer?

Dear GeoServer users,

I am having trouble figuring out a fast and scalable way to serve lightning data via GeoServer.

My dataset consists of several million points spread over central Europe and spanning several years. I have approximately 3M lightning strikes right now, and that is just a fraction of what I'll ultimately have to handle. I'm using an Oracle Locator database with both spatial and regular indexes, and while it has a few quirks, it works reasonably well when the number of lightning strikes is small (i.e. in the thousands or tens of thousands).

While my WMS client will never show more than 2 hours' worth of data (that's a small number of lightning strikes), some WMS requests take a very long time: those where I want to see the "bigger picture" of all lightning strikes in central Europe during a specified short period of time.

The core of the problem is that my data have both a spatial and a temporal dimension, and there is no spatio-temporal index in Oracle Locator. So even though a regular index on the time dimension can limit the number of features to a few thousand in the blink of an eye, the spatial index over the point geometry column won't help much (since the BBOX in the request covers the whole area anyway) and is in fact doing harm. The query found in the GeoServer logs runs really fast if I omit the spatial index clause in such a case (just a few hundred ms, compared to 6-7 seconds for the full query with the SDO_FILTER function call). Another problem is that my colleagues predict the performance will get worse with more data in the table, once it no longer fits into RAM and the database engine has to use hard drives for processing.

The performance improves rapidly as I zoom to larger scales (smaller areas), where the spatial index selects just a small subset of the data. However, I'd like to be able to serve the whole of central Europe quickly, too.

One possible solution is to add the time dimension to my spatial index (so it's 3D instead of 2D), but I'm afraid GeoServer won't be able to retrieve data through such an index (it won't be EPSG:3857 geometry anymore).

Another solution, from an Oracle forum, suggests partitioning over time and having a separate spatial index for each partition, but that requires an expensive Oracle Enterprise license (which was of course not budgeted in the project), and it just divides the problem by a constant factor anyway.

So, since I'm out of ideas of my own, how would you handle this situation? What other tools or formats are useful? Is the Postgres/PostGIS combo better at serving large-scale spatio-temporal datasets (with regard to GeoServer)?

Many thanks for any help!

--
Peter Kovac
IMS Programmer
MicroStep-MIS
peter.kovac@anonymised.com

Hi Peter,

Are you always querying for a short time period? If so, you might get the most mileage out of a SQL database (Oracle/Postgres) by creating an index on time and providing any vendor specific query hints to leverage that index. From a GeoTools/GeoServer perspective, WMS/WFS queries are turned into SQL queries; how the database handles those is a database admin challenge. With a time-based index and restricted queries, there might be a window where performance can be reasonable.

It may be worth considering a distributed database. If your architecture prevents that, then the options are tuning Oracle or PostGIS to the max and/or implementing application-based temporal sharding. The former can require expensive licenses (as you mentioned) or some fiddling. An example of temporal-sharding would be having a different table for each month or week. As you query across time, your application would know that it has to ask for different layers in GeoServer or tables in the database.
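To illustrate the temporal-sharding idea, here is a minimal sketch of how an application might work out which monthly tables (or GeoServer layers) a requested time window touches. The `lightning_strikes_YYYY_MM` naming scheme and the `monthly_tables` helper are assumptions made up for this example; neither GeoServer nor the database provides them.

```python
from datetime import datetime
from typing import List


def monthly_tables(start: datetime, end: datetime,
                   prefix: str = "lightning_strikes") -> List[str]:
    """Return the names of the monthly shard tables overlapping [start, end].

    The naming scheme <prefix>_YYYY_MM is hypothetical.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month that contains `end`.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# A two-hour window that straddles a month boundary touches two shards.
print(monthly_tables(datetime(2016, 6, 30, 23, 0), datetime(2016, 7, 1, 1, 0)))
# ['lightning_strikes_2016_06', 'lightning_strikes_2016_07']
```

With a client limited to short time windows, most requests would hit a single shard and only windows crossing a boundary would need two.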

If you can use a distributed database, I'd note that there are a number of projects to provide geo extensions to popular options such as Accumulo, Cassandra, HBase, ElasticSearch, etc. A number of those projects include a GeoTools datastore implementation or a GeoServer plugin which makes them compatible with GeoServer. I am a GeoMesa committer and we've had great success using GeoServer to serve up feature data and aggregations like heatmaps over datasets scaling to billions of records. Admittedly, setting up a distributed database is non-trivial, but it may allow for more options when working with large datasets.

Cheers,

Jim

Hi Peter,
from my experience Oracle is probably the cause: given a spatial and a regular index, it sometimes has issues deciding which one to use and ends up using the wrong one.
You should try PostGIS; with proper indexes and statistics set up, and with "only" 3 million records, it should respond fast.

Worth investigating at the very least, let us know how it goes.

Cheers
Andrea

Geoserver-users mailing list
Geoserver-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-users

==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313

fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it


Hi Andrea,

what do you mean by "deciding which one to use"? AFAIK it has to use both, (almost) all the time.

This is the query found in GeoServer logs (LOCATION is the point geometry column with a spatial index):

SELECT ID, TIME, LOCATION
FROM lightning_strikes
WHERE (
  TIME BETWEEN ? AND ?
  AND SDO_FILTER(LOCATION, ?, 'mask=anyinteract querytype=WINDOW') = 'TRUE'
)

Let's say I want just 1 hour's worth of lightning strikes, but from almost all of central Europe (so the BBOX does not cover the whole dataset, but it covers 90% of it).
The 'TIME BETWEEN' clause quickly filters 3M rows down to, let's say, 10K lightning strikes. However, some of them lie outside the BBOX, so they have to be filtered out. Unfortunately, there are millions of other records inside the same BBOX, so the spatial index finds them all, and then the database engine has to intersect the two sets. Since one of them is huge, it takes time. And it will get worse over time.

OK, maybe I see your point now. The DB engine could just check whether each row from the smaller set is inside the BBOX, without using the spatial index. I wonder if PostGIS can do that. I'll give it a try.
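As a rough illustration of that "time first, then check each row" strategy, here is a small in-memory sketch. It is not what Oracle or PostGIS does internally: the sorted list plus binary search merely stands in for the regular index on TIME, and the sample rows, the integer timestamps and the `query` function are all invented for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for the table: (time, x, y) rows sorted by time,
# so that binary search plays the role of the regular index on TIME.
strikes = [
    (100, 14.0, 48.0),
    (105, 17.1, 48.1),
    (110, 2.0, 41.0),   # inside the time window but outside the BBOX
    (500, 16.0, 49.0),  # inside the BBOX but outside the time window
]
times = [row[0] for row in strikes]


def query(t_from, t_to, bbox):
    """Select rows by time first, then test each survivor against the BBOX."""
    min_x, min_y, max_x, max_y = bbox
    lo = bisect_left(times, t_from)   # first row with time >= t_from
    hi = bisect_right(times, t_to)    # one past the last row with time <= t_to
    return [
        (t, x, y)
        for t, x, y in strikes[lo:hi]
        if min_x <= x <= max_x and min_y <= y <= max_y
    ]


print(query(100, 120, (9.0, 45.0, 25.0, 55.0)))
# [(100, 14.0, 48.0), (105, 17.1, 48.1)]
```

The cost of the BBOX test here depends only on the number of rows inside the time window, not on how many rows the BBOX would match in the whole table.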

Thank you

-- 
Peter Kovac
IMS Programmer
MicroStep-MIS
peter.kovac@anonymised.com

Hi Jim,

unfortunately I have zero experience with distributed databases so the fiddling with temporal sharding is probably what I'll end up doing.

However, GeoMesa looks interesting, so I'll probably do my homework and learn something about it.

Thank you

--
Peter Kovac
IMS Programmer
MicroStep-MIS
peter.kovac@anonymised.com

Hi,

I think the point is which index to use first :)

My two cents, by the way: if you can narrow your use cases down from "any time interval, in any area", you might consider pre-processing (aggregating) the data into chunks of e.g. 1 hour, 1 day, 1 week, etc., or, similarly, into geographic chunks of, say, 100 km x 100 km, similar to the idea behind tiled maps.
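A minimal sketch of such pre-aggregation, assuming the strikes are available as (timestamp, x, y) tuples with x and y in the units of a projected CRS (for example the EPSG:3857 coordinates mentioned earlier in the thread). The `chunk_key` and `aggregate` helpers and the sample coordinates are made up for illustration.

```python
from collections import Counter
from datetime import datetime

CELL_SIZE = 100_000  # cell edge in CRS units, i.e. nominally 100 km


def chunk_key(when, x, y):
    """Map a strike to its (hour, column, row) chunk."""
    hour = when.replace(minute=0, second=0, microsecond=0)
    return (hour, int(x // CELL_SIZE), int(y // CELL_SIZE))


def aggregate(strikes):
    """Count strikes per 1-hour x 100 km x 100 km chunk."""
    return Counter(chunk_key(when, x, y) for when, x, y in strikes)


strikes = [
    (datetime(2016, 6, 21, 10, 5), 1_650_000.0, 6_120_000.0),
    (datetime(2016, 6, 21, 10, 40), 1_699_999.0, 6_199_999.0),
    (datetime(2016, 6, 21, 11, 2), 1_650_000.0, 6_120_000.0),
]
counts = aggregate(strikes)
print(counts[(datetime(2016, 6, 21, 10), 16, 61)])  # 2
```

The resulting counts could be stored in a small summary table and served for the zoomed-out views, while the raw points are only queried at larger scales.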

But before you do that: Try PostGIS.

/julian

On Wed, Jun 22, 2016 at 10:02 AM, Peter Kovac <peter.kovac@anonymised.com> wrote:

Hi Andrea,

what do you mean by "deciding which one to use"? AFAIK it has to use
both, (almost) all the time.

According to what Bruce Momjian told me a few years ago, not always. He was
going over some statistics on index usage, and on when it is actually more
efficient to use an index than to do a linear scan. The threshold was less
than 10% of the data (I can't remember the actual value). In other words,
index access is faster than a linear scan in PostgreSQL only if the index
selects less than roughly 10% of the overall data.
Based on that, the planner will decide to use no index, only one, or both,
depending on their selectivity.

In your case it seems you're constraining much more on time than on space.
If that's the case, there is a chance only the time index will be used,
because using the other one as well might result in slower execution.
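
The selectivity argument above can be sketched as a toy decision rule. This is only an illustration: real planners use detailed cost models and table statistics, the 10% cutoff is the rough figure recalled above rather than an actual PostgreSQL constant, and the index names are hypothetical.

```python
# Toy illustration only: real planners use cost models and statistics.
# The 10% cutoff is the approximate figure recalled in the thread, not a
# documented PostgreSQL setting.
INDEX_WORTHWHILE_BELOW = 0.10


def choose_indexes(selectivity_by_index):
    """Return the names of indexes selective enough to be worth using.

    selectivity_by_index maps an index name to the estimated fraction of
    rows (0.0 to 1.0) that its predicate keeps.
    """
    return sorted(
        name
        for name, fraction in selectivity_by_index.items()
        if fraction < INDEX_WORTHWHILE_BELOW
    )


# Numbers from the thread: about 10K of 3M rows match the time window,
# while the BBOX covers roughly 90% of the data.
estimates = {"time_idx": 10_000 / 3_000_000, "location_idx": 0.90}
print(choose_indexes(estimates))  # prints ['time_idx']
```

With these estimates only the time index passes the cutoff, which matches the behaviour described: the spatial predicate is then checked row by row on the small time-filtered set.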

This is the query found in GeoServer logs (LOCATION is the point geometry
column with a spatial index):

SELECT ID,TIME,LOCATION
FROM lightning_strikes
WHERE (
  TIME BETWEEN ? AND ?
  AND SDO_FILTER(LOCATION, ?, 'mask=anyinteract querytype=WINDOW') = 'TRUE'
)

Let's say I want just 1 hour's worth of lightning strikes, but from almost
all of central Europe (so the BBOX does not cover the whole dataset, but it
covers 90% of it).
The 'TIME BETWEEN' clause quickly filters 3M rows down to, let's say, 10K
lightning strikes. However, some of them lie outside of the BBOX, so they
have to be filtered out. Unfortunately, there are millions of other records
in the same BBOX, so the spatial index finds them all and then the database
engine has to compute the intersection of the two sets. Since one of them is
huge, it takes time. And it will get worse over time.

Yes!

OK, maybe I see your point now. The DB engine could just check whether each
row from the smaller set is inside the BBOX, without using the spatial
index. I wonder if PostGIS can do it. I'll give it a try.

In my experience it does. Let us know how it goes.
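
The access path discussed here (narrow by time first, then test each surviving row against the BBOX) can be sketched in a few lines. This is a simplified in-memory illustration, not what any database actually executes: the sorted list stands in for the index on TIME, and the sample coordinates and bounding box are made up.

```python
from bisect import bisect_left, bisect_right


def strikes_in_window(strikes, t_start, t_end, bbox):
    """Return strikes with t_start <= time <= t_end that fall inside bbox.

    strikes must be sorted by time; each is a (time, x, y) tuple.
    bbox is (min_x, min_y, max_x, max_y).
    """
    # The sorted times stand in for the index on TIME; a real index is
    # persistent and is not rebuilt for every query.
    times = [s[0] for s in strikes]
    lo = bisect_left(times, t_start)
    hi = bisect_right(times, t_end)
    min_x, min_y, max_x, max_y = bbox
    # Recheck only the time-matched rows against the BBOX; no spatial index.
    return [
        s for s in strikes[lo:hi]
        if min_x <= s[1] <= max_x and min_y <= s[2] <= max_y
    ]


strikes = [
    (100, 14.4, 50.1),
    (205, 17.1, 48.1),
    (210, 2.3, 48.9),   # in the time window but outside the BBOX
    (260, 19.0, 47.5),
    (900, 16.4, 48.2),  # inside the BBOX but outside the time window
]
central_europe = (9.0, 45.0, 25.0, 55.0)
print(strikes_in_window(strikes, 200, 300, central_europe))
# prints [(205, 17.1, 48.1), (260, 19.0, 47.5)]
```

The cost of the BBOX check grows with the number of rows in the time window, not with the number of rows in the table, which is why this path stays fast for a wide BBOX.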

Cheers
Andrea


-------------------------------------------------------

Hi Peter,
To build on the other answers you’ve received - you need to create indexes that the database will actually use. This applies to all relational databases - Oracle, SQL Server, PostgreSQL, SQLite, etc.

When you send a query to a database there’s a query planner which takes your query and tries to determine the optimal way to execute it. It looks at what indexes you have and tries to decide which, if any, will get it the answer the fastest. Sometimes it will decide an index won’t help.
Just having an index on a particular column (or set of columns) doesn’t mean the database will use it.
Some reading around the topic:
http://docs.oracle.com/cd/B19306_01/server.102/b14211/optimops.htm
http://docs.oracle.com/cd/B19306_01/server.102/b14211/ex_plan.htm

(Note: Different databases have different planners and work differently - the links above are Oracle-specific).

I’d suggest taking the query that is slow for you and trying to optimise the database around it, using EXPLAIN PLAN to see where it’s spending all of its time.
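
As a rough illustration of reading a query plan, the sketch below uses SQLite from Python's standard library purely as a stand-in, because it runs without a server. In Oracle the equivalent is `EXPLAIN PLAN FOR <query>` followed by `SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)`, and in PostgreSQL it is `EXPLAIN` or `EXPLAIN ANALYZE`. The table layout (plain `x`/`y` columns instead of a geometry column) and the index name are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lightning_strikes (
        id INTEGER PRIMARY KEY, time INTEGER, x REAL, y REAL
    );
    CREATE INDEX time_idx ON lightning_strikes (time);
    """
)

query = """
    SELECT id, time, x, y
    FROM lightning_strikes
    WHERE time BETWEEN ? AND ?
      AND x BETWEEN ? AND ?
      AND y BETWEEN ? AND ?
"""

# Ask the planner how it would run the query instead of running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN " + query, (200, 300, 9.0, 25.0, 45.0, 55.0)
).fetchall()

# The last column of each plan row is a human-readable step description.
plan_text = "\n".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan names the index the planner chose (here the one on `time`); the exact wording of the output depends on the SQLite version.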

While Andrea has an (understandable) leaning towards Postgres, I’d be surprised if you can’t optimise Oracle to get the desired result too, if you’re stuck with it (but if you’re not - Postgres would probably be a worthwhile change).

Cheers,
Jonathan

On Wed, Jun 22, 2016 at 2:15 PM, Jonathan Moules <jonathan-lists@anonymised.com> wrote:

While Andrea has an (understandable) leaning towards Postgres, I'd be
surprised if you can't optimise Oracle to get the desired result too, if
you're stuck with it (but if you're not - Postgres would probably be a
worthwhile change).

The leaning has an explanation that goes beyond natural sympathy for
another open source project.
GeoServer translates every OGC request into the best query it can set up,
but the translation is still, after all, automatic, with little or no
control on the admin side (SQL views might help to some extent).

In Oracle, the developers decided to add query hints: every time a query
misbehaves, you can add one of those to force the better execution path.
That's nice as long as you can control how the queries are written, but
that's not the case in GeoServer.
Of course, that reduced the pressure to improve the query planner (and made
the Oracle consultant market a bigger and more profitable one).

In PostgreSQL, the developers instead refused to add query hint support and
treated every case in which the optimizer took the wrong path as a bug. At
the beginning that was pretty painful, but over time the planner evolved to
the point that it's actually very good. This couples well with an automatic
query generator: as long as the query is valid, the database should not
need any help choosing the best access path.

If you think about it, setting up a WFS (or a WMS with CQL_FILTER) is really
like saying "hey, here is my database, hit it with whatever you want", since
people can literally write whatever filter they feel like, with no
limitation on complexity.
There is no query hint salvation there: each query is dynamic and defined by
the user at the time the request is made.
Either the database is smart and fast on its own, without query-specific
help from a human, or you're in for a pile of trouble.
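
To make the point about user-defined filters concrete, here is a minimal sketch of how a client could attach an arbitrary `CQL_FILTER` to a WMS GetMap request. The endpoint, layer name and map extent are hypothetical, the attribute names are borrowed from the query earlier in the thread, and the filter text is only an example of what a client might send; it has not been run against a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and layer name, for illustration only.
BASE_URL = "http://localhost:8080/geoserver/wms"


def getmap_url(cql_filter):
    """Build a WMS 1.1.1 GetMap URL carrying a client-chosen CQL_FILTER."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": "demo:lightning_strikes",
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": "9.0,45.0,25.0,55.0",
        "WIDTH": "800",
        "HEIGHT": "500",
        "FORMAT": "image/png",
        # GeoServer vendor parameter; its content is entirely up to the client.
        "CQL_FILTER": cql_filter,
    }
    return BASE_URL + "?" + urlencode(params)


user_filter = "TIME DURING 2016-06-22T10:00:00Z/2016-06-22T11:00:00Z AND ID > 1000"
url = getmap_url(user_filter)
print(url)
```

Nothing in the request constrains how complex `user_filter` may be, so the database has to cope with whatever SQL results from it.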

Cheers
Andrea


-------------------------------------------------------

Hi Andrea,
I wasn’t trying to downplay your penchant for pushing Postgres; it’s something I understand and agree with. Your explanation of the differences was very informative - I’ve seen posts mentioning them in passing before, but this was a little more in depth. Thanks for sharing.

Cheers,
Jonathan

Dear Andrea,

thank you for this deep insight into why Postgres/PostGIS with GeoServer rocks! You were totally right!

I just made an exact clone of the lightning table in PostGIS and there is no observable difference in performance whether I’m looking at the whole of Europe (just a couple of hundred ms) or just a tiny village unfortunate enough to have a major thunderstorm. It’s amazing.

BTW, if anybody is wondering how to migrate a table from Oracle Locator/Spatial to PostGIS, here’s a link describing how to use ogr2ogr, another excellent piece of open source software: http://words.mixedbredie.net/archives/1751
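
For orientation, a migration of this kind usually comes down to a single ogr2ogr call. The sketch below only assembles and prints such a command; it does not run it. It assumes a GDAL build that includes the Oracle (OCI) driver, and every connection detail and table name in it is a placeholder rather than something taken from the linked article.

```python
import shlex


def ogr2ogr_command(oracle_login, oracle_table, pg_connection, new_table):
    """Build (but do not run) an ogr2ogr call copying one Oracle table to PostGIS."""
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",                          # output driver
        "PG:" + pg_connection,                       # destination datasource
        "OCI:" + oracle_login + ":" + oracle_table,  # source datasource and table
        "-nln", new_table,                           # name of the table to create
    ]


# All connection details below are placeholders.
command = ogr2ogr_command(
    oracle_login="scott/tiger@orcl",
    oracle_table="LIGHTNING_STRIKES",
    pg_connection="host=localhost dbname=gis user=gis",
    new_table="lightning_strikes",
)
print(shlex.join(command))
```

After a copy like this, the PostGIS table would still need its own indexes on the time and geometry columns for the planner to choose from.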

Thank you again!

-- 
Peter Kovac
IMS Programmer
MicroStep-MIS
[peter.kovac@anonymised.com](mailto:peter.kovac@anonymised.com)