[Geoserver-devel] Time dimension on Vector Layer is SLOW

Hi,

I’m just starting to look into some slowdowns in GeoServer when adding a time dimension to a vector layer (e.g. PostGIS or GeoGig). By this I mean editing the layer, going to the “Dimensions” tab and enabling Time or Elevation.

A. GetCapabilities gets slow. It does a table scan for each GetCapabilities request to determine the min/max/unique values of the Time attribute (depending on how you set things up on the layer’s Dimensions tab).

B. GetMap gets slow. I tracked this down to it doing two table scans and an index scan.

a. It does a scan of the data to determine the max time. This is likely because the default is to use the max date as the “default” time in a query:

SELECT max("datemod") FROM "public"."my_dataset"

b. It does a scan of the data to construct a FID-Time index for the entire dataset (for each GetMap request):

SELECT "gid","datemod" FROM "public"."my_dataset"

c. It then does a (normal) index scan to get the data required to draw:

SELECT "gid",encode(ST_AsBinary(ST_Simplify(ST_Force2D("geom"), 15.5, true)),'base64') as "geom" FROM "public"."my_dataset" WHERE "geom" && <polygon>

I haven’t looked into this in detail, but there are some obvious ways to make this smarter (e.g. caching time values for a layer like we cache extents, though that might not be what everyone wants).

Has anyone else looked into this? I’ve seen some earlier discussion, but nothing too concrete…

Thanks,
Dave

Checking WMS 1.1.1: the GetCapabilities document can advertise a single value, a list, or an interval. I agree it would be good to store this (similar to bounds) rather than calculate it as you describe.
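To make the three forms concrete, here is a minimal sketch of how a time extent string could be assembled from a layer's time values. The function name, the `presentation` argument and the `P1D` period are assumptions made for illustration; this is not GeoServer code. It does show why the cost differs: a list needs every distinct value, while an interval needs only the minimum and maximum.

```python
from datetime import date

def format_time_extent(values, presentation="list", period="P1D"):
    """Build the text of a time extent from a layer's time values.

    `presentation` and `period` are illustrative parameters only.
    """
    ordered = sorted(set(values))
    if not ordered:
        return ""
    if len(ordered) == 1:
        # A single value is advertised on its own.
        return ordered[0].isoformat()
    if presentation == "list":
        # Needs every distinct value (the result of a DISTINCT query).
        return ",".join(v.isoformat() for v in ordered)
    if presentation == "interval":
        # Needs only the minimum and maximum: start/end/period.
        return f"{ordered[0].isoformat()}/{ordered[-1].isoformat()}/{period}"
    raise ValueError(f"unknown presentation: {presentation}")

times = [date(2014, 5, 26), date(2014, 5, 24), date(2014, 5, 26)]
print(format_time_extent(times, "list"))
print(format_time_extent(times, "interval"))
```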

···

On 23 January 2017 at 12:57, Dave Blasby <dblasby@anonymised.com> wrote:


Check out the vibrant tech community on one of the world’s most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot


Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel


Jody Garnett

Hi,

I did some further investigation and what I said, above, is a wee bit wrong.

Looks like my dataset didn’t have the datemod field populated, so GeoServer was doing extra work.
If the max visitor (B.a, above) returns null (no result), it will do B.b. In most cases B.b will NOT be done.
If it does find a max value, the B.c query will look like this:

SELECT "gid",encode(ST_AsBinary(ST_Force2D("geom")),'base64') as "geom" FROM "public"."my_dataset" WHERE ("datafield" = '2014-05-26' AND "datafield" IS NOT NULL AND "geom" &&

(This is a point layer, which I expect is why the simplify is not present.)

For most non-database layers this will be a full scan of the data and a spatial/attribute query.
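The corrected flow can be illustrated with a small sketch. It uses an in-memory SQLite table as a stand-in for the PostGIS table, and `resolve_default_time` is a hypothetical helper written for this example, not a GeoServer method. It only mirrors the behaviour described above: the full `gid`/`datemod` scan is issued only when `max()` comes back empty.

```python
import sqlite3

def resolve_default_time(conn, table="my_dataset", time_col="datemod"):
    """Follow the flow described above and record which queries were issued."""
    issued = []

    # B.a: the max visitor.
    max_sql = f'SELECT max("{time_col}") FROM "{table}"'
    issued.append(max_sql)
    latest = conn.execute(max_sql).fetchone()[0]

    if latest is None:
        # B.b: only reached when max() finds no value.
        scan_sql = f'SELECT "gid","{time_col}" FROM "{table}"'
        issued.append(scan_sql)
        conn.execute(scan_sql).fetchall()

    return latest, issued

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_dataset (gid INTEGER PRIMARY KEY, datemod TEXT)")
conn.executemany("INSERT INTO my_dataset (datemod) VALUES (?)", [(None,), (None,)])

# Time column not populated: both queries run.
empty_default, empty_queries = resolve_default_time(conn)

# Time column populated: only the max query runs.
conn.execute("UPDATE my_dataset SET datemod = '2014-05-26' WHERE gid = 1")
filled_default, filled_queries = resolve_default_time(conn)

print(len(empty_queries), "queries with datemod unpopulated")
print(len(filled_queries), "query with datemod populated")
```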

For GetCapabilities (layer Time dimension set to LIST and default = max value) you will see two queries:

  1. unique visitor (PostGIS: SELECT DISTINCT …) to get the LIST of values
  2. max visitor (PostGIS: SELECT max …) to get the default value

For most datasets (i.e. non-database), this will be 2 full scans of the dataset.

Thanks,
Dave

On Tue, Jan 24, 2017 at 1:28 AM, Dave Blasby <dblasby@anonymised.com>
wrote:

For the get capabilities (layer Time dimension set to LIST and default=max
value) you will see two queries;
1. unique visitor (Postgis: SELECT DISTINCT ...) to get the LIST of values
2. max visitor (Postgis: SELECT max ...) to get the default value

For most datasets (i.e. non-database), this will be 2 full scans of the
dataset.

Hum... that's debatable imho, only dumb datasets are really forced into
full scans.
Shapefile certainly cannot optimize out the visitors, but other data stores
could if code was added to them.
E.g., SOLR and Mongo can both execute max visitor fast, but there is no
optimized implementation in the store.
What's annoying is that there is no way to know if a store can optimize
these operations.

That said, time-based data requires an index to operate efficiently (all
requests will hit a specific time or time range), so... well... you
definitely don't want to run it off a shapefile or other non-indexed
source. And if there is an index, both of those operations should be easy
to optimize (given developer time).
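As a rough illustration of the index point, the sketch below compares the query plan for the max query before and after adding an index on the time column. It uses SQLite only because it runs anywhere; the table and index names are made up for the example, and PostgreSQL plans will look different even though its planner can also answer `max()` from a btree index.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_dataset (gid INTEGER PRIMARY KEY, datemod TEXT)")
conn.executemany(
    "INSERT INTO my_dataset (datemod) VALUES (?)",
    [(f"2014-05-{day:02d}",) for day in range(1, 29)],
)

max_sql = 'SELECT max("datemod") FROM "my_dataset"'

# No index on the time column: the planner has to read the table.
plan_without_index = plan(conn, max_sql)

# With an index, max() can be answered from the index itself.
conn.execute("CREATE INDEX idx_datemod ON my_dataset (datemod)")
plan_with_index = plan(conn, max_sql)

latest = conn.execute(max_sql).fetchone()[0]

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
print("max datemod:  ", latest)
```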

About static vs dynamic: both have their own use cases. What we have today
in GeoServer is the case that sponsored the original functionality (i.e.
data mostly static from the spatial p.o.v. and changing quite aggressively
from the temporal p.o.v.). Fully dynamic bboxes and cached/configured
dimension values both make sense, and support for them could be created
given enough development resources.

Cheers
Andrea

--

GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

Ciao Jody,
in real use cases, storing these values must be done with care.

Most people tend to update tables directly and continuously with
incoming data (moving objects, time series), hence caching the bounds
can lead to people not seeing the freshest data.
I would probably not cache this info by default, and I would also make
sure we can force a recompute via GUI/REST.
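A minimal sketch of what such an opt-in cache could look like follows. The class name, the `enabled` flag and the time-to-live are assumptions made for illustration (the expiry in particular is not something proposed in this thread), and `invalidate` stands in for whatever a GUI button or REST call would trigger.

```python
import time

class DimensionValueCache:
    """Hypothetical per-layer cache of dimension values, off by default."""

    def __init__(self, loader, enabled=False, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader          # function: layer name -> values
        self._enabled = enabled
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # layer name -> (expiry, values)

    def get(self, layer):
        if not self._enabled:
            # Default behaviour: always ask the data store.
            return self._loader(layer)
        now = self._clock()
        entry = self._entries.get(layer)
        if entry is not None and now < entry[0]:
            return entry[1]
        values = self._loader(layer)
        self._entries[layer] = (now + self._ttl, values)
        return values

    def invalidate(self, layer):
        """Force a recompute on the next access."""
        self._entries.pop(layer, None)

calls = []

def load_times(layer):
    calls.append(layer)                # stands in for the expensive table scan
    return ["2014-05-26"]

uncached = DimensionValueCache(load_times)
uncached.get("my_dataset")
uncached.get("my_dataset")             # two scans: caching is off by default

cached = DimensionValueCache(load_times, enabled=True)
cached.get("my_dataset")
cached.get("my_dataset")               # one scan: second call is served from cache
cached.invalidate("my_dataset")
cached.get("my_dataset")               # forced recompute: one more scan

print(len(calls), "scans in total")
```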

Regards,
Simone Giannecchini

Ing. Simone Giannecchini
@simogeo
Founder/Director

GeoSolutions S.A.S.

On Mon, Jan 23, 2017 at 10:43 PM, Jody Garnett <jody.garnett@anonymised.com> wrote:

Checking WMS 1.1.1 getcaps can have a single value, a list, or an interval -
I agree it would be good to store this (similar to bounds) rather than
calculate as you describe.


Since these calls are part of GetCapabilities, would it make sense to add some handling to the DataStore API / ContentDataStore abstract implementation (or worst case, the docs)?

With a change like that, it might make it easier for developers (both implementing and reading/understanding) to see what is going on. The default abstract implementation could have some safeguards in place to prevent full-table scans.

Thoughts?

Jim

···

On 01/24/2017 02:27 AM, Andrea Aime wrote:

What’s annoying is that there is no way to know if a store can optimize these operations.

On Tue, Jan 24, 2017 at 6:20 PM, Jim Hughes <jnh5y@anonymised.com> wrote:

What's annoying is that there is no way to know if a store can optimize
these operations.

Since these calls are part of GetCapabilities, would it make sense to add
some handling to the DataStore API / ContentDataStore abstract
implementation (or worst case, the docs)?

With a change like that, it might make it easier for developers (both
implementing and reading/understanding) to see what is going on. The
default abstract implementation could have some safe guards in place to
prevent full-table scans.

Thoughts?

FeatureSource can return a QueryCapabilities class, which is already used to
report what a store can do natively. It could be extended to add the list
of visitors that it can optimize out with efficient queries.
If you feel like doing it, I'd say go for it (starting a discussion on
gt-devel to ensure everybody is on board; I know I am).
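To sketch the idea in a language-neutral way (GeoTools itself is Java), the example below models a capabilities object that lists the visitors a store can answer natively, plus a caller that consults it before falling back to a scan. `StoreCapabilities`, `plan_aggregate` and the `scan_limit` safeguard are all hypothetical names invented here; none of them is the existing GeoTools API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class StoreCapabilities:
    """Hypothetical stand-in for a capabilities object; not the GeoTools class."""
    optimized_visitors: frozenset = field(default_factory=frozenset)

    def can_optimize(self, visitor_name):
        return visitor_name in self.optimized_visitors

def plan_aggregate(capabilities, visitor_name, feature_count, scan_limit=100_000):
    """Decide how an aggregate such as 'max' or 'unique' would be evaluated."""
    if capabilities.can_optimize(visitor_name):
        return "native"      # the store answers with an efficient query
    if feature_count > scan_limit:
        return "refuse"      # safeguard against an unbounded full scan
    return "scan"            # small enough to visit every feature

# A database-backed store that advertises both visitors.
database_store = StoreCapabilities(frozenset({"max", "unique"}))
# A store with no optimized visitors, e.g. a plain file.
file_store = StoreCapabilities()

print(plan_aggregate(database_store, "max", 5_000_000))
print(plan_aggregate(file_store, "max", 2_000))
print(plan_aggregate(file_store, "unique", 5_000_000))
```

With something like this in place, the caller no longer has to guess whether a store can optimize an operation, which is the gap pointed out earlier in the thread.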

Cheers
Andrea

-------------------------------------------------------

We are coming up to the feature freeze Feb 18th.

Are you interested in putting together a proposal for caching some of this information (I am interested in GetCapabilities being faster), or do you want to handle it at the GeoGig / PostGIS datastore level?

Jody Garnett
