[Geoserver-devel] Show 45 million data points: HeatMap or Clustering

Hi everybody,

For my work, I need to serve over WMS a representation of 45
million point records (positions).
For now, I use the Heatmap and Clustering representations suggested
on various forums, computed with the WPS module. Beyond 16 million
features, GeoServer stops responding, or takes a very long time.
Have you ever encountered this problem? If so, how did you handle it? Can I
multi-thread the processing?
Moreover, looking at the logs, I have the impression that GeoServer queries
the database for the features 3 times. Why?
Thank you for everything.

--
View this message in context: http://osgeo-org.1560.x6.nabble.com/Show-45-Millions-of-data-HeatMap-or-Clustering-tp5297803.html
Sent from the GeoServer - Dev mailing list archive at Nabble.com.

Hi,
I’m not the author of those processes, so I can only provide limited help, but I wonder:
do you really want to transfer that many points between PostGIS and GeoServer?
I would look for a way to do the clustering directly in the database, and retrieve only
the aggregated (synthetic) information instead.

For example, you could use a SQL view. This GIS StackExchange question might help:
http://gis.stackexchange.com/questions/11567/spatial-clustering-with-postgis

Cheers
Andrea
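To illustrate what "getting out only the synthetic information" can mean, here is a minimal Python sketch of grid-based clustering: each point is assigned to a square cell, and only one row per occupied cell (count plus centroid) is kept, so the output size depends on the number of cells rather than the number of points. In PostGIS a comparable aggregation is often written with `ST_SnapToGrid` and `GROUP BY` inside a SQL view; the function name `grid_cluster` and the cell size used below are illustrative assumptions, not part of GeoServer or PostGIS.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Aggregate (x, y) points into square grid cells.

    Returns a dict mapping (col, row) cell indices to (count, cx, cy),
    where (cx, cy) is the centroid of the points that fell in that cell.
    This is the same idea as a GROUP BY on snapped coordinates.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum_x, sum_y
    for x, y in points:
        # floor() keeps negative coordinates in the correct cell
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        acc = sums[key]
        acc[0] += 1
        acc[1] += x
        acc[2] += y
    return {k: (n, sx / n, sy / n) for k, (n, sx, sy) in sums.items()}


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.3, 0.5), (1.5, 1.5)]
    for cell, (n, cx, cy) in sorted(grid_cluster(pts, 1.0).items()):
        print(cell, n, round(cx, 3), round(cy, 3))
```

Choosing the cell size relative to the map scale (for example, a few pixels at the requested resolution) keeps the number of returned rows roughly constant regardless of how many points are stored.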

On Tue, Nov 29, 2016 at 9:52 AM, jarjar <jeoffrey.jardin@anonymised.com> wrote:



Geoserver-devel mailing list
Geoserver-devel@anonymised.com.366…sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313

fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.


Hi Jeoffrey,

We have tweaked the heatmap process for use with GeoMesa: the process can be changed to pass query hints, which a GeoTools DataStore can use to execute the query differently. For an example of setting the hints in the process, see [1]. The GeoMesa query planner picks up those hints, and the heatmap computation can then be distributed to the Accumulo tablet servers. Instead of returning SimpleFeatures, the DataStore computes a grid of inputs for the HeatmapSurface.

Anyhow, I mention that as an example of tweaking a WPS process slightly; it also takes some work at the DataStore level. I’m not sure what could be done in PostGIS, but there may be something clever to do there.
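What makes this kind of pushdown distributable (and bears on the original question about multi-threading) is that a count grid is additive: each server can build a partial grid over its own subset of the data, and the partial grids can be summed cell by cell. The Python sketch below illustrates that property with hypothetical helpers (`density_grid`, `merge_grids`, `parallel_density`) and a thread pool standing in for the tablet servers; it is not GeoMesa's or GeoTools' actual API, and it computes plain counts without any kernel smoothing.

```python
from concurrent.futures import ThreadPoolExecutor


def density_grid(points, bbox, width, height):
    """Count points per cell over a half-open bbox (minx, miny, maxx, maxy).

    Returns `height` rows of `width` integer counts; row 0 holds the
    lowest y values. Points outside the bbox are ignored.
    """
    minx, miny, maxx, maxy = bbox
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if minx <= x < maxx and miny <= y < maxy:
            # Clamp guards against floating-point rounding at the upper edge.
            col = min(int((x - minx) / (maxx - minx) * width), width - 1)
            row = min(int((y - miny) / (maxy - miny) * height), height - 1)
            grid[row][col] += 1
    return grid


def merge_grids(grids, width, height):
    """Sum partial grids cell by cell; valid because counts are additive."""
    total = [[0] * width for _ in range(height)]
    for g in grids:
        for r in range(height):
            for c in range(width):
                total[r][c] += g[r][c]
    return total


def parallel_density(partitions, bbox, width, height, workers=4):
    """Build one partial grid per data partition, then merge them.

    The thread pool only demonstrates the decomposition: in CPython the GIL
    limits speedups for this CPU-bound loop, whereas in a distributed store
    each partial grid would be computed next to its data.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda part: density_grid(part, bbox, width, height), partitions)
        )
    return merge_grids(partials, width, height)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (1.5, 0.5), (1.6, 0.6), (2.5, 2.5)]
    bbox = (0.0, 0.0, 3.0, 3.0)
    # Splitting the data into two partitions gives the same grid as one pass.
    assert parallel_density([pts[:2], pts[2:]], bbox, 3, 3) == density_grid(pts, bbox, 3, 3)
    print(density_grid(pts, bbox, 3, 3))
```

Only `width * height` numbers travel back per partition, instead of every matching feature, which is where the savings come from.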

As a back-of-the-envelope estimate: with GeoMesa, I’ve typically seen a few hundred thousand records per second stream back to GeoServer. My gut feeling is that the heatmap optimization speeds things up by around 10-25%. Even so, a query returning 45 million records could take longer than 60 seconds to come back. The usual rendering timeout is 60 seconds, so requests could be dying there.
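To make that estimate concrete, assume 300,000 records per second as an illustrative value for "a few hundred thousand" (an assumption, not a measured figure), and read the upper end of the optimization as a 25% reduction in time:

```latex
t = \frac{45 \times 10^{6}\ \text{records}}{3 \times 10^{5}\ \text{records/s}} = 150\ \text{s},
\qquad
t_{\text{opt}} \approx (1 - 0.25)\, t = 112.5\ \text{s} > 60\ \text{s}
```

Even at 500,000 records per second, the transfer alone would take 90 s, so the 60-second rendering timeout is likely to be reached before the heatmap is drawn.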

Cheers,

Jim

  1. https://github.com/locationtech/geomesa/blob/master/geomesa-process/src/main/scala/org/locationtech/geomesa/process/DensityProcess.scala#L115-L123
On 11/29/2016 09:35 AM, Andrea Aime wrote:

I'd go one step further than Andrea and ask: have you considered pre-rendering the data into a raster heatmap, or otherwise pre-processing it into some kind of aggregation using a standalone GIS tool? 45 million points is a *lot* of data: hundreds of megabytes, even conservatively.
When datasets get that large, it's usually worth optimising so that you're not selecting and using the entire dataset on every query.
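As a rough check on that size estimate, assuming each point is stored as nothing more than two 64-bit (8-byte) coordinates:

```latex
45 \times 10^{6}\ \text{points} \times 2\ \text{coordinates} \times 8\ \text{bytes}
= 7.2 \times 10^{8}\ \text{bytes} \approx 720\ \text{MB}
```

Attributes, geometry encoding overhead and protocol framing only add to this.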

Even if the data changes on, say, an hourly basis, you could probably automate the regeneration using GRASS/GDAL or similar, and use GeoServer to serve only the latest version.
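One way to do that pre-processing is to compute a count grid offline and save it in a raster format that GDAL and GeoServer can both read. The minimal Python sketch below writes a grid as an Esri ASCII grid (`.asc`, the ArcGrid format); the function names, origin and cell size are illustrative assumptions, it stores raw counts without the kernel smoothing a heatmap normally applies, and a real pipeline would more likely use GDAL or GRASS as suggested above.

```python
def to_ascii_grid(grid, xll, yll, cell_size, nodata=-9999):
    """Serialise a count grid as Esri ASCII grid text.

    `grid` is a list of rows where row 0 is the southernmost row. The
    ASCII grid format lists rows from north to south, so rows are
    written in reverse order.
    """
    nrows = len(grid)
    ncols = len(grid[0]) if nrows else 0
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",   # x of the lower-left corner of the grid
        f"yllcorner {yll}",   # y of the lower-left corner of the grid
        f"cellsize {cell_size}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(v) for v in row) for row in reversed(grid)]
    return "\n".join(header + body) + "\n"


def write_ascii_grid(path, grid, xll, yll, cell_size):
    """Write the grid to `path`, e.g. a file picked up by a scheduled job."""
    with open(path, "w", encoding="ascii") as fh:
        fh.write(to_ascii_grid(grid, xll, yll, cell_size))


if __name__ == "__main__":
    counts = [[1, 2, 0], [0, 0, 0], [0, 0, 1]]  # row 0 = south
    print(to_ascii_grid(counts, 0.0, 0.0, 1.0))
```

Regenerating the file on a schedule and replacing the one GeoServer serves keeps each WMS request independent of the 45 million source points.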

Cheers,
Jonathan

On 29/11/2016 08:52, jarjar wrote: