[Geoserver-users] Speeding up GeoWebCache seeding process

Hi,

I have a 100MB shapefile with LINESTRINGs covering Germany, which I added as a cached layer. I use GeoServer 2.6.2 with included GeoWebCache 1.6.1 to seed tiles for layers 0-17 for the predefined EPSG_900913 gridset. On a server with 8 cores this process takes about 21 days.
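
To put the workload in perspective, here is a minimal sketch of how many tiles zoom levels 0-17 cover for a Germany-sized box in the Web Mercator (EPSG:900913) tile pyramid. The bounding box values and the helper names are assumptions made for illustration; GeoWebCache's own estimate may differ, since it works on the layer's actual bounds and on metatiles.

```python
import math

def tile_xy(lon_deg, lat_deg, zoom):
    """Column/row of the Web Mercator tile containing a lon/lat point
    (XYZ numbering, rows counted from the north)."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so points on the outer edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_in_bbox(west, south, east, north, zoom):
    """Number of tiles touched by a lon/lat bounding box at one zoom level."""
    x0, y0 = tile_xy(west, north, zoom)   # top-left tile
    x1, y1 = tile_xy(east, south, zoom)   # bottom-right tile
    return (x1 - x0 + 1) * (y1 - y0 + 1)

# Assumption: a rough lon/lat bounding box for Germany (west, south, east, north).
GERMANY_BBOX = (5.87, 47.27, 15.04, 55.06)

counts = {z: tiles_in_bbox(*GERMANY_BBOX, z) for z in range(18)}
for z, count in counts.items():
    print(f"zoom {z:2d}: {count:>12,d} tiles")
print(f"total   : {sum(counts.values()):>12,d} tiles")
```

The count grows roughly fourfold per level, so the deepest levels dominate the seeding time.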

I use Oracle Java 7 and JAI is installed. I’ve set “Number of tasks to use” to 8 (number of cpu cores).

Is there anything I can do to speed up the seeding process?

Best regards,

Jens

On Thu, Mar 26, 2015 at 9:52 AM, Nachtigall, Jens (init) <
Jens.Nachtigall@anonymised.com> wrote:

> Hi,
>
> I have a 100MB shapefile with LINESTRINGs covering Germany, which I added
> as a cached layer. I use GeoServer 2.6.2 with included GeoWebCache 1.6.1 to
> seed tiles for layers 0-17 for the predefined EPSG_900913 gridset. On a
> server with 8 cores this process takes about 21 days.
>
> I use Oracle Java 7 and JAI is installed. I’ve set “Number of tasks to
> use” to 8 (number of cpu cores).

Given the source file is not big and you're rendering vector data, I'm
going to assume the bottleneck is in the rendering phase.
Set the number of tasks to 16 and use OpenJDK, whose rendering abilities
scale up better. Alternatively, keep your JDK and install Marlin, which
is going to give you both better scalability and better speed at the same
time: https://github.com/bourgesl/marlin-renderer

Cheers
Andrea

--

GeoServer Professional Services from the experts! Visit
http://goo.gl/NWWaa2 for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

*AVVERTENZE AI SENSI DEL D.Lgs. 196/2003*

Le informazioni contenute in questo messaggio di posta elettronica e/o
nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il
loro utilizzo è consentito esclusivamente al destinatario del messaggio,
per le finalità indicate nel messaggio stesso. Qualora riceviate questo
messaggio senza esserne il destinatario, Vi preghiamo cortesemente di
darcene notizia via e-mail e di procedere alla distruzione del messaggio
stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso,
divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od
utilizzarlo per finalità diverse, costituisce comportamento contrario ai
principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for
the attention and use of the named addressee(s) and may be confidential or
proprietary in nature or covered by the provisions of privacy act
(Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection
Code).Any use not in accord with its purpose, any disclosure, reproduction,
copying, distribution, or either dissemination, either whole or partial, is
strictly forbidden except previous formal approval of the named
addressee(s). If you are not the intended recipient, please contact
immediately the sender by telephone, fax or e-mail and delete the
information in this message that has been received in error. The sender
does not give any warranty or accept liability as the content, accuracy or
completeness of sent messages and accepts no responsibility for changes
made after they were sent or for other risks which arise as a result of
e-mail transmission, viruses, etc.

-------------------------------------------------------

> Set the number of tasks to 16, and use OpenJDK, its rendering abilities scale up better, or keep your JDK, and install Marlin, which is going to give you both better scalability and better speed at the same time: https://github.com/bourgesl/marlin-renderer

One question: Is basic use enough https://github.com/bourgesl/marlin-renderer/wiki/How-to-use#how-to-use-basic or do I also need https://github.com/bourgesl/marlin-renderer/wiki/How-to-use#getting-out-every-last-ounce-of-performance- ?

Regards,
Jens

On Thu, Mar 26, 2015 at 11:54 AM, Nachtigall, Jens (init) <
Jens.Nachtigall@anonymised.com> wrote:

>> Set the number of tasks to 16, and use OpenJDK, its rendering
>> abilities scale up better, or keep your JDK, and install Marlin, which is
>> going to give you both better scalability and better speed at the same
>> time: https://github.com/bourgesl/marlin-renderer
>
> One question: Is basic use enough
> https://github.com/bourgesl/marlin-renderer/wiki/How-to-use#how-to-use-basic
> or do I also need
> https://github.com/bourgesl/marlin-renderer/wiki/How-to-use#getting-out-every-last-ounce-of-performance-
> ?

Basic usage is enough. From what I know, those bits are going to provide a
benefit only in microbenchmarks or very specific use cases.

Cheers
Andrea

-------------------------------------------------------

> Given the source file is not big and you're rendering vector data, I'm going to assume the bottleneck is in the rendering phase.
> Set the number of tasks to 16, and use OpenJDK, its rendering abilities scale up better, or keep your JDK, and install Marlin, which
> is going to give you both better scalability and better speed at the same time: https://github.com/bourgesl/marlin-renderer

After setting the number of tasks to 16 (the server has 8 cores), I have the impression that it now takes longer rather than shorter. I’ve tested with layer 13. The “Time remaining” column is totally nuts now: it says 21 min for one row, then 30 minutes later it says 24 min. “Tiles completed” increases, but very, very slowly: just ~400 tiles per row in 30 min, which for 16 tasks is only 16 × 400 = 6400 tiles in 30 min. And there are 60’000 tiles for layer 13 (“Estimated # of tiles”).

htop reports a load average of about 6.60. The 8 cores are at 60-90% most of the time (judging by eye).
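
As a sanity check on the “Time remaining” column, the throughput figures above can be turned into a rough estimate by hand. This is only a minimal sketch: the function name is hypothetical, and it assumes the observed rate stays constant for the whole level.

```python
def hours_to_seed(total_tiles, tiles_done, window_minutes):
    """Hours needed for total_tiles at the rate observed during the window."""
    tiles_per_minute = tiles_done / window_minutes
    return total_tiles / tiles_per_minute / 60.0

# Figures reported in the message above.
tasks = 16
tiles_per_task = 400       # tiles completed per task in the window
window_minutes = 30
level_13_tiles = 60_000    # "Estimated # of tiles" for level 13

tiles_done = tasks * tiles_per_task
eta_hours = hours_to_seed(level_13_tiles, tiles_done, window_minutes)
print(f"{tiles_done} tiles per {window_minutes} min "
      f"-> about {eta_hours:.1f} h for level 13")
```

At that rate level 13 alone would need several hours, which is far from the 21 to 24 minutes shown in the GUI.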

Best regards,
Jens

Hi again,

I really do not understand why it takes weeks for a simple 100 MB Shapefile with just LineStrings in Germany to render. 99.99 % of all tiles are just empty/transparent PNGs anyways, rest is just some red lines… I googled and found http://gis.stackexchange.com/questions/20712/geoserver-wms-tile-rendering-is-too-slow

  1. Would it make a difference to add a spatial index to the Shapefile? (I did not check if it has one, I would assume that GeoServer just adds one if necessary – how can I check if this has been done?)

  2. Also, we use the GeoServer-CSS extension. But the style is as simple as this:

* {
  fill: #e31a1c;
  fill-opacity: 0.27;
  stroke: #e31a1c;
  stroke-width: 1;
  stroke-opacity: 1;
}

Anyway, can this be related to the long seeding times?

  3. Maybe some kind of metatiling can be used to speed things up?

For OSM rendering with mapnik for the same extent and many more features, it took just a few days. So I really do not get why it takes weeks for a damn simple Shapefile.

Ideas welcome…

Jens

On Tue, Mar 31, 2015 at 3:19 PM, Nachtigall, Jens (init) <
Jens.Nachtigall@anonymised.com> wrote:

> Hi again,
>
> I really do not understand why it takes weeks for a simple 100 MB
> Shapefile with just LineStrings in Germany to render. 99.99 % of all tiles
> are just empty/transparent PNGs anyways, rest is just some red lines… I
> googled and found
> http://gis.stackexchange.com/questions/20712/geoserver-wms-tile-rendering-is-too-slow
>
> 1. Would it make a difference to add a spatial index to the Shapefile? (I
> did not check if it has one, I would assume that GeoServer just adds one if
> necessary – how can I check if this has been done?)

No idea. The index should be built automatically, and it would be a .qix
file (assuming the directory where the shapefile is located is writable).
Can you share the shapefile in question? It would be interesting to try it
out locally.

> 2. Also, we use the GeoServer-CSS extension. But the style is as simple as
> this:
>
> * {
>   fill: #e31a1c;
>   fill-opacity: 0.27;

Why did you add a fill for linestrings? This forces GeoServer to paint them
as polygons (closing them to form a polygon, start to end point) and fill
them.

>   stroke: #e31a1c;
>   stroke-width: 1;
>   stroke-opacity: 1;
> }
>
> Anyway, can this be related to the long seeding times?
>
> 3. Maybe some kind of metatiling can be used to speed things up?
>
> For OSM rendering with mapnik for same extent and much more features, it
> took just a few days. So I really do not get it, why it takes weeks for a
> damn simple Shapefile.

It would be interesting to use the exact same shapefile and style in mapnik
and see how different it is (OSM styles are set up to avoid painting too
much information at the same time, which also helps a lot to speed up
rendering).

Cheers
Andrea

-------------------------------------------------------

On Tue, Mar 31, 2015 at 3:19 PM, Nachtigall, Jens (init) <
Jens.Nachtigall@anonymised.com> wrote:

> For OSM rendering with mapnik for same extent and much more features, it
> took just a few days. So I really do not get it, why it takes weeks for a
> damn simple Shapefile.

Hi Jens,
thanks a lot for sharing the shapefile. I believe we have wildly different
ideas of what makes up a "simple shapefile".
The 100MB file is actually made of just 5 polygons, looking like lines if
you look at them from a distance, covering the whole Germany area.

This means that in order to render each tile, if all goes well and we have
a single geometry to read, we have to read 20MB worth of data from the file,
generalize it on the fly (millions of points), reproject it, and clip it to
the rendering area.
But given how the data is distributed, the spatial index is mostly useless,
and we'll end up reading most of the geometries anyway, for every single
tile, including the empty ones (since the search is done by bbox).
I'm not one bit surprised that this takes a lot of time. By comparison, the
300MB roads shapefile that I was working with yesterday only loads a tiny
part of the data in most tiles.
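
The effect of a bbox-based search on such data can be illustrated with a toy model. This is a minimal sketch, not GeoServer's actual index code: the extent, grid size, feature counts and function names are all made up for illustration. It compares five features whose bounding boxes span the whole area with many small features scattered across it.

```python
import random

def bbox_intersects(a, b):
    """True if boxes a and b, given as (minx, miny, maxx, maxy), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def fraction_read_per_tile(feature_bboxes, grid=10, extent=100.0):
    """Average share of features a bbox search selects per tile,
    taken over a grid x grid tiling of a square extent."""
    step = extent / grid
    total_hits = 0
    for col in range(grid):
        for row in range(grid):
            tile = (col * step, row * step, (col + 1) * step, (row + 1) * step)
            total_hits += sum(bbox_intersects(fb, tile) for fb in feature_bboxes)
    return total_hits / (grid * grid * len(feature_bboxes))

# Case 1: five giant features, each with a bbox covering the whole extent.
giant = [(0.0, 0.0, 100.0, 100.0)] * 5

# Case 2: 5000 small 1x1 features at random positions (fixed seed).
rng = random.Random(42)
small = []
for _ in range(5000):
    x, y = rng.uniform(0.0, 99.0), rng.uniform(0.0, 99.0)
    small.append((x, y, x + 1.0, y + 1.0))

giant_share = fraction_read_per_tile(giant)
small_share = fraction_read_per_tile(small)
print(f"giant features: {giant_share:.1%} of the data selected per tile")
print(f"small features: {small_share:.1%} of the data selected per tile")
```

With the giant features every tile selects the whole dataset, empty tiles included, while the scattered features only contribute a small share per tile.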

I'd suggest you take these giant geometries and clip them on a regular
grid, separating polygons and outlines into two different shapefiles, just
like OSM did for the world countries layer, to avoid the very same problem
you're facing (geometries that are excessively large).
Then paint the polygon bits with just a fill, and the outline bits with
just a line.
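
The outline half of this suggestion can be sketched in plain Python. This is only an illustration of the idea under simplifying assumptions: it uses Liang-Barsky segment clipping, the function names and the cell size are hypothetical, it returns loose segments per grid cell rather than merged linestrings, and it does not cover clipping the polygons themselves. A real workflow would more likely rely on a GIS library or tool.

```python
import math
from collections import defaultdict

def clip_segment(p0, p1, box):
    """Liang-Barsky clipping: the part of segment p0-p1 inside
    box = (minx, miny, maxx, maxy), or None if nothing is left."""
    (x0, y0), (x1, y1) = p0, p1
    minx, miny, maxx, maxy = box
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - minx), (dx, maxx - x0),
                 (-dy, y0 - miny), (dy, maxy - y0)):
        if p == 0:
            if q < 0:
                return None          # parallel to this edge and outside
            continue
        t = q / p
        if p < 0:                    # entering the box through this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:                        # leaving the box through this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    if t0 >= t1:
        return None                  # only touches the box at a point
    return ((x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy))

def split_polyline_on_grid(points, cell_size):
    """Cut a polyline into segments grouped by (col, row) grid cell."""
    pieces = defaultdict(list)
    for p0, p1 in zip(points, points[1:]):
        col0 = math.floor(min(p0[0], p1[0]) / cell_size)
        col1 = math.floor(max(p0[0], p1[0]) / cell_size)
        row0 = math.floor(min(p0[1], p1[1]) / cell_size)
        row1 = math.floor(max(p0[1], p1[1]) / cell_size)
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                box = (col * cell_size, row * cell_size,
                       (col + 1) * cell_size, (row + 1) * cell_size)
                part = clip_segment(p0, p1, box)
                if part is not None:
                    pieces[(col, row)].append(part)
    return dict(pieces)

# A short outline crossing several cells of a grid with cell size 1.0.
demo = split_polyline_on_grid([(0.5, 0.5), (2.5, 0.5), (2.5, 1.5)], 1.0)
for cell in sorted(demo):
    print(cell, demo[cell])
```

Each grid cell then holds only the short pieces that fall inside it, so a bbox search for a tile selects a few small features instead of one enormous outline.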

Cheers
Andrea

-------------------------------------------------------

Hi Andrea,

> I’d suggest you take these giant geometries and clip them on a regular grid, separating polygons and outlines into two different shapefiles, just like OSM did for the world countries layer, to avoid the very same problem you’re facing (geometries that are excessively large). Then paint the polygon bits with just a fill, and the outline bits with just a line.

Thanks a lot, I’ll try to do this. I hope it’s not a stupid question, but does anybody know the best way of doing this? It seems that it is not so easy to clip a shapefile’s features to a grid using e.g. Quantum GIS… Google only has suggestions on how to clip raster data to a grid, but not vector data.

Best,

Jens

…and would it help to set the default meta tiling settings to 20x20 instead of 4x4 – or is this only relevant when GeoServer “asks” for tiles but not for seeding?

Best,

Jens

On Wed, Apr 1, 2015 at 11:49 AM, Nachtigall, Jens (init) <
Jens.Nachtigall@anonymised.com> wrote:

> …and would it help to set the default meta tiling settings to 20x20
> instead of 4x4 – or is this only relevant when GeoServer “asks” for tiles
> but not for seeding?

Yes, I believe that would help if you don't split the data, because we
would end up making fewer reads. But it would still be really inefficient
compared to a version where the data has been split.
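
A back-of-the-envelope sketch of why a larger metatile means fewer reads: each metatile is rendered in one go, so the number of render passes (and, for this dataset, of reads of the big geometries) drops with the square of the metatiling factor. The 200 x 300 tile grid below is a hypothetical level chosen to total 60,000 tiles, and the function name is made up for illustration.

```python
import math

def render_requests(tiles_x, tiles_y, meta):
    """Number of metatile renders needed to cover a tiles_x by tiles_y grid
    with a meta x meta metatiling factor."""
    return math.ceil(tiles_x / meta) * math.ceil(tiles_y / meta)

# Hypothetical zoom level of 200 x 300 tiles (60,000 tiles in total).
tiles_x, tiles_y = 200, 300
for meta in (1, 4, 20):
    requests = render_requests(tiles_x, tiles_y, meta)
    print(f"metatiling {meta:2d}x{meta:<2d}: {requests:6d} render passes")
```

Going from 4x4 to 20x20 cuts the number of passes by a factor of 25 in this example, but every pass still has to read the oversized geometries.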

Cheers
Andrea

-------------------------------------------------------

Hi,

On Wed, Apr 1, 2015 at 11:49 AM, Nachtigall, Jens (init) <Jens.Nachtigall@…5799…> wrote:

>> …and would it help to set the default meta tiling settings to 20x20 instead of 4x4 – or is this only relevant when GeoServer “asks” for tiles but not for seeding?
>
> Yes, I believe that would help, if you don’t split the data, because we would end up making fewer reads. But it would still be really inefficient compared to a version where the data has been split.

Increasing the meta tiling really helps a lot. Thanks! Can this be increased to even more than 20x20? The GUI is limited to 20x20 maximum, but maybe by editing some XML settings files?

Unfortunately, splitting the polygons from the shapefile to a grid does not really work with Quantum GIS. I tried the whole day, but splitting up the outlines according to a grid does not work… (with polygons it’s ok).

I assume that having the shapefile by default in EPSG:900913 instead of EPSG:3044 would help too, right? Then in each rendering step at least the reprojection does not need to be done. Do you think that this would have a bigger effect or will it be negligible?

Best,

Jens

On Wed, Apr 1, 2015 at 4:03 PM, Nachtigall, Jens (init) <
Jens.Nachtigall@anonymised.com> wrote:

> Hi,
>
> On Wed, Apr 1, 2015 at 11:49 AM, Nachtigall, Jens (init) <
> Jens.Nachtigall@anonymised.com> wrote:
>
>>> …and would it help to set the default meta tiling settings to 20x20
>>> instead of 4x4 – or is this only relevant when GeoServer “asks” for tiles
>>> but not for seeding?
>>
>> Yes, I believe that would help, if you don't split the data, because we
>> would end up making fewer reads. But it would still be really inefficient
>> compared to a version where the data has been split.
>
> Increasing the meta tiling really helps a lot. Thanks! Can this be
> increased to even more than 20x20? The GUI is limited to 20x20 maximum, but
> maybe by editing some XML settings files?

I think that by editing the files you could have it as big as you want,
just be mindful of how much memory that will use (at 20x20 the drawing
surface is 100MB; times 8 threads, that is 800MB just for drawing surfaces,
and then we have to keep the rest of GeoServer in the game).
I believe you can modify your metatile factor in the right file inside
gwc-layers (there is one per layer) and then... I guess either issue a
reload, or restart GeoServer.
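
The memory figure can be reproduced with a little arithmetic. This is a minimal sketch that assumes 256 x 256 pixel tiles, 4 bytes per pixel and no gutter around the metatile; the function name is hypothetical.

```python
MIB = 1024 * 1024

def metatile_surface_bytes(meta, tile_px=256, bytes_per_pixel=4):
    """Bytes needed for one drawing surface covering meta x meta tiles."""
    side_px = meta * tile_px
    return side_px * side_px * bytes_per_pixel

threads = 8
for meta in (4, 20, 40):
    per_surface = metatile_surface_bytes(meta) / MIB
    print(f"{meta:2d}x{meta:<2d}: {per_surface:6.0f} MiB per surface, "
          f"{per_surface * threads:6.0f} MiB for {threads} threads")
```

Memory grows with the square of the metatiling factor, so doubling 20x20 to 40x40 would quadruple the surface size under these assumptions.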

> Unfortunately, splitting the polygons from the shapefile to a grid does
> not really work with Quantum GIS. I tried the whole day, but splitting up
> the outlines according to a grid does not work… (with polygons it’s ok).
>
> I assume that having the shapefile by default in EPSG:900913 instead of
> EPSG:3044 would help too, right? Then in each rendering step at least the
> reprojection does not need to be done. Do you think that this would have a
> bigger effect or will it be negligible?

Hum... good question... we certainly generalize before reprojecting, but we
don't clip before reprojecting, so... that _might_ have some effect, but
I would not expect it to be big.

Cheers
Andrea

-------------------------------------------------------