[Geoserver-devel] Build times out of control again...

Hi,
I’m wondering, am I the only one that’s bothered by our current long build
times?

Around a year ago I made some effort to bring the build times
down to a saner level, shaving off 5-6 minutes. It seems that during the
last year the build times have increased significantly again, bringing
the build time on my machine above the 20 minute mark (for a “-T4 -Prelease”
build; a non-parallel one is really close to the half hour mark).
Now, sure, there are new modules in the build, but their extra time would
account for a minute, a minute and a half of slowdown…

So I set out to check what’s going on, and found that git-commit-id
was taking up a significant amount of build time, along with a way to
reduce that:
https://github.com/geoserver/geoserver/pull/1542

This cuts the build time down to 19:30, not too bad, but still quite far
from where it was a year ago.
I then re-ran the build from a year ago, building gt and gwc in the process,
and discovered that it now takes a few minutes longer (16)… so something
apparently slowed down on my computer, or in the software running on it.
Quite weird indeed, I have no explanation for this one (a bunch of
OS upgrades in the meantime, not much else…)

And then I looked around on the internet and found this blog post,
suggesting a JVM tuning knob that avoids much of the
JIT compilation effort for short-lived JVMs:
http://zeroturnaround.com/rebellabs/your-maven-build-is-slow-speed-it-up/

I was a bit skeptical but I tried it anyways, in a separate branch, and to my surprise
build time went down from 20:43 to 14:34… whoa…
Pull request here:
https://github.com/geoserver/geoserver/pull/1543
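
For a sense of scale, the two timings quoted above can be turned into seconds and a percentage. This is only a small arithmetic sketch: the helper names (`to_seconds`, `reduction`) are made up for illustration and are not part of the build or of either pull request.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' wall-clock string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def reduction(before: str, after: str) -> tuple[int, float]:
    """Return (seconds saved, percentage saved) going from before to after."""
    b, a = to_seconds(before), to_seconds(after)
    return b - a, 100.0 * (b - a) / b


# Timings reported in this message for the JIT tuning branch.
saved, percent = reduction("20:43", "14:34")
print(f"saved {saved} s ({percent:.1f}%)")
```

That works out to a bit over six minutes saved, a little under 30% of the original build time.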

Combining the two I am down to around 12:40 for a -T4 build (I tried
-T1C, aka -T8, and it does not help significantly).
Looking at the parallel build from a CPU usage point of view, the
build is not really parallelized until the main module build is complete,
then goes full tilt for a long while, and finally has to wait, with little
CPU usage, for the security-jdbc and app-schema integration modules,
which keep the build going for a good minute longer.
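
The CPU usage pattern described above is what one would expect when a single module sits at the head of the dependency graph: nothing else can start until it finishes, so its build time adds directly to the wall-clock time. Below is a minimal sketch of that reasoning, assuming unlimited workers. The dependency graph and the durations are invented for illustration and are not taken from the GeoServer build or from the spreadsheet.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def wall_clock(durations, deps):
    """Earliest finish time of the whole build, assuming unlimited workers.

    durations: module name -> build time in seconds
    deps: module name -> list of modules it must wait for
    """
    finish = {}
    # static_order() yields each module after all of its dependencies.
    for module in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(module, ())), default=0)
        finish[module] = start + durations[module]
    return max(finish.values())


# Hypothetical numbers, NOT real GeoServer timings.
durations = {"main": 200, "wms": 150, "wfs": 100,
             "security-jdbc": 60, "app-schema-test": 120}
deps = {"wms": ["main"], "wfs": ["main"],
        "security-jdbc": ["main"], "app-schema-test": ["wfs"]}

baseline = wall_clock(durations, deps)
faster_main = wall_clock({**durations, "main": 170}, deps)
faster_side = wall_clock({**durations, "security-jdbc": 30}, deps)
print(baseline, faster_main, faster_side)
```

In this toy graph, shaving 30 s off main shortens the whole build by 30 s, while shaving the same amount off a module that is not on the longest chain changes nothing. With a fixed number of threads (-T4) the real schedule is more constrained, so this is only a lower bound on the wall-clock time.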

If you’re curious about detailed timings I shared the google-doc used
to investigate this:
https://docs.google.com/spreadsheets/d/1r2XrwGwCq0TTcAph-Kbe2SY1uy3X7-v9v8Ss5s4PPxU/edit#gid=0

The doc above also contains non-parallel build timings, useful to check
how long the build takes for a single module, with highlights on some
“bad” modules after the optimizations have been applied:

  • main, which is bad news since the real build parallelization starts only after this module is done building. Around 30s of that build time is taken by 3 security related tests, mostly waiting for timeouts… anyone want to give it a shot and see if we can do better there?
  • wms, which is running a lot of tests for sure, but hmmm… a bit fishy, will check it
  • security ui core module, which I’m afraid is in this sorry state because it cannot reuse data dirs across tests… still, having a look at it would be really good
  • application schema integration tests, no clue, there are quite a few tests for sure, but the overall time seems too much

Honorable mention for the netcdf-out extension module, which takes almost as much time as a full service implementation, with a build time towering well above all of the other output format extensions… might be worth having a look at too.

Anyways, reviews of the pull requests and feedback are welcome (and of course, more build speedup work, too!)

Cheers
Andrea


==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.

Ing. Andrea Aime

@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313

fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

AVVERTENZE AI SENSI DEL D.Lgs. 196/2003

Le informazioni contenute in questo messaggio di posta elettronica e/o nel/i file/s allegato/i sono da considerarsi strettamente riservate. Il loro utilizzo è consentito esclusivamente al destinatario del messaggio, per le finalità indicate nel messaggio stesso. Qualora riceviate questo messaggio senza esserne il destinatario, Vi preghiamo cortesemente di darcene notizia via e-mail e di procedere alla distruzione del messaggio stesso, cancellandolo dal Vostro sistema. Conservare il messaggio stesso, divulgarlo anche in parte, distribuirlo ad altri soggetti, copiarlo, od utilizzarlo per finalità diverse, costituisce comportamento contrario ai principi dettati dal D.Lgs. 196/2003.

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.


On Mon, Mar 28, 2016 at 11:43 AM, Andrea Aime <andrea.aime@anonymised.com>
wrote:

This cuts the build time down to 19:30, not too bad, but still quite far
from where it was a year ago.
I then re-ran the build from a year ago, building gt and gwc in the process,
and discovered that it now takes a few minutes longer (16)... so something
apparently slowed down on my computer, or in the software running on it.
Quite weird indeed, I have no explanation for this one (a bunch of
OS upgrades in the meantime, not much else...)

A clarification here: one year ago I reported a 13 minute build time for a
-T4 -Prelease build, after the optimizations were done. Now the same build
is apparently taking 16...

Cheers
Andrea


-------------------------------------------------------

Application schema parsing and construction of FeatureType objects is very expensive. I am delighted that you can get the build of this module down to 82 seconds.

On 28/03/16 22:43, Andrea Aime wrote:

- *application schema integration tests*, no clue, there are quite a few
tests for sure, but the overall time seems too much

--
Ben Caradoc-Davies <ben@anonymised.com>
Director
Transient Software Limited <http://transient.nz/>
New Zealand

On Mon, Mar 28, 2016 at 11:43 AM, Andrea Aime <andrea.aime@anonymised.com>
wrote:

The doc above also contains non-parallel build timings, useful to check
how long the build takes for a single module, with highlights on some
"bad" modules after the optimizations have been applied:

   - *main*, which is bad news since the real build parallelization
   starts only after this module is done building. Around 30s of that
   build time is taken by 3 security related tests, mostly
   waiting for timeouts... anyone want to give it a shot and see if we can do
   better there?
   - *wms*, which is running a lot of tests for sure, but hmmm... a bit
   fishy, will check it
   - *security ui core module*, which I'm afraid is in this sorry state
   because it cannot reuse data dirs across tests... still, having a look at
   it would be really good
   - *application schema integration tests*, no clue, there are quite a
   few tests for sure, but the overall time seems too much

Honorable mention for the *netcdf-out* extension module, which takes
almost as much time as a full service implementation, with a build time
towering well above all of the other output format extensions... might be
worth having a look at too.

Some details on the above from a profile session.

The app-schema integration tests suffer from schema parsing, which is by
far the biggest time sink, for the most part due to the complex store
initialization (not visible in the screenshot below, it was nested too
deep):

[screenshot: profile of the app-schema integration tests]

The wms module is a bit all over the place, it's just running a lot of
tests, but it seems a fair amount of time is spent in security, performing
password encryption/decryption... which does not make a lot of sense: it's
a test after all, we should be running with plain text passwords, no?

Looking into a main module profile we see again the digest at work, along
with Thread.sleep (those security tests with built-in waits)
and one test that's loading a very large mosaic (checking that we don't try
to open more than 1024 files actually):

[screenshot: profile of the main module]

The netcdf-out module is a weird one... it seems to be taking longer than
usual to set up the test data, but I'm not sure why or how.


-------------------------------------------------------

Hi,
changes applied on master, the build servers generally liked it:

  • Ares from 46-54 minutes down to 37 minutes
  • Travis from 56-60 minutes down to 34 minutes
  • Winbuild from ~105 minutes down to 60 minutes

We should probably consider backporting and applying the same
to GeoTools.
If there are no objections I’ll do that uh… sometime this week
or next weekend

Cheers
Andrea
