[Geoserver-devel] Travis failures started with new trusty images

The Travis failures we are experiencing started with the rollout of new trusty images about six days ago:
https://blog.travis-ci.com/2017-12-12-new-trusty-images-q4-launch

Kind regards,

--
Ben Caradoc-Davies <ben@anonymised.com>
Director
Transient Software Limited <https://transient.nz/>
New Zealand

We are not the only users having problems:
https://github.com/travis-ci/travis-ci/issues

I have been able to reproduce build hangs locally in src/community with:

mvn -B -T4 -U -Prelease -PcommunityRelease -DskipTests clean install

The hangs occur *only* with Maven 3.5.2, -U, and -T4.

- Maven 3.5.0 -> no hang
- Single threaded (no -T) -> no hang
- No -U (so reduced downloads) -> no hang

The latest Travis release updated to Maven 3.5.2, causing the problem. Travis is blameless, and my endless tinkering with travis_wait and other settings in .travis.yml was to no avail. This is a Maven problem.

jstack shows clear evidence of a deadlock during multi-threaded dependency resolution (attached).

The build eventually recovers (some of the locks are in TIMED_WAITING).
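
For anyone who wants to inspect a hung build themselves, here is a minimal sketch of how such a thread dump could be captured and summarised. It is not the procedure used for the attached dump: the function names are hypothetical, and it assumes a JDK with jps and jstack on the PATH and that the Maven JVM is listed by `jps -l` under the Plexus Classworlds launcher class.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the GeoServer build.

# Print the PIDs of running Maven JVMs (assumption: they appear in
# `jps -l` under the Plexus Classworlds launcher class).
maven_pids() {
    jps -l | awk '/org\.codehaus\.plexus\.classworlds\.launcher\.Launcher/ { print $1 }'
}

# Write a thread dump of each Maven JVM to maven-hang-<pid>.txt.
dump_maven_threads() {
    for pid in $(maven_pids); do
        jstack "$pid" > "maven-hang-$pid.txt"
    done
}

# Summarise a dump file as "STATE count" lines, one per thread state.
summarise_thread_states() {
    sed -n 's/^.*java\.lang\.Thread\.State: \([A-Z_]*\).*$/\1/p' "$1" |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Running dump_maven_threads while the build is silent, then summarise_thread_states on the resulting file, gives a quick count of how many threads are BLOCKED, WAITING, or TIMED_WAITING.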

GeoTools does not use -U because it has no SNAPSHOT dependencies and is thus unaffected.

Removing -T is not an option because we have seen hangs in the main build as well and the single-threaded build is too slow.

Kind regards,
Ben.

(attachments)

geoserver-community-maven-hang-jstack.txt (19.2 KB)

And here is another jstack while the build is hung at a different spot. There are no *.lock files in my local repository so the deadlock is not mediated by the filesystem.

Hung builds eventually complete after 30-45 minutes, but this is long enough for Travis to notice their silence and kill them. I think this is a deadlock rather than a livelock, because zero CPU is being consumed.
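
The two observations above (no lock files, no CPU use) could be checked roughly as follows. This is only a sketch: the function names are hypothetical, and the default repository path ~/.m2/repository is an assumption that does not hold if settings.xml relocates it.

```shell
#!/bin/sh
# Hypothetical helpers for illustration.

# Count *.lock files under a local Maven repository
# (assumed default location: ~/.m2/repository).
count_lock_files() {
    find "${1:-$HOME/.m2/repository}" -type f -name '*.lock' | wc -l | tr -d ' '
}

# Print the CPU percentage that ps reports for a process. On Linux this
# is an average over the process lifetime, so take two readings some
# time apart (or watch top) rather than relying on a single value.
cpu_percent() {
    ps -o %cpu= -p "$1" | tr -d ' '
}
```

A result of 0 from count_lock_files, together with a CPU figure that does not move between readings, is consistent with threads waiting on in-memory locks rather than on the filesystem.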

_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

(attachments)

geoserver-community-maven-hang-jstack-2.txt (28.1 KB)

Hi Ben,
thanks for the investigation!
Did you open a bug report?
Also, I’m wondering, can we use a custom version of Maven?

Cheers
Andrea

Regards,

Andrea Aime

==
GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information.

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054 Massarosa (LU)
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy’s New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.

Andrea,

I have not yet filed a bug report.

I am testing a workaround to download and use Maven 3.5.0:
https://github.com/geoserver/geoserver/commit/a78d7c4711bdde1c853df523f5919a95cadd9359

The test Travis build is running on the debug-travis branch:
https://travis-ci.org/geoserver/geoserver/builds/317993402
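
For readers who do not want to open the commit, the general shape of such a workaround is sketched below. This is not the content of the linked commit: the function names and the install directory are hypothetical, and the sketch assumes curl and tar are available and that the release is fetched from the Apache archive.

```shell
#!/bin/sh
# Hypothetical sketch of pinning a Maven version on a CI machine.

# Print the Apache archive URL of the binary tarball for a Maven 3 release.
maven_archive_url() {
    printf 'https://archive.apache.org/dist/maven/maven-3/%s/binaries/apache-maven-%s-bin.tar.gz\n' "$1" "$1"
}

# Download and unpack the given Maven version under a directory
# (default: $HOME) and print the path of its bin directory.
install_maven() {
    version=$1
    dest=${2:-$HOME}
    archive="$dest/apache-maven-$version-bin.tar.gz"
    curl -fsSL -o "$archive" "$(maven_archive_url "$version")" &&
        tar -xzf "$archive" -C "$dest" &&
        printf '%s\n' "$dest/apache-maven-$version/bin"
}

# Example use (not run here):
#   PATH="$(install_maven 3.5.0):$PATH"
#   mvn -version
```

Putting the unpacked bin directory first on the PATH means later mvn invocations in the same shell pick up 3.5.0 instead of the preinstalled 3.5.2.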

Kind regards,
Ben.

Andrea,

The test Travis build passed, so I applied the fix to master, 2.12.x, and 2.11.x. I also went through the open pull requests, found three which had failed with the timeout error, and closed and reopened each to start a new Travis build with the latest settings.

GeoTools may also be vulnerable. Not using -U on Travis might only reduce the risk. If anyone sees one of these timeouts ("No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself."), please let everyone know.
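
If you have saved Travis logs to disk, a check for this message could look like the sketch below. The function name and the log file names are hypothetical; only the message text is taken from the Travis output quoted above.

```shell
#!/bin/sh
# Succeeds (exit status 0) if a saved log contains the stalled-build message.
has_stall_message() {
    grep -q 'No output has been received in the last' "$1"
}

# Example use (not run here):
#   for log in travis-*.log; do
#       has_stall_message "$log" && echo "stalled: $log"
#   done
```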

I will report the bug to the Maven maintainers.

Kind regards,
Ben.

All six builds passed (three branches and three pull requests). Travis status is green.

Awesome, thanks Ben!

Cheers
Andrea

Thanks Ben, that explains some issues I was having Friday.

Torben

Reported:

[MNG-6323] Deadlock in multithreaded dependency resolution
https://issues.apache.org/jira/browse/MNG-6323
