[SAC] [OSGeo] #2297: Drone (agent?) is failing to reach docker

#2297: Drone (agent?) is failing to reach docker
---------------------------+--------------------------------------
Reporter: strk | Owner: robe
     Type: task | Status: assigned
Priority: normal | Milestone: Sysadmin Contract 2019-I
Component: Systems Admin | Keywords:
---------------------------+--------------------------------------
See https://dronie.osgeo.org/postgis/postgis/240

Error message is:
pg-9.5: Cannot connect to the Docker daemon at
unix:///var/run/docker.sock. Is the docker daemon running?

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297>
OSGeo <https://osgeo.org/>
OSGeo committee and general foundation issue tracker.

Comment (by strk):

Odd, one of the 3 matrix jobs did work in cloning (the pg10 one), in that
build.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:1>

Comment (by robe):

As mentioned on IRC, this might have to do with my upgrade to 1.0.

The job that is succeeding is handled by the drone that runs directly on
dronie-server. The failing ones are run on ianna and debbie-docker.

It's hard to say if the upgrade is the issue, as lots of things were
happening at the same time.

The reason only part of a matrix build fails is that each job of the
matrix is passed to a different agent. For now I've shut off the drones on
ianna and debbie-docker and restarted the failing job to see if that fixes
the issue.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:2>

Comment (by strk):

Shutting down ianna and debbie-docker seems to have fixed the issue
for now. Let's keep this open until we put those agents back online,
though, as builds are slower now...

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:3>

Comment (by strk):

I take that back: the problem is still not fixed.
See https://dronie.osgeo.org/postgis/postgis/254/3/1

Is any other agent connected to the server?
Can you check the logs to tell, Regina?

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:4>

Comment (by strk):

{{{ docker logs }}} doesn't show any detail about which agents are
connected or from which IP addresses. The only visible information in the
logs is the error of a {{{ runner }}} with identifier {{{
"machine":"399d4d53c8f9" }}} that starts execution and fails.

I don't know why there's no trace of any other machines.

The logs of the agent container contain info about both successful and
failing builds, but without details of the failure, and no trace of the {{{
399d4d53c8f9 }}} identifier.
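
One way to check which machines show up in the logs is to tally error
entries per {{{ "machine" }}} value. The following is only a rough sketch:
it assumes the logs are JSON lines and that failures carry a {{{ "level" }}}
field set to {{{ "error" }}}. Only the {{{ "machine" }}} key and its value
come from this ticket; the other fields, the function name and the sample
lines are hypothetical.

```python
import json
from collections import Counter


def count_errors_by_machine(lines):
    """Count error-level entries per "machine" value in JSON-lines logs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is clearly not a JSON object
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        # "level" is an assumed field name, not confirmed by the ticket.
        if entry.get("level") == "error" and "machine" in entry:
            counts[entry["machine"]] += 1
    return counts


# Hypothetical sample; only the machine identifier is taken from the ticket.
sample = [
    '{"level":"error","machine":"399d4d53c8f9","msg":"execution failed"}',
    '{"level":"info","msg":"request completed"}',
    "plain text line that is not JSON",
]
print(count_errors_by_machine(sample))
```

The input could be the saved output of {{{ docker logs }}} for the server
or agent container, read line by line.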

Where are the server and the agents configured ?

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:5>

Comment (by robe):

Details here: https://git.osgeo.org/gitea/sac/osgeo7/wiki/Dronie-Server-container

I did turn off the others, so it does seem to be some sort of dronie-server
agent issue.

I suppose I can revert to what we had before I upgraded to version 1.0 (the
last was rc5).

Or I could just wipe out the database.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:6>

Comment (by robe):

Forgot to say: I'm not sure why it's even trying to reach the other agents
when I shut them down.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:7>

Comment (by strk):

Regina, can you make me an administrator of the drone-1.0 server? Chances
are there's some admin menu. How can you tell it is trying to access other
agents?

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:8>

Comment (by strk):

The instructions on
https://git.osgeo.org/gitea/sac/osgeo7/wiki/Dronie-Server-container
seem to say that the server and the agent are started via {{{ lxc }}}, but
on the osgeo7 machine {{{ docker ps }}} shows them running in docker.
What's the deal? Is there an overlap between {{{ lxc }}} and
{{{ docker }}}? Or are we running two services in parallel?

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:9>

Comment (by strk):

This is interesting info to add to the wiki:
https://github.com/drone/drone/issues/1496

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:10>

Comment (by strk):

Thread on the Drone support forum addressing this error:
https://discourse.drone.io/t/cannot-connect-to-the-docker-daemon-at-unix-var-run-docker-sock-is-the-docker-daemon-running/4071

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:11>

Comment (by strk):

Regina: the server startup script is missing DRONE_AGENTS_ENABLED=true
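
A quick pre-flight check for this setting could look like the sketch
below. It assumes the server settings are available as plain
{{{ KEY=VALUE }}} lines (for example an env file); the helper names and the
sample text are hypothetical, and only the variable name and the
{{{ true }}} value come from this ticket.

```python
def parse_env_lines(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def agents_enabled(env):
    """Return True only if DRONE_AGENTS_ENABLED is literally set to "true"."""
    # Assumption: this sketch accepts just the spelling used in the ticket.
    return env.get("DRONE_AGENTS_ENABLED", "").lower() == "true"


# Hypothetical server settings, not copied from the real startup script.
server_env = parse_env_lines("""
# drone server settings
DRONE_AGENTS_ENABLED=true
""")
print(agents_enabled(server_env))
```

Running the same check against the settings actually passed to the server
container would show whether the variable is missing.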

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:12>

#2297: Drone (agent?) is failing to reach docker
---------------------------+---------------------------------------
Reporter: strk | Owner: robe
     Type: task | Status: closed
Priority: normal | Milestone: Sysadmin Contract 2019-I
Component: Systems Admin | Resolution: fixed
Keywords: |
---------------------------+---------------------------------------
Changes (by strk):

* status: assigned => closed
* resolution: => fixed

Comment:

Fix confirmed (was a missing DRONE_AGENTS_ENABLED variable)

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2297#comment:13>