[SAC] LDAP troubles

Today the tracsvn container cannot connect to the LDAP server.

The LDAP client on that machine is currently configured to use
the public DNS name of the service (ldap.osgeo.org), but attempts
to reach that host on port 389 hang indefinitely.
Connecting to the host on port 636 with netcat works:

  tracsvn:~# nc -vz ldap.osgeo.org 636
  DNS fwd/rev mismatch: ldap.osgeo.org != base.osgeo.osuosl.org
  ldap.osgeo.org [140.211.15.57] 636 (ldaps) open

But ldapsearch on the same port reports "Can't contact LDAP server":

  tracsvn:~# ldapsearch -H ldaps://ldap.osgeo.org:636 -x 'uid=strk'
  ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
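
Since netcat shows the TCP port open while ldapsearch cannot bind,
one thing worth checking is whether the TLS handshake on 636
completes at all. A minimal sketch, assuming openssl and the
coreutils timeout command are available on the client; the function
name check_ldaps_handshake is made up for illustration and this was
not run as part of this report:

```shell
# Try a TLS handshake against host:port and, if a certificate comes
# back, print its subject and expiry date. Returns non-zero when no
# certificate could be read (refused, timed out, or handshake failed).
check_ldaps_handshake() {
  host=$1
  port=${2:-636}   # default to the LDAPS port
  # </dev/null makes s_client exit once the handshake is done;
  # timeout guards against a connection that hangs.
  timeout 10 openssl s_client -connect "$host:$port" \
      -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -enddate
}

# Example call (not executed here):
#   check_ldaps_handshake ldap.osgeo.org 636
```

If no certificate is printed, or the expiry date is in the past, the
problem is likely at the TLS layer rather than the network. Adding
`-d 1` to the ldapsearch call should also print the client library's
own connection trace.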

The LXD configuration on osgeo7 listens on port 636 of the
ldap.osgeo.org IP (140.211.15.57) and forwards connections to port
636 on 127.0.0.1 of the "secure" container. Indeed, I cannot contact
the server on that port from within secure either:

  secure:~# ldapsearch -H ldaps://127.0.0.1:636 -x 'uid=strk'
  ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

Yet I can see that the ports are open (both 636 and 389):

  secure:~# netstat -tnlp | grep '\(389\|636\)'
  tcp 0 0 0.0.0.0:636 0.0.0.0:* LISTEN 29044/slapd
  tcp 0 0 0.0.0.0:389 0.0.0.0:* LISTEN 29044/slapd
  tcp6 0 0 :::636 :::* LISTEN 29044/slapd
  tcp6 0 0 :::389 :::* LISTEN 29044/slapd

The journal shows no connection attempts at all, but the
startup messages do contain some information about failures:

  secure:~# journalctl -x -u slapd.service -f
  Mar 02 08:30:05 secure systemd[1]: slapd.service: Failed to reset devices.list: Operation not permitted
  Mar 02 08:30:05 secure systemd[1]: slapd.service: Failed to set invocation ID on control group /system.slice/slapd.service, ignoring: Operation not permitted

Has anyone seen those messages before? Any idea what we could be
up against? Shall I blindly try a stop/start cycle on the LXD
container?

--strk;

  () Free GIS & Flash consultant/developer
  /\ https://strk.kbt.io/services.html

This somehow got fixed, but I'm not sure whether it was due to
one of my actions. What I did:

1. Ran the /usr/local/bin/copy_ldap_certs_to_secure.sh
   script to update the SSL certificates if needed

2. Found out that slapd did not restart successfully due
   to wrong permissions on the certificates

3. Fixed the certificate permissions and successfully
   restarted slapd

At the end of the above process things started to work again.
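
For anyone repeating step 2, a quick way to inspect the permissions
is sketched below. It assumes GNU find and stat; the function name
show_cert_perms and the directory in the usage comment are
placeholders, since the actual certificate location is not given in
this message:

```shell
# Print mode, owner:group and path for every regular file directly
# inside the given directory.
show_cert_perms() {
  dir=$1
  find "$dir" -maxdepth 1 -type f -exec stat -c '%A %U:%G %n' {} +
}

# Hypothetical usage (replace the path with the real certificate dir):
#   show_cert_perms /path/to/slapd/certs
```

The files need to be readable by whatever user slapd runs as. On
Debian-style installs that is usually "openldap", but that is an
assumption about the secure container and should be verified there.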

I've created a pull request that adds the permission fix to the
copy_ldap_certs_to_secure.sh script (please review):

  https://git.osgeo.org/gitea/sac/ansible-deployment/pulls/8

Why the copy_ldap_certs_to_secure.sh script was NOT run
automatically from the crontab of tech_dev is yet to be
understood. I filed a ticket for it here:

  https://git.osgeo.org/gitea/sac/ansible-deployment/issues/9
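
As a starting point for that ticket, something like the following
could confirm whether an active (non-commented) entry for the script
exists in the crontab at all. This is only a sketch: the helper name
find_cron_entry is invented here, and reading another user's crontab
normally requires root:

```shell
# Read crontab text on stdin and print the non-comment lines that
# mention the given script name. Returns non-zero if none match.
find_cron_entry() {
  grep -v '^[[:space:]]*#' | grep -F -- "$1"
}

# Example call (not executed here):
#   crontab -l -u tech_dev | find_cron_entry copy_ldap_certs_to_secure.sh
```

An empty result would mean the job is missing or commented out; a
matching line would shift attention to why the job fails when cron
runs it.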

Looking forward to the new sysadmin contract!

--strk;

On Tue, Mar 02, 2021 at 09:57:09AM +0100, Sandro Santilli wrote:
> Today tracsvn container cannot connect LDAP server.
> [...]