[SAC] [OSGeo] #2894: Update of grass.osgeo.org to Debian 11

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+-----------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Unplanned
Component: Systems Admin | Keywords: debian
---------------------------+-----------------------
At this time, https://grass.osgeo.org/ runs Debian GNU/Linux 10 (buster).

As it is an LXD container, I am not sure how an update is to be done.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894>
OSGeo <https://osgeo.org/>
OSGeo committee and general foundation issue tracker.

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Changes (by robe):

* milestone: Unplanned => Sysadmin Contract 2023-I

Comment:

It would follow the same process as any other server, but I can take care
of it if you want.

I think you did the last upgrade yourself, from Debian 9 to Debian 10.
But given that I did run into issues upgrading some other Debian 10
servers to 11, perhaps I should take care of it so I can roll back if
there are issues.

I'll do a trial run on a backup of it on staging, and if it looks good I'll
make the change here.

Usually doesn't take more than 2 hrs (of which at most about 1 hr
downtime).
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:1>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Comment (by robe):

I did a trial-run upgrade of grass on a copy of it, using the Ansible
upgrade script.

It seemed to go fine, except that somewhere along the line it ended up with
a dead PostgreSQL 9.6 cluster and an additional 13 main cluster.
pg_lsclusters shows this in my staging container:

{{{
Ver Cluster Port Status                Owner    Data directory               Log file
9.6 main    5432 down,binaries_missing postgres /var/lib/postgresql/9.6/main /var/log/postgresql/postgresql-9.6-main.log
11  main    5433 online                postgres /var/lib/postgresql/11/main  /var/log/postgresql/postgresql-11-main.log
13  main    5434 online                postgres /var/lib/postgresql/13/main  /var/log/postgresql/postgresql-13-main.log
}}}

and for current grass:

{{{
Ver Cluster Port Status Owner    Data directory              Log file
11  main    5433 online postgres /var/lib/postgresql/11/main /var/log/postgresql/postgresql-11-main.log
}}}

None of the PostgreSQL servers in either container has a database on it,
aside from the default databases.
What are you using PostgreSQL for? Some CI stuff, or do you not need it
at all, and it was perhaps installed accidentally when trying to install
just the clients?

I also see MySQL installed, and again there are no databases in it.

Those are fine to keep, but I'd rather remove services that are not being
used, to minimize issues with future upgrades.
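
A small sketch of how a dead cluster like the 9.6 one could be spotted before cleaning up. The helper name `dead_clusters` is hypothetical and not part of any upgrade script mentioned here; it only parses `pg_lsclusters` output from stdin and removes nothing itself.

```shell
# Hypothetical helper: print "version cluster" for every PostgreSQL cluster
# whose status column reports missing binaries.
# Reads pg_lsclusters output on stdin, so it can also be tried on saved output.
dead_clusters() {
    awk '$4 ~ /binaries_missing/ { print $1, $2 }'
}

# Usage on the container (assumes Debian's postgresql-common tools):
#   pg_lsclusters | dead_clusters
```

Each cluster it reports could then be removed with `pg_dropcluster <version> <cluster>`, once it is confirmed to hold only the default databases.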
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:2>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Comment (by robe):

I should note there are also some failed services, but I think those might
be an LXD issue and not related to the upgrade.

On the upgraded container (grass-staging on osgeo4, which is a copy of the
prod one), I see several failed services:

{{{
  sudo systemctl list-units --state failed
}}}
shows:
{{{
  UNIT                          LOAD   ACTIVE SUB    DESCRIPTION
● binfmt-support.service        loaded failed failed Enable support for additional executable binary formats
● systemd-networkd.service      loaded failed failed Network Service
● systemd-resolved.service      loaded failed failed Network Name Resolution
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket
● systemd-networkd.socket       loaded failed failed Network Service Netlink Socket
}}}

and on grass (current running)

{{{
   UNIT LOAD ACTIVE SUB DESCRIPTION
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.

}}}

I'm pretty sure the current systemd-journald-audit failure is a
permissions issue in LXD, which I'll investigate.

On the one I upgraded, I hadn't checked before I started upgrading whether
all those were already failing. So it might very well be a permissions
issue again, because osgeo4 is running a newer version of Ubuntu (22.04)
vs. osgeo7, which is still on Ubuntu 20.04.

I'll review these failures before I do the upgrade on grass production.
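
A rough sketch of how the two failed-unit lists could be compared, so that only failures new to the upgraded container stand out. The function names and file names are hypothetical; the sketch assumes the listings are taken with `--plain --no-legend` so that each line starts with the unit name.

```shell
# Hypothetical helper: reduce a failed-units listing to sorted unit names.
# Expects the output of
#   systemctl list-units --state=failed --plain --no-legend
# on stdin (one unit per line, unit name in the first column).
unit_names() {
    awk 'NF { print $1 }' | LC_ALL=C sort -u
}

# Print units present in the second name list but not in the first,
# i.e. failures that are new compared with the baseline.
new_failures() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Usage (file names are placeholders):
#   on grass:          systemctl list-units --state=failed --plain --no-legend | unit_names > prod.txt
#   on grass-staging:  systemctl list-units --state=failed --plain --no-legend | unit_names > staging.txt
#   new_failures prod.txt staging.txt
```

Taking the baseline before starting the upgrade avoids guessing afterwards which failures were already there.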
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:3>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Comment (by neteler):

Thanks for the trials!

The PostgreSQL and MySQL servers can be "brute-force" installed; we only
need them to compile GRASS with those as backends, to get the manual pages
for these backends:

https://grass.osgeo.org/grass82/manuals/sql.html

A single (empty) installation of both would be perfect. We may also leave
that out and drop it for now; re-installing them at a later moment to
bring back the manual pages.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:4>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Comment (by robe):

I checked all the failures in the list below:

{{{
  UNIT                          LOAD   ACTIVE SUB    DESCRIPTION
● binfmt-support.service        loaded failed failed Enable support for additional executable binary formats
● systemd-networkd.service      loaded failed failed Network Service
● systemd-resolved.service      loaded failed failed Network Name Resolution
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket
● systemd-networkd.socket       loaded failed failed Network Service Netlink Socket
}}}

Those are all related to permission issues. It's unclear to me whether you
actually need them, though. The fix would be to enable nesting on the
container with:

{{{
lxc config set grass security.nesting=true
}}}

That would fix all except the below which is already failing anyway in
prod, so I'll just remove that one.

{{{
systemd-journald-audit.socket loaded failed failed Journal Audit Socket
}}}

I'm going to start the upgrade process.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:5>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Comment (by robe):

Looks like it's still in the middle of the upgrade. I should have disabled
cron before I started, as it looks like your build job started running and
might be trying to use some Python packages that were in the middle of
being upgraded.

I'll let it run for another hour to see if it finishes and if not I'll
kill your running job.
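
A minimal sketch of pausing cron around a long-running step, so that scheduled jobs cannot start halfway through. The function name `with_cron_paused` and the overridable `SYSTEMCTL` variable are hypothetical; the unit name `cron.service` is the one Debian uses, and jobs that are already running would still have to be stopped separately.

```shell
# Command used to control services; overridable so the function can be dry-run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Stop cron, run the given command, then start cron again even if the
# command failed. Returns the exit status of the command.
with_cron_paused() {
    "$SYSTEMCTL" stop cron.service || return 1
    status=0
    "$@" || status=$?
    "$SYSTEMCTL" start cron.service
    return "$status"
}

# Usage (the wrapped command is a placeholder for the real upgrade step):
#   with_cron_paused apt-get full-upgrade
```

Setting `SYSTEMCTL=echo` prints the service commands instead of running them, which is a cheap way to try the wrapper first.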
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:6>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: closed
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution: fixed
Keywords: debian |
---------------------------+---------------------------------------
Changes (by robe):

* status: new => closed
* resolution: => fixed

Comment:

I canceled your jobs so I could complete the upgrade. I tried running
your hugo_clean_and_update_job.sh but got this error:

nice: ‘/usr/local/bin/hugo’: No such file or directory

I didn't check whether that was missing before or is a result of the
changes.

After the upgrade, the following showed as failing, just like they did in
staging:

{{{
● binfmt-support.service        loaded failed failed Enable support for additional executable binary formats
● modprobe@drm.service          loaded failed failed Load Kernel Module drm
● systemd-logind.service        loaded failed failed User Login Management
● systemd-networkd.service      loaded failed failed Network Service
● systemd-resolved.service      loaded failed failed Network Name Resolution
● systemd-journald-audit.socket loaded failed failed Journal Audit Socket
● systemd-networkd.socket       loaded failed failed Network Service Netlink Socket
}}}

I disabled or masked them so they don't show as failures:

{{{
systemctl disable binfmt-support.service
systemctl disable systemd-networkd-wait-online.service
systemctl disable systemd-journald-audit.socket
systemctl mask modprobe@drm.service
systemctl mask systemd-logind.service
systemctl disable systemd-networkd.service
systemctl disable systemd-resolved.service
systemctl mask systemd-journald-audit.socket
systemctl disable systemd-networkd.socket
}}}

Can you test out your jobs to make sure they all still work, and check
whether you see any other issues? Feel free to reopen if you still have
issues.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:7>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: closed
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution: fixed
Keywords: debian |
---------------------------+---------------------------------------
Comment (by neteler):

Thanks for updating the machine!

The remaining `hugo` issue is now addressed in
https://github.com/OSGeo/grass-addons/pull/875

Will monitor the server for potential other glitches (which I do not
expect).
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:8>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: reopened
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Changes (by neteler):

* status: closed => reopened
* resolution: fixed =>

Comment:

I discovered a problem:

rsync only works from inside the LXD container; from outside I get:

{{{
rsync --dry-run -avz --port=50026 grass.osgeo.org::grass-website grass-website
rsync: [Receiver] getcwd(): Transport endpoint is not connected (107)
rsync error: errors selecting input/output files, dirs (code 3) at util1.c(1122) [Receiver=3.2.7]
}}}

This is also a new phenomenon, and probably connected:

ssh grasslxd
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected

Two days ago (or so) I did not get this issue.
Maybe a problem on the jump host?
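
Both messages come from a failing `getcwd()` call. One possible explanation, which is an assumption and not confirmed anywhere in this ticket, is that the local shell was sitting in a directory on a mount that had gone stale (for example a disconnected network or FUSE mount), in which case neither the server nor the jump host would be at fault. A small sketch of a check for that, with the hypothetical helper name `cwd_ok`:

```shell
# Hypothetical check: succeed only if the current working directory can
# still be resolved. The external pwd is used so that the shell's cached
# $PWD value is not trusted.
cwd_ok() {
    env pwd -P >/dev/null 2>&1
}

# Usage: run from the directory where rsync or ssh was started.
#   cwd_ok || cd "$HOME"
```

If the check fails, changing to a healthy directory and retrying the rsync or ssh command would show whether the working directory was the culprit.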
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:9>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: reopened
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution:
Keywords: debian |
---------------------------+---------------------------------------
Comment (by robe):

@neteler,

I'm not having an issue.

I did:

{{{
rsync --dry-run -avz --port=50026 grass.osgeo.org::grass-website grass-website
}}}

from one of my servers (not on OSUOSL) and it works fine for me.

Could it maybe be a port-blocking issue on your end?
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:10>

#2894: Update of grass.osgeo.org to Debian 11
---------------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: closed
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: Systems Admin | Resolution: fixed
Keywords: debian |
---------------------------+---------------------------------------
Changes (by neteler):

* status: reopened => closed
* resolution: => fixed

Comment:

Magic, it also works now from here as well (and ports aren't blocked).

Closing again :)
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:11>

#2894: Update of grass.osgeo.org to Debian 11
----------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: closed
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: SysAdmin | Resolution: fixed
Keywords: debian |
----------------------+---------------------------------------
Comment (by lnicola):

I don't think we should be disabling systemd-logind; that will probably
break D-Bus user sessions. If SSH is slow to log in, that's probably
systemd issue #17866, "Failed to set up mount namespacing:
/run/systemd/unit-root/proc: Permission denied"
(https://github.com/systemd/systemd/issues/17866).

I didn't look too closely, but it looks like a bad LXC/AppArmor
interaction that was fixed in bookworm in 2021
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995350).

Instead, I fixed it by setting `security.nesting=true` on the container.
Apparently this is fine for unprivileged containers, but we should check
using `lxc list security.privileged=true` beforehand.

Of course, my preference would be to upgrade to a distro that has proper
support.
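
The "check first, then enable nesting" order described above could be sketched as follows. The function name `enable_nesting` and the overridable `LXC` variable are hypothetical; the sketch assumes an empty or `false` value of `security.privileged` means the container is unprivileged.

```shell
# LXD client command; overridable so the function can be tried with a stub.
LXC=${LXC:-lxc}

# Enable nesting on a container, but refuse if the container is privileged.
enable_nesting() {
    name=$1
    if [ "$("$LXC" config get "$name" security.privileged)" = "true" ]; then
        echo "$name is privileged; not enabling nesting" >&2
        return 1
    fi
    "$LXC" config set "$name" security.nesting true
}

# Usage on the LXD host:
#   enable_nesting grass
```

The refusal branch is deliberately conservative: it leaves the decision about privileged containers to a person.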
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894#comment:12>

#2894: Update of grass.osgeo.org to Debian 11
----------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: closed
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: SysAdmin | Resolution: fixed
Keywords: debian |
----------------------+---------------------------------------
Comment (by robe):

Replying to [comment:12 lnicola]:
> I don't think we should be disabling systemd-logind; that will probably
> break D-Bus user sessions. If SSH is slow to log in, that's probably
> systemd issue #17866, "Failed to set up mount namespacing:
> /run/systemd/unit-root/proc: Permission denied"
> (https://github.com/systemd/systemd/issues/17866).
>
> I didn't look too closely, but it looks like a bad LXC/AppArmor
> interaction that was fixed in bookworm in 2021
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995350).
>
> Instead, I fixed it by setting `security.nesting=true` on the container.
> Apparently this is fine for unprivileged containers, but we should check
> using `lxc list security.privileged=true` beforehand.
>
> Of course, my preference would be to upgrade to a distro that has proper
> support.
>
> PS: we also ran into an `acpid` issue (it was using a lot of CPU);
> `apt purge acpi-support` fixes that. Again, that might have been fixed,
> but a container doesn't need ACPI anyway.

Thanks, that did seem to fix the issue and the failing logrotate:

{{{

lxc config set grass security.nesting=true
lxc exec grass -- systemctl unmask systemd-logind.service

}}}
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894>

#2894: Update of grass.osgeo.org to Debian 11
----------------------+---------------------------------------
Reporter: neteler | Owner: sac@…
     Type: task | Status: closed
Priority: normal | Milestone: Sysadmin Contract 2023-I
Component: SysAdmin | Resolution: fixed
Keywords: debian |
----------------------+---------------------------------------
Comment (by robe):

I also did this for grass-wiki, though the logrotate coming back might
have been coincidental. I'll see if it stays up, but it was dead again in
grass-wiki.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2894>