[OSGeo] #3473: discourse is down

#3473: discourse is down
--------------------------------+---------------------------
Reporter: robe | Owner: sac-tickets@…
     Type: task | Status: new
Priority: normal | Milestone: 2025 (robe)
Component: SysAdmin/Discourse | Keywords:
--------------------------------+---------------------------
As noted in the general Matrix channel.
--
OSGeo committee and general foundation issue tracker.

#3473: discourse is down
--------------------------------+----------------------------
Reporter: robe | Owner: sac-tickets@…
     Type: task | Status: new
Priority: normal | Milestone: 2025 (robe)
Component: SysAdmin/Discourse | Resolution:
Keywords: |
--------------------------------+----------------------------
Comment (by robe):

Seems to have run out of disk space. Checking to see where all the disk
space went.

#3473: discourse is down
--------------------------------+----------------------------
Reporter: robe | Owner: sac-tickets@…
     Type: task | Status: closed
Priority: normal | Milestone: 2025 (robe)
Component: SysAdmin/Discourse | Resolution: fixed
Keywords: |
--------------------------------+----------------------------
Changes (by robe):

* resolution: => fixed
* status: new => closed

Comment:

I deleted some snapshots, but that wasn't enough to clear it, so I added
another 50GB of space to bring it to 250GB (that includes snapshot
space).

So I rebooted to make sure there were no locked files. When it came back
it only had 4MB free.

{{{
root@discourse:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       181G  181G  4.3M 100% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           3.2G  536K  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            50M   14M   37M  28% /run/lxd_agent
/dev/sda15      105M  6.1M   99M   6% /boot/efi

}}}

So I went looking for whatever was taking up the space.

The biggest folder was:

{{{
root@discourse:/var/discourse/shared/standalone# du -h -d 1
1.4G ./import
15G ./postgres_data
312M ./log
12M ./redis_data
11G ./backups
4.0K ./postgres_backup
1.2G ./uploads
12K ./tmp
28K ./state
20K ./postgres_run
14G ./postgres_data_old
42G .

}}}

Which, as you can see, isn't anywhere near the 181G that was supposedly in use.
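
The comparison above (what the directory trees add up to versus what the
filesystem reports as used) can be sketched in Python. This is only an
illustration, not something that was run on the server: the function names
`tree_size` and `unaccounted_bytes` are made up for this example, and
`tree_size` adds up apparent file sizes, whereas `du` counts allocated blocks,
so the two will not agree exactly.

```python
import os
import shutil
import stat


def tree_size(root):
    """Return the total apparent size in bytes of regular files under root."""
    total = 0
    # os.walk does not follow directory symlinks by default; walk errors are skipped.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def unaccounted_bytes(mount, roots):
    """Bytes the filesystem reports as used that the given trees do not explain."""
    used = shutil.disk_usage(mount).used
    return used - sum(tree_size(r) for r in roots)
```

Calling `unaccounted_bytes("/", ["/var/discourse/shared/standalone"])` gives a
rough size for the gap. A large positive result means the space is being held
somewhere the directory listing does not show.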

So I purged postgres_data_old, which was left over from when we upgraded
PostgreSQL from pg13.
After that:

{{{
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       226G   63G  164G  28% /

}}}
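
As a quick check of the figures quoted so far, the arithmetic can be written
out. The numbers are copied from the `df` and `du` output in this ticket, in
the G units those tools print. The variable names are only for illustration.

```python
# Figures taken from the df -h and du -h output in this ticket (units: G).
used_before = 181        # /dev/root "Used" right after the reboot
used_after = 63          # /dev/root "Used" after the cleanup
postgres_data_old = 14   # size of the purged folder per du

freed = used_before - used_after          # space that became free overall
unexplained = freed - postgres_data_old   # part not explained by the purge

print(f"freed: {freed}G, purged folder: {postgres_data_old}G, "
      f"unexplained: {unexplained}G")
```

This prints `freed: 118G, purged folder: 14G, unexplained: 104G`, so the purged
folder accounts for only a small part of the space that came back.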

So I'm assuming my removal of the folder had nothing to do with this: it
was only 14G, while used space dropped from 181G to 63G. More likely it's
something in Docker which cleared up when I rebooted but is still
catching up. The last check shows this:

{{{
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       226G   63G  164G  28% /
tmpfs           7.9G     0  7.9G   0% /dev/shm
tmpfs           3.2G  836K  3.2G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            50M   14M   37M  28% /run/lxd_agent
/dev/sda15      105M  6.1M   99M   6% /boot/efi
overlay         226G   63G  164G  28% /var/lib/docker/overlay2/119b6e89c3c7a2a9b23f2461ad45ba0db57ec4a59ac760747e874ef4f8918fd3/merged
overlay         226G   63G  164G  28% /var/lib/docker/overlay2/7ea288bc009aed6683e182489a8fdb779357fe8d1e9e185b12656a4bccabe2a8/merged

}}}
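
One common reason for `df` showing far more used space than `du` can find, and
for a reboot clearing it, is a file that was deleted while a process still
held it open. The ticket does not confirm that this is what happened here, so
the following is only a sketch of how one might check next time, before
rebooting. It assumes a Linux `/proc` filesystem, where the symlink target of
such a descriptor ends in ` (deleted)`. The function name
`deleted_open_files` is hypothetical.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Yield (pid, fd, target, size) for open descriptors whose file was unlinked."""
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
            except OSError:
                continue
            # Skip live files and anonymous in-memory files (memfd).
            if not target.endswith(" (deleted)") or target.startswith("/memfd:"):
                continue
            try:
                size = os.stat(link).st_size  # follows the link to the open file
            except OSError:
                size = 0
            yield int(pid), int(fd), target, size
```

Run as root, `sorted(deleted_open_files(), key=lambda r: r[3], reverse=True)`
lists the largest such files first. Entries on tmpfs mounts may show up as
well and do not count against the root filesystem.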

I'm closing for now.