[SAC] Alert: lists.osgeo.org disk full

Hi all,

the silence in my inbox made me check the list server:
voilà - disk full!

1) I have done some emergency cleanup (apt-get clean; compressing some
   log files). The disk is currently 98% full.

2) Stuff seems to come back.

3) We urgently need more disk space there.

4) why is there no disk monitoring? Or, why does munin fail??

5) mailman-20120428.tar.gz in /home/martin/ should go elsewhere (5.4GB)
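
A minimal sketch of the kind of disk check that was missing here, written in Python for illustration. The thresholds (90% and 95%) and the function names are assumptions, not anything configured on lists.osgeo.org; munin's own disk plugin would normally do this job.

```python
import shutil


def percent_used(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_status(percent, warn=90.0, crit=95.0):
    """Classify a usage percentage; warn and crit are assumed thresholds."""
    if percent >= crit:
        return "critical"
    if percent >= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    pct = percent_used("/")
    print(f"/ is {pct:.1f}% full: {disk_status(pct)}")
```

The figure can differ slightly from the percentage "df" prints, since "df" computes its percentage relative to the space usable by unprivileged users.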

Cheers
Markus

On Tue, Dec 4, 2012 at 3:06 PM, Markus Neteler <neteler@osgeo.org> wrote:

Hi all,

the silence in my inbox made me check the list server:
voilà - disk full!

BTW: already for hours:

mail:/var/log# cat syslog | grep 'not enough free space' | head
Dec 3 14:13:12 ash postfix/smtpd[31970]: warning: not enough free
space in mail queue: 15179776 bytes < 1.5*message size limit
Dec 3 14:13:20 ash postfix/smtpd[32112]: warning: not enough free
space in mail queue: 15126528 bytes < 1.5*message size limit
Dec 3 14:13:21 ash postfix/smtpd[31879]: warning: not enough free
space in mail queue: 15118336 bytes < 1.5*message size limit
Dec 3 14:13:22 ash postfix/smtpd[31970]: warning: not enough free
space in mail queue: 15106048 bytes < 1.5*message size limit
Dec 3 14:13:31 ash postfix/smtpd[32112]: warning: not enough free
space in mail queue: 15056896 bytes < 1.5*message size limit
Dec 3 14:13:32 ash postfix/smtpd[31970]: warning: not enough free
space in mail queue: 15040512 bytes < 1.5*message size limit...

# TIME NOW:
mail:/var/log# date
Tue Dec 4 06:09:05 PST 2012
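
The threshold in these warnings can be checked by hand. As a hedged sketch: Postfix's default message_size_limit is 10240000 bytes, so 1.5 times that is 15360000 bytes, and every free-space figure in the log sits just below it. Whether this server used the default is an assumption; the thread does not show its main.cf. The function names below are made up for illustration.

```python
import re

# Matches the smtpd warning text shown in the syslog excerpt above.
WARNING_RE = re.compile(
    r"not enough free space in mail queue: (\d+) bytes < ([\d.]+)\*message size limit"
)


def parse_free_bytes(line):
    """Return the free-space figure from a warning line, or None if absent."""
    match = WARNING_RE.search(line)
    return int(match.group(1)) if match else None


def queue_has_room(free_bytes, message_size_limit=10240000, factor=1.5):
    """True if free space reaches factor * limit (limit value is an assumed default)."""
    return free_bytes >= factor * message_size_limit


line = ("Dec 3 14:13:12 ash postfix/smtpd[31970]: warning: not enough free "
        "space in mail queue: 15179776 bytes < 1.5*message size limit")
free = parse_free_bytes(line)
print(free, queue_has_room(free))  # prints: 15179776 False
```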

Re: munin. It is as simple as this: lists.osgeo.org is not monitored at all...
http://webextra.osgeo.osuosl.org/munin/osgeo.org/lists.osgeo.org.html
-> 404

Please add munin also on that machine. I am not munin-savvy enough to
do it myself.

Thanks
Markus

Hi,

On Tue, Dec 04, 2012 at 03:06:53PM +0100, Markus Neteler wrote:

the silence in my inbox made me check the list server:
voilà - disk full!

I've moved the mailman backup file off-site.

Anyhow, I'm surprised how inefficient the beast is. It has a 50
GByte disk, "df" claims that 41 GByte are used, 6.4 are actually
available - and "du" told me that only 33 GByte are actually stored in
files. This accounts for almost 1/3 overhead (10 GByte overhead at 33
GByte files) - quite impressive.

Cheers,
  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------
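
The df and du figures above can be split into parts. A rough sketch of the arithmetic, using only the numbers quoted in the message. The function and key names are illustrative, and reading the first part as root-reserved blocks assumes the usual ext3 default of 5%, which the thread does not confirm.

```python
def disk_breakdown(total_gb, df_used_gb, df_avail_gb, du_gb):
    """Split a df/du discrepancy into its components (all values in GByte)."""
    # Space that df reports as neither used nor available.
    reserved = total_gb - df_used_gb - df_avail_gb
    # Space df counts as used that du does not find in files.
    gap = df_used_gb - du_gb
    # Everything that is neither file data nor available space.
    overhead = total_gb - df_avail_gb - du_gb
    return {
        "reserved_gb": round(reserved, 1),
        "df_minus_du_gb": round(gap, 1),
        "total_overhead_gb": round(overhead, 1),
    }


print(disk_breakdown(50, 41, 6.4, 33))
```

With these inputs, about 2.6 GByte is neither used nor available (close to 5% of 50 GByte), 8 GByte is the gap between df and du, and the two together give the roughly 10 GByte mentioned above. The 8 GByte gap could come from filesystem metadata, block-size slack on many small files, or deleted files still held open by a process; which of these applies here is not established in the thread.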

Now that there's some disk space available, I'd vote for doing a clean
reboot to ensure all regular system services are running properly
again.

Cheers,
  Martin.

On 12/04/2012 06:11 AM, Markus Neteler wrote:

On Tue, Dec 4, 2012 at 3:06 PM, Markus Neteler <neteler@osgeo.org> wrote:

Hi all,

the silence in my inbox made me check the list server:
voilà - disk full!

Re: munin. It is as simple as this: lists.osgeo.org is not monitored at all...
http://webextra.osgeo.osuosl.org/munin/osgeo.org/lists.osgeo.org.html
-> 404

Please add munin also on that machine. I am not munin-savvy enough to
do it myself.

Thanks
Markus

Yes, I pointed out a few weeks ago that munin was not set up on some of
our VMs.

Also I never got feedback on what address should munin emails go to when
things go wrong. The list is a bad choice for that (especially if it's
down).

I can add lists (though I think there are still several machines where I
don't have sudo).

Thanks,
Alex

On 12/04/2012 07:47 AM, Martin Spott wrote:

Hi,

On Tue, Dec 04, 2012 at 03:06:53PM +0100, Markus Neteler wrote:

the silence in my inbox made me check the list server:
voilà - disk full!

I've moved the mailman backup file off-site.

Anyhow, I'm surprised how inefficient the beast is. It has a 50
GByte disk, "df" claims that 41 GByte are used, 6.4 are actually
available - and "du" told me that only 33 GByte are actually stored in
files. This accounts for almost 1/3 overhead (10 GByte overhead at 33
GByte files) - quite impressive.

Cheers,
  Martin.

I'm not sure what the overhead is from, I can only suspect it's related
to how mailman works - everything else should be standard debian (though
I think it's ext3 not ext4).

We have more space we can allocate, how much do we want? 10G, 20G?

Thanks,
Alex

Hi Alex,

On Tue, Dec 04, 2012 at 08:34:33AM -0800, Alex Mandel wrote:

Also I never got feedback on what address should munin emails go to when
things go wrong. The list is a bad choice for that (especially if it's
down).

Mailing Primary Admins by their individual EMail addresses might be a
reasonable choice ?

I can add lists (though I think there are still several machines where I
don't have sudo).

As you're a Primary Admin, I suspect you should - somehow - be able to
log into every machine "we" (TM) maintain. Don't you ?

Cheerio,
  Martin.

On Tue, Dec 04, 2012 at 08:38:12AM -0800, Alex Mandel wrote:

I'm not sure what the overhead is from, I can only suspect it's related
to how mailman works - [...]

It's probably related to how the filesystem works :-)
Sure, the Mailman archive contains many files, at least one per EMail in
the archive, but 1/3 overhead is still a lot, I'd say.

I have no idea how this KVM virtualization is organized. Are the VM's
sitting inside LV's which could easily be grown ? Anyhow, the current
Mailman archive is slightly less than 30 GByte net, thus I'd say let's
have another 20 GByte so we can take distro packages and other stuff
into account and are still on the safe side for a while.

Cheers,
  Martin.

On 12/04/2012 08:51 AM, Martin Spott wrote:

On Tue, Dec 04, 2012 at 08:38:12AM -0800, Alex Mandel wrote:

I'm not sure what the overhead is from, I can only suspect it's related
to how mailman works - [...]

It's probably related to how the filesystem works :-)
Sure, the Mailman archive contains many files, at least one per EMail in
the archive, but 1/3 overhead is still a lot, I'd say.

I have no idea how this KVM virtualization is organized. Are the VM's
sitting inside LV's which could easily be grown ? Anyhow, the current
Mailman archive is slightly less than 30 GByte net, thus I'd say let's
have another 20 GByte so we can take distro packages and other stuff
into account and are still on the safe side for a while.

Cheers,
  Martin.

Yes, the VMs are LVs. I simply need to ask OSUOSL to do the grow.
Should be about 5-10 minutes of downtime to do. We have ~200 GB
available on osgeo4 if the tables are up to date.

About the munin emails, I checked and there's a bug right now that's
being worked on because munin can only send to 1 email address without
causing trouble. On my networks I use a 3rd party email list with the
primary admins and send to that. Would that idea work for people here?

Thanks,
Alex

On Tue, Dec 04, 2012 at 09:00:01AM -0800, Alex Mandel wrote:

About the munin emails, I checked and there's a bug right now that's
being worked on because munin can only send to 1 email address without
causing trouble. On my networks I use a 3rd party email list with the
primary admins and send to that. Would that idea work for people here?

"3rd party email list" being a simple /etc/aliases entry having
multiple recipients ?

Cheers,
  Martin.

On Tue, Dec 4, 2012 at 4:49 PM, Martin Spott <Martin.Spott@mgras.net> wrote:

Now that there's some disk space available, I'd vote for doing a clean
reboot to ensure all regular system services are running properly
again.

+1

Otherwise we might run into extra trouble.

Markus

On Tue, Dec 04, 2012 at 06:24:21PM +0100, Markus Neteler wrote:

On Tue, Dec 4, 2012 at 4:49 PM, Martin Spott <Martin.Spott@mgras.net> wrote:

> Now that there's some disk space available, I'd vote for doing a clean
> reboot to ensure all regular system services are running properly
> again.

+1

Go ahead ;-)

  Martin.

Hi all,

On Tue, Dec 4, 2012 at 5:45 PM, Martin Spott <Martin.Spott@mgras.net> wrote:

Hi Alex,

On Tue, Dec 04, 2012 at 08:34:33AM -0800, Alex Mandel wrote:

Also I never got feedback on what address should munin emails go to when
things go wrong. The list is a bad choice for that (especially if it's
down).

Fully agreed.

Mailing Primary Admins by their individual EMail addresses might be a
reasonable choice ?

Yes, via /etc/aliases will be sufficient.

I can add lists (though I think there are still several machines where I
don't have sudo).

As you're a Primary Admin, I suspect you should - somehow - be able to
log into every machine "we" (TM) maintain. Don't you ?

I agree - strange that you don't have sudo everywhere. Perhaps it needs
to be added for you on the non-LDAP machines?

ciao
Markus
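
For reference, an /etc/aliases entry with several recipients is a single line of the form "name: addr1, addr2". A small sketch that builds and parses such a line; the alias name munin-alerts and the example.org addresses are placeholders, not the values actually used on the server.

```python
def alias_entry(name, recipients):
    """Format one /etc/aliases line mapping `name` to several recipients."""
    if not recipients:
        raise ValueError("an alias needs at least one recipient")
    return f"{name}: {', '.join(recipients)}"


def parse_alias(line):
    """Split an alias line back into its name and list of recipients."""
    name, _, targets = line.partition(":")
    return name.strip(), [t.strip() for t in targets.split(",") if t.strip()]


entry = alias_entry("munin-alerts", ["admin1@example.org", "admin2@example.org"])
print(entry)  # munin-alerts: admin1@example.org, admin2@example.org
```

After editing /etc/aliases, the alias database has to be rebuilt with newaliases before the mail server picks up the change.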

On Tue, Dec 4, 2012 at 6:32 PM, Martin Spott <Martin.Spott@mgras.net> wrote:

On Tue, Dec 04, 2012 at 06:24:21PM +0100, Markus Neteler wrote:

On Tue, Dec 4, 2012 at 4:49 PM, Martin Spott <Martin.Spott@mgras.net> wrote:

> Now that there's some disk space available, I'd vote for doing a clean
> reboot to ensure all regular system services are running properly
> again.

+1

Go ahead ;-)

Done, server is back online.

Markus

On Tue, Dec 04, 2012 at 07:37:43PM +0100, Markus Neteler wrote:

Done, server is back online.

Great - at the same time you've activated a minor kernel upgrade :-)

  Martin.

On Tue, Dec 4, 2012 at 7:59 PM, Martin Spott <Martin.Spott@mgras.net> wrote:

On Tue, Dec 04, 2012 at 07:37:43PM +0100, Markus Neteler wrote:

Done, server is back online.

Great - at the same time you've activated a minor kernel upgrade :-)

Cool - hope we wanted that.

Markus

On 12/04/2012 09:39 AM, Markus Neteler wrote:

Hi all,

On Tue, Dec 4, 2012 at 5:45 PM, Martin Spott <Martin.Spott@mgras.net> wrote:

Hi Alex,

On Tue, Dec 04, 2012 at 08:34:33AM -0800, Alex Mandel wrote:

Also I never got feedback on what address should munin emails go to when
things go wrong. The list is a bad choice for that (especially if it's
down).

Fully agreed.

Mailing Primary Admins by their individual EMail addresses might be a
reasonable choice ?

Yes, via /etc/aliases will be sufficient.

Ok, I added an alias, though I'm not sure what to put in munin.conf
to use that mailing alias. I put me, Martin, Markus and Frank in - if
others want to be added, let me know. FYI, Markus, do you have a non-osgeo
address you want it sent to, in case the osgeo.org mail machine is down
and can't forward?

Turns out that I had already configured munin to email me directly but I
haven't received any emails. So I'm not sure sendmail is set up to
actually send out. Can someone check that on webextra?

I can add lists (though I think there are still several machines where I
don't have sudo).

As you're a Primary Admin, I suspect you should - somehow - be able to
log into every machine "we" (TM) maintain. Don't you ?

I agree - strange that you don't have sudo everywhere. Perhaps it needs
to be added for you on the non-LDAP machines?

ciao
Markus

sudo is not set via LDAP (at least I don't think it is), so I have shell
access to all the machines but not sudo. I'll make a list so someone can
go through and add me on the machines where I don't have it yet.

Thanks,
Alex

On Tue, Dec 4, 2012 at 8:12 PM, Alex Mandel <tech_dev@wildintellect.com> wrote:
...

Ok, I added an alias, though I'm not sure what to put in munin.conf
to use that mailing alias. I put me, Martin, Markus and Frank in - if
others want to be added, let me know. FYI, Markus, do you have a non-osgeo
address you want it sent to, in case the osgeo.org mail machine is down
and can't forward?

Good point, please use
neteler.osgeo a t gmail.com

(no real idea about the other questions)

thanks
Markus

I've narrowed down the issue to postfix. It was installed; to check, I
reconfigured it. For some reason it does not start with
sudo /etc/init.d/postfix start

I can't find anything in the logs. Once that's solved I can get the rest
to work.

Thanks,
Alex
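
One way to tell whether postfix actually came up, independent of what the init script reports, is to test whether anything accepts connections on the SMTP port. A minimal sketch, assuming postfix is meant to listen on 127.0.0.1 port 25 on webextra (the thread does not say how it is configured); the function name is illustrative.

```python
import socket


def smtp_listening(host="127.0.0.1", port=25, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("SMTP port open:", smtp_listening())
```

If nothing is listening, running "postfix check" and looking at the mail log (typically /var/log/mail.log on Debian) may show why the daemon refuses to start.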