[SAC] Munin notes

Due to the ongoing troubleshooting on Tracsvn I was paying a little more
attention to Munin.

1. We still need to decide where emails should be sent when Munin hits a
warning or critical value (e.g. disk is over 90%). I don't want to use
the SAC list because the number of emails per incident can get astronomical
quickly (1 email per chart even if only 1 chart is out of bounds, repeated
every 5 minutes until the issue is resolved). Do we have an svn-watch-type
list or something similar that sends out mail but isn't really for discussion?

2. I'd like to increase the RAM allocation for the QGIS and the Projects
VMs by 2 GB each, as both are using 70%+ of their current RAM on a
regular basis. Based on our notes we have 20 GB of unallocated RAM on
osgeo4 currently, so this should be no problem. Just wanted to get the
idea out before I ask OSUOSL to do it (we might be able to do it with
the Ganeti web interface ourselves).

Thanks,
Alex

On Tue, Aug 21, 2012 at 2:56 PM, Alex Mandel <tech_dev@wildintellect.com> wrote:

> Due to the ongoing troubleshooting on Tracsvn I was paying a little more
> attention to Munin.
>
> 1. We still need to decide where emails should be sent when Munin hits a
> warning or critical value (e.g. disk is over 90%). I don't want to use
> the SAC list because the number of emails per incident can get astronomical
> quickly (1 email per chart even if only 1 chart is out of bounds, repeated
> every 5 minutes until the issue is resolved). Do we have an svn-watch-type
> list or something similar that sends out mail but isn't really for discussion?

Alex,

Perhaps we could set up a sac-alert mailing list? I'm just a bit
concerned about alerting going nuts with lots of messages and
also bogging down mailman, filling up disks, etc.

> 2. I'd like to increase the RAM allocation for the QGIS and the Projects
> VMs by 2 GB each, as both are using 70%+ of their current RAM on a
> regular basis. Based on our notes we have 20 GB of unallocated RAM on
> osgeo4 currently, so this should be no problem. Just wanted to get the
> idea out before I ask OSUOSL to do it (we might be able to do it with
> the Ganeti web interface ourselves).

I can live with this change, but it doesn't leave us much more room
if we want to spin up new machines or allocate more RAM to existing
ones. What happens when the VMs go over the physical RAM?
Does it degrade gracefully with some sort of swap equivalent?

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Software Developer

On 08/21/2012 03:52 PM, Frank Warmerdam wrote:

> On Tue, Aug 21, 2012 at 2:56 PM, Alex Mandel <tech_dev@wildintellect.com> wrote:
>
>> Due to the ongoing troubleshooting on Tracsvn I was paying a little more
>> attention to Munin.
>>
>> 1. We still need to decide where emails should be sent when Munin hits a
>> warning or critical value (e.g. disk is over 90%). I don't want to use
>> the SAC list because the number of emails per incident can get astronomical
>> quickly (1 email per chart even if only 1 chart is out of bounds, repeated
>> every 5 minutes until the issue is resolved). Do we have an svn-watch-type
>> list or something similar that sends out mail but isn't really for discussion?
>
> Alex,
>
> Perhaps we could set up a sac-alert mailing list? I'm just a bit
> concerned about alerting going nuts with lots of messages and
> also bogging down mailman, filling up disks, etc.

sac-alert would work; as an exception, we could host such a list with an
outside service. The individual emails aren't big, there are just a lot of
them when things go wrong.
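
To put a rough number on "a lot", here is a minimal Python sketch of the
alert volume. It assumes one email per chart on every 5-minute run, as
described earlier in this thread; the chart count and outage length are
made-up example inputs, not measurements from our hosts.

```python
import math


def emails_per_incident(charts, duration_minutes, interval_minutes=5):
    """Estimate alert emails for one incident.

    Assumes one email per chart on every run (the behaviour described
    in this thread), with a run every `interval_minutes` minutes.
    """
    runs = max(1, math.ceil(duration_minutes / interval_minutes))
    return charts * runs


# Hypothetical example: 30 charts on a host, problem lasts 2 hours.
print(emails_per_incident(charts=30, duration_minutes=120))
```

Even with small messages, that count is what would land on the list
server and in every subscriber's inbox for a single incident.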

>> 2. I'd like to increase the RAM allocation for the QGIS and the Projects
>> VMs by 2 GB each, as both are using 70%+ of their current RAM on a
>> regular basis. Based on our notes we have 20 GB of unallocated RAM on
>> osgeo4 currently, so this should be no problem. Just wanted to get the
>> idea out before I ask OSUOSL to do it (we might be able to do it with
>> the Ganeti web interface ourselves).
>
> I can live with this change, but it doesn't leave us much more room
> if we want to spin up new machines or allocate more RAM to existing
> ones. What happens when the VMs go over the physical RAM?
> Does it degrade gracefully with some sort of swap equivalent?
>
> Best regards,

Yes, it uses the swap partition inside the VM disk allocation. Whether
that counts as graceful depends on your definition, as going into swap
sometimes means the machine locks up for a while until it can clear the
swap or finish whatever it's doing. A common experience is that the
machine becomes unresponsive and no one can log in to check on it, leaving
the options of waiting until it's done (might be never if it's Apache) or
forcing a reboot.
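
One way to notice a VM sliding into swap before it locks up is to watch
swap usage from inside the guest. Below is a minimal Python sketch,
assuming a Linux guest where /proc/meminfo reports SwapTotal and SwapFree
in kB. The sample values are invented for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_fraction(meminfo):
    """Return the fraction of swap in use, or 0.0 if there is no swap."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - meminfo.get("SwapFree", 0)) / total


# Made-up sample; on a real VM, read the text from /proc/meminfo instead.
SAMPLE = """MemTotal:        4194304 kB
MemFree:          262144 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""

info = parse_meminfo(SAMPLE)
print(f"swap used: {swap_used_fraction(info):.0%}")
```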

We would still have 16 GB of RAM for the host and for fail-over of
secure/web (8 GB reserved for that, I think). I can audit to see if we can
take RAM away from other VMs that are underusing it. Would adding 1 GB
be more palatable? There's no reason we have to do increments of 2.
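
The arithmetic behind those figures, as a small Python sketch. The 20 GB
figure comes from our notes; the 8 GB fail-over reserve is the unconfirmed
number mentioned above, so treat the last column as an assumption.

```python
def remaining_after_increase(unallocated_gb, vm_count, increase_gb):
    """Unallocated RAM left after giving each VM `increase_gb` more."""
    return unallocated_gb - vm_count * increase_gb


UNALLOCATED_GB = 20        # from our notes for osgeo4
FAILOVER_RESERVE_GB = 8    # unconfirmed ("I think") reserve for secure/web

# Compare the proposed 2 GB increase with the 1 GB alternative.
for increase in (2, 1):
    left = remaining_after_increase(UNALLOCATED_GB, vm_count=2, increase_gb=increase)
    print(f"+{increase} GB each: {left} GB left, "
          f"{left - FAILOVER_RESERVE_GB} GB beyond the fail-over reserve")
```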

Long term there's still the idea of getting a build machine going
somewhere, which would take the load off the QGIS VM.

Thanks,
Alex

> I don't want to use the SAC list because the number of emails
> per incident can get astronomical quickly

perhaps spam a common (but not too common) IRC channel with the
alerts instead?
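
A rough Python sketch of what that could look like, collapsing a batch of
alerts into a single IRC message so the channel gets one line per run
rather than one per chart. It only builds the protocol line; connecting
and sending are left out. The channel name and alert texts are
hypothetical.

```python
def irc_alert_line(channel, alerts, max_bytes=510):
    """Collapse several alerts into one IRC PRIVMSG line.

    IRC messages are limited to 512 bytes including the trailing CRLF,
    so the text is trimmed to fit in `max_bytes` before CRLF is added.
    """
    summary = f"{len(alerts)} munin alert(s): " + "; ".join(alerts)
    line = f"PRIVMSG {channel} :{summary}"
    # Trim on bytes, dropping any partial character left at the cut.
    trimmed = line.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
    return trimmed + "\r\n"


# Hypothetical channel name and alert texts.
print(irc_alert_line("#example-alerts", ["tracsvn disk 92%", "qgis memory 75%"]), end="")
```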

best,
Hamish