[SAC] Re: ! www.osgeo.org HTTP DOWN

Hi,

On Thu, Dec 20, 2007 at 03:13:25AM -0500, dmorissette@mail.dmsolutions.ca wrote:

HTTP DOWN for www.osgeo.org at Thu Dec 20 03:13:25 EST 2007

The machine was close to its swap limit, and to verify my assumption
that this was the cause, I stopped 'httpd' at about 03:36 local time.
Swap usage immediately dropped from approx. 1.7 GByte to 400 MByte and
the machine went back to behaving nicely.

I'd say the number of 'httpd' processes should be limited. 135
processes, each consuming 40 to 50 MByte, is simply too much for a
machine equipped with 2 GByte of RAM.

Cheerio,
  Martin.
--
Unix _IS_ user friendly - it's just selective about who its friends are !
--------------------------------------------------------------------------

On Thu, Dec 20, 2007 at 09:48:58AM +0100, Martin Spott wrote:

I'd say the number of 'httpd' processes should be limited. 135
processes, each consuming 40 to 50 MByte, is simply too much for a
machine equipped with 2 GByte of RAM.

To see whether this solves the problem, I've changed the MaxClients
directive for the prefork MPM from 150 to 50. Please keep an eye on
whether the server still serves its purpose,
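
For reference, the relevant prefork section would then look roughly
like this (only MaxClients was changed; the other values are assumed
to be the distribution defaults):

```apache
# prefork MPM: cap the worker pool so that 50 processes at ~40-50 MByte
# each stay comfortably within the machine's 2 GByte of RAM
<IfModule prefork.c>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           50
    MaxRequestsPerChild 4000
</IfModule>
```

A non-zero MaxRequestsPerChild also recycles long-lived processes,
which keeps any per-process memory growth in check.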

  Martin.

On 20.12.2007 10:59, Martin Spott wrote:

To see whether this solves the problem, I've changed the MaxClients
directive for the prefork MPM from 150 to 50. Please keep an eye on
whether the server still serves its purpose,

Is the server supposed to be up? I'm unable to connect...

--Wolf

--

<:3 )---- Wolf Bergenheim ----( 8:>

On 20.12.2007 11:04, Wolf Bergenheim wrote:

Is the server supposed to be up? I'm unable to connect...

Update: it works, but it's really really slow.

--Wolf


Wolf Bergenheim wrote:

Is the server supposed to be up? I'm unable to connect...

Ah, you guys noticed, great.

I think they are up, but if they've run out of memory or disk space
you may have problems SSHing into them. I've no idea whether we have
any backdoors, do we?

Cheers
--
Mateusz Loskot
http://mateusz.loskot.net

On Thu, Dec 20, 2007 at 11:04:57AM +0200, Wolf Bergenheim wrote:

Is the server supposed to be up? I'm unable to connect...

I just verified the server is up,

  Martin.

On Thu, Dec 20, 2007 at 11:08:02AM +0200, Wolf Bergenheim wrote:

Update: it works, but it's really really slow.

In Europe it's always a bit slow :slight_smile:
Apparently the server has hit its process limit. We should find out
whether there are really that many people pulling from the server, or
whether some bots or unwanted guests are keeping their connections
open. At first glance I'd say there are really not many requests at
the moment; the client connections just don't get closed.
To make such an overview a bit easier, we should probably add a
combined logfile covering all the different virtual hosts, so you
don't have to monitor five or six files simultaneously,
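
A quick way to check would be counting established port-80 connections
per remote IP. This is just a sketch; the awk field numbers assume
classic `netstat -tn` output (local address in column 4, remote in
column 5, state in column 6) and may need adjusting:

```shell
# Reads netstat-style lines on stdin and prints, busiest first, how many
# ESTABLISHED connections to local port 80 each remote IP is holding.
count_http_clients() {
    awk '$6 == "ESTABLISHED" && $4 ~ /:80$/ { split($5, a, ":"); print a[1] }' |
        sort | uniq -c | sort -rn
}

# Usage on the server:
#   netstat -tn | count_http_clients | head -20
```

A handful of IPs at the top of that list holding dozens of connections
would point at bots rather than genuine traffic.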

Cheers,
  Martin.

Martin Spott wrote:

On Thu, Dec 20, 2007 at 11:08:02AM +0200, Wolf Bergenheim wrote:

Update: it works, but it's really really slow.

In Europe it's always a bit slow :slight_smile:

For me, it's slow or it gives error, randomly.

Cheers
--
Mateusz Loskot
http://mateusz.loskot.net

On Thu, Dec 20, 2007 at 10:21:20AM +0100, Mateusz Loskot wrote:

For me, it's slow or it gives error, randomly.

In the meantime mysqldump has finished; the response times are much
better for me now .... :wink:

  Martin.

On Thu, Dec 20, 2007 at 09:59:21AM +0100, Martin Spott wrote:

To see whether this solves the problem, I've changed the MaxClients
directive for the prefork MPM from 150 to 50. Please keep an eye on
whether the server still serves its purpose,

Adding to that, I've reduced the KeepAliveTimeout from 15 to 7 seconds,
as 7 seconds seems to be a good estimate for the time it takes to load
a single page.
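
The directives in question, for reference (KeepAlive itself is
assumed to already be on):

```apache
KeepAlive On
# close idle keep-alive connections after 7 s instead of the default 15 s,
# so the 50 worker slots turn over faster
KeepAliveTimeout 7
```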

I don't think increasing the MaxClients number makes much sense here:
the whole machine has a limited amount of RAM which has to be shared
among MySQL, Apache, Postfix, Courier IMAP (!?), Mailman, the ADSM
backup client, a local BIND9, several backup scripts and others.
Personally I'd say we're better off with a server that is a bit slow
sometimes than one that requires special care every third morning.

Cheerio,
  Martin.

Perhaps we should again look at whether there are any high-load or
resource-heavy services that could be redistributed to OSGeo2? MySQL?
Lists?

I've noticed slowness on Trac in the past couple of days, and it's
always been in the early hours (I'm guessing between 12:00 and 2:00 in
my TZ, GMT-8). If this is the local backups, I wonder whether they
could be run from the second server to reduce load on the primary?
Also, I got errors about the Trac DB being locked. I wonder whether we
need to specifically exclude live database files from the filesystem
backup?
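
Assuming the backup runs via rsync or tar with an exclude list,
something along these lines (the paths are hypothetical, adjust to
where MySQL and Trac actually live) would keep the live files out
while their contents are captured separately:

```
# backup exclude file (lines starting with '#' are comments)
# live MySQL data files -- their contents are covered by mysqldump
/var/lib/mysql/
# live Trac SQLite databases -- back up a 'trac-admin <env> hotcopy' instead
trac.db
```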

I'm also wondering if this might partially be a result of the spiders
seeing all the cool new content under the SVN repository and hammering
it. I know that I was initially a proponent of leaving SVN open to the
spiders, but... maybe it would be better to put that robots.txt back
into place.
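
If we do put it back, a minimal robots.txt that only fences off the
repository browser might be enough (the /svn/ prefix is an assumption
about where the repository is served):

```
User-agent: *
Disallow: /svn/
```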

Yeah, I know, throwing out suggestions without measuring sucks... I was
having load issues on one of my servers recently, and it turned out to
be a single IP hammering my MySQL-based forums.

Jason
