[SAC] Apache Hung

Folks,

Circa 6:45pm EST it was observed that OSGeo web services were not responding.
I confirmed that all Apache server processes seemed to be hung in some fashion
though the machine was not busy or otherwise resource constrained.

"netstat -a" showed quite a few connections, including 100 entries like:

tcp 101 0 osgeo1.osgeo.org:http 157-146.svr.royaume.c:50732 CLOSE_WAIT

I think this is Daniel's service checking machine. I kept the netstat
output, and it can be found at:

  /home/warmerdam/netstat.log

on osgeo1 if anyone wants to review it.
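For anyone reviewing netstat.log, here is a small Python sketch that tallies TCP connection states per remote host from saved `netstat -a` output. It was not used in this thread. It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the function name `tally_states` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally_states(lines: Iterable[str]) -> Counter:
    """Count (foreign host, state) pairs in `netstat -a` output.

    Assumes Linux-style TCP lines:
        Proto Recv-Q Send-Q Local-Address Foreign-Address State
    Header lines, UDP entries and unix sockets are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Only TCP lines (tcp or tcp6) carry a state column we care about.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        host = foreign.rsplit(":", 1)[0]  # drop the port number
        counts[(host, state)] += 1
    return counts
```

Usage: open the saved log and print `tally_states(f).most_common(10)` to see which remote hosts account for the CLOSE_WAIT entries.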

I have restarted Apache and things seem to be working fine now.

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up | Frank Warmerdam, warmerdam@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush | President OSGeo, http://osgeo.org

Frank Warmerdam wrote:

> I think this is Daniel's service checking machine.

Yes, that IP is my monitoring machine (which hits the server with wget once per minute). Do you want me to disable the monitoring to that server for now?

Daniel
--
Daniel Morissette
http://www.mapgears.com/

Daniel Morissette wrote:

> Yes, that IP is my monitoring machine (which hits the server with wget once per minute). Do you want me to disable the monitoring to that server for now?

Daniel,

Well, I'd prefer to understand what is going on and fix it. The
monitoring service is quite useful!

Does anyone have an idea why these connections would get stuck in
the CLOSE_WAIT state?
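One general TCP fact may help with this question: a socket sits in CLOSE_WAIT when the peer has closed its end (sent a FIN) but the local process has not yet called close(). A non-zero Recv-Q, like the 101 in the netstat line above, means bytes arrived that the process never read. The Python sketch below reproduces that sequence on localhost. It only illustrates the TCP state and is not a diagnosis of what the osgeo1 Apache processes were doing; `demo_close_wait` is a hypothetical name.

```python
import socket
from typing import Tuple


def demo_close_wait(payload: bytes = b"GET / HTTP/1.0\r\n\r\n") -> Tuple[bytes, bytes]:
    """Show what a server sees when the client closes first.

    The client sends `payload` and closes its socket (sending FIN). Until
    the server calls close() on its end, the kernel keeps that end in
    CLOSE_WAIT, and unread bytes sit in its receive queue (netstat Recv-Q).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    client.sendall(payload)
    client.close()  # client sends FIN; server side is now in CLOSE_WAIT

    # A healthy server reads the request...
    data = b""
    while len(data) < len(payload):
        chunk = server_side.recv(4096)
        if not chunk:
            break
        data += chunk
    # ...then sees EOF (b""), its cue to close its own end.
    eof = server_side.recv(4096)

    # A hung worker never reaches this point, so its socket stays in CLOSE_WAIT.
    server_side.close()
    listener.close()
    return data, eof
```

Running `demo_close_wait()` returns the request bytes followed by `b""`; pausing before `server_side.close()` and running netstat in another shell should show the server end in CLOSE_WAIT.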

I did a netstat -a just now, and the only requests from this machine
are:

[warmerdam@osgeo1 ~]$ netstat -a | grep roy
tcp 0 0 osgeo1.osgeo.org:http 157-146.svr.royaume.c:60092 TIME_WAIT
tcp 0 0 osgeo1.osgeo.org:http 157-146.svr.royaume.c:37789 TIME_WAIT

Best regards,

On 25-Mar-08, at 7:50 AM, Frank Warmerdam wrote:

>> Yes, that IP is my monitoring machine (which hits the server with wget once per minute). Do you want me to disable the monitoring to that server for now?

Is it just doing wget http://www.osgeo.org, or some specific file that is, perhaps, being locked from time to time?

Maybe an Apache update is needed?

Tyler Mitchell (OSGeo) wrote:

> Is it just doing wget http://www.osgeo.org, or some specific file that is, perhaps, being locked from time to time?

It's hitting the home page with:

   wget -q -T 5 --delete-after http://www.osgeo.org
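For comparison, here is a Python sketch of a single-attempt check in the spirit of that wget command: fetch the page once, discard the body, and treat any error or timeout as "down". It is not Daniel's actual setup, and `check_once` is a hypothetical name. As with wget's "-T", urllib's `timeout` bounds each blocking socket operation (connect, each read), not the total request time.

```python
import http.client
import urllib.request


def check_once(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status, trying only once.

    `timeout` applies to each blocking socket operation, so a server that
    accepts the connection but never replies fails after about `timeout`.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read and discard the body, like --delete-after
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSErrors.
        return False
```

A listening socket that never accepts behaves much like a hung Apache here: the kernel completes the TCP handshake, the request is sent, and the read times out.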

> Maybe an Apache update is needed?

Dunno. I get the same behavior with the maptools.org download server when it comes back to life after it has stopped responding due to unfriendly download accelerators, so I doubt the problem is specific to the Apache version that you're running.

I suspect wget gets into some kind of retry loop when the server does not respond, despite the "-T 5" arg, which I thought would make it give up after 5 seconds.

... man wget ...

I just found the "-t" option:

"""
   Set number of retries to number. Specify 0 or inf for infinite
   retrying. The default is to retry 20 times, with the exception of
   fatal errors like "connection refused" or "not found" (404),
   which are not retried.
"""

I'll add a "-t 1" arg to the wget command... let's see if that helps... or if that makes things worse.
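A rough back-of-the-envelope bound, assuming each attempt against a hung server runs until the 5-second read timeout and ignoring any pause wget inserts between retries: with n tries and a per-attempt timeout T, one check lasts at least n times T.

```latex
\begin{aligned}
t_{\text{check}} &\ge n\,T \\
n = 20,\ T = 5\,\text{s} &\;\Rightarrow\; t_{\text{check}} \ge 100\,\text{s} > 60\,\text{s (polling interval)} \\
n = 1,\ T = 5\,\text{s} &\;\Rightarrow\; t_{\text{check}} \approx 5\,\text{s}
\end{aligned}
```

So with the default of 20 tries, checks started once a minute could overlap, and each timed-out attempt may leave behind another connection that the hung server never closes. That would be consistent with, though not proof of, the pile-up of CLOSE_WAIT entries; with "-t 1" each check should finish in roughly T.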

Daniel
--
Daniel Morissette
http://www.mapgears.com/