[SAC] [OSGeo] #277: Robots Are Attacking!

#277: Robots Are Attacking!
----------------------+-----------------------------------------------------
Reporter: warmerdam | Owner: sac@lists.osgeo.org
    Type: task | Status: new
Priority: normal | Component: SAC
Keywords: trac |
----------------------+-----------------------------------------------------
Today we were able to catch one of our load spikes in action. The server-
status report indicated:
{{{
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 28743 0/909/2477 W 162.91 92 0 0.0 10.29 21.07 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
1-0 2426 0/44/1752 W 11.97 133 0 0.0 2.51 23.90 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
2-0 2876 0/2/1699 W 1.35 120 0 0.0 0.01 14.26 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
3-0 2880 0/8/2075 W 3.45 77 0 0.0 0.01 91.60 70.91.111.164 trac.osgeo.org GET /gdal/log/ HTTP/1.0
4-0 2882 0/11/2494 W 4.84 0 0 0.0 0.14 32.74 70.91.111.164 trac.osgeo.org GET /gdal/log/sandbox/ajolma/swig HTTP/1.0
5-0 2883 0/6/1292 W 1.81 10 0 0.0 0.03 17.24 70.91.111.164 trac.osgeo.org GET /gdal/log/trunk?rev=14376 HTTP/1.0
6-0 540 0/279/952 W 53.38 109 0 0.0 6.77 14.25 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
7-0 543 0/276/1812 W 55.07 109 0 0.0 2.62 14.39 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
8-0 20939 0/2031/2508 W 390.31 200 0 0.0 25.80 30.36 198.253.49.6 trac.osgeo.org GET /ossim/doxygen/classossimImageData.html HTTP/1.1
9-0 2890 0/20/2507 W 4.27 5 0 0.0 0.27 14.41 74.6.22.97 trac.osgeo.org GET /fdo/wiki/WikiFormatting HTTP/1.0
10-0 2893 0/0/1744 W 181.85 101 0 0.0 0.00 42.93 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
11-0 26129 0/1332/1966 W 212.63 0 0 0.0 9.06 25.59 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
12-0 546 0/277/785 W 56.27 115 0 0.0 1.47 5.29 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
13-0 2895 0/18/609 W 4.95 0 0 0.0 0.44 5.58 67.195.37.123 osgeo1.osgeo.org GET /switchuilocale/id?destination=node%2F723 HTTP/1.0
14-0 548 0/283/982 W 59.52 74 0 0.0 1.76 7.41 70.91.111.164 trac.osgeo.org GET /gdal/log/trunk HTTP/1.0
15-0 2896 0/0/591 W 34.98 96 0 0.0 0.00 4.39 70.91.111.164 trac.osgeo.org GET /gdal/log/branches/1.4 HTTP/1.0
16-0 2897 0/7/733 W 3.37 0 0 0.0 0.18 5.03 209.169.157.146 osgeo1.osgeo.org GET / HTTP/1.0
17-0 551 0/273/2312 W 49.57 128 0 0.0 5.62 26.73 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
18-0 552 0/262/1491 W 44.07 127 0 0.0 1.06 22.71 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
19-0 2898 0/9/295 W 4.06 0 0 0.0 0.22 2.45 70.91.111.164 trac.osgeo.org GET /gdal/browser/sandbox/crschmidt?order=size HTTP/1.0
20-0 2899 0/5/433 W 1.43 20 0 0.0 0.10 2.69 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
21-0 20959 0/2073/2346 W 382.95 9 0 0.0 27.21 28.29 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
22-0 2900 0/3/456 W 1.17 20 0 0.0 0.08 2.68 70.91.111.164 trac.osgeo.org GET /gdal/log/trunk?rev=14376 HTTP/1.0
23-0 20966 0/2043/2121 W 362.15 3 0 0.0 39.28 40.03 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
24-0 2901 0/1/377 W 0.00 94 0 0.0 0.000 5.31 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
25-0 20968 0/2090/2137 W 406.93 1 0 0.0 48.82 49.00 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13273/sandbox/crschmidt HTTP/1.0
26-0 2904 0/9/209 W 3.43 2 0 0.0 0.15 0.97 70.91.111.164 trac.osgeo.org GET /gdal/changeset/11871/sandbox/hobu HTTP/1.0
27-0 558 0/265/519 W 54.33 116 0 0.0 1.25 3.52 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
28-0 559 0/282/438 W 46.89 77 0 0.0 2.26 4.42 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
29-0 20982 0/2112/2125 W 394.18 1 0 0.0 22.55 22.84 70.91.111.164 trac.osgeo.org GET /gdal/changeset/11871/sandbox/hobu HTTP/1.0
30-0 2906 0/22/79 W 6.23 0 0 0.0 0.44 1.24 74.6.18.233 osgeo1.osgeo.org GET /pipermail/mapserver-users/2003-December/047445.html HTTP/1
31-0 2907 0/12/1450 W 2.25 58 0 0.0 0.09 8.68 74.6.22.97 trac.osgeo.org GET /grass/query?status=new&status=assigned&status=reopened&mil
32-0 19340 0/2268/2293 W 429.97 78 0 0.0 29.11 29.57 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
33-0 2910 0/15/177 W 5.81 8 0 0.0 0.20 1.93 70.91.111.164 trac.osgeo.org GET /gdal/log/trunk?rev=14376 HTTP/1.0
34-0 2911 0/10/642 W 2.71 0 0 0.0 0.36 4.87 24.61.22.108 trac.osgeo.org GET /mapguide/ HTTP/1.1
35-0 19351 0/2075/2088 W 567.14 102 0 0.0 143.37 143.43 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.5?old_path=%2f&format=
36-0 2912 0/5/2090 W 2.28 2 0 0.0 0.21 22.41 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13273/sandbox/crschmidt HTTP/1.0
37-0 2913 0/11/1972 W 3.02 0 0 0.0 0.15 14.52 209.85.238.11 trac.osgeo.org GET /gdal/timeline?milestone=on&ticket=on&changeset=on&wiki=on&
38-0 20988 0/2101/2118 W 369.99 139 0 0.0 38.18 38.77 192.5.156.252 svn.osgeo.org REPORT /ossim/!svn/vcc/default HTTP/1.1
39-0 2914 0/9/219 W 3.37 7 0 0.0 0.16 1.77 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
40-0 2915 0/10/18 W 3.79 15 0 0.0 0.21 0.79 70.91.111.164 trac.osgeo.org GET /gdal/log/trunk?rev=14376 HTTP/1.0
41-0 2916 0/13/81 W 3.42 7 0 0.0 0.07 0.42 74.6.22.97 trac.osgeo.org GET /grass/query?status=new&status=assigned&status=reopened&mil
42-0 2917 0/8/20 W 2.45 0 0 0.0 0.23 0.79 72.171.0.144 trac.osgeo.org GET /server-status HTTP/1.1
43-0 2918 0/10/39 W 3.26 10 0 0.0 0.21 0.92 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
44-0 2919 0/10/160 W 5.24 7 0 0.0 0.25 1.13 70.91.111.164 trac.osgeo.org GET /gdal/log/trunk?rev=14376 HTTP/1.0
45-0 2920 0/9/54 W 2.17 15 0 0.0 0.11 0.61 70.91.111.164 trac.osgeo.org GET /gdal/changeset/13196/sandbox/ajolma HTTP/1.0
46-0 18139 0/2315/2315 W 539.51 123 0 0.0 153.35 153.35 70.91.111.164 trac.osgeo.org GET /gdal/changeset/14384/branches/1.4?old_path=%2f&format=
47-0 2921 0/22/50 W 5.08 0 0 0.0 0.29 0.87 70.91.111.164 trac.osgeo.org GET /gdal/browser/sandbox/crschmidt?order=date HTTP/1.0
48-0 2928 0/0/100 W 8.54 88 0 0.0 0.00 0.36 70.91.111.164 trac.osgeo.org GET /gdal/log/ HTTP/1.0
49-0 2929 0/2/90 W 0.87 76 0 0.0 0.01 0.96 70.91.111.164 trac.osgeo.org GET /gdal/log/ HTTP/1.0
}}}

Of note is that a robot was hitting Trac hard for changesets (at about 5
requests per second), and Trac was not able to keep up -- possibly because
the client was unable to consume the results we were sending back fast
enough.
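
To see at a glance which client is holding the worker slots, the report can
be tallied per client address. The following is only a rough sketch: it
assumes each server-status row has been joined onto a single
whitespace-separated line with the client address in the 11th column, and
the function name is made up for illustration. The sample rows are four of
the rows from the report above.

```python
from collections import Counter

# Four rows from the report above, each wrapped row joined onto one line.
ROWS = """\
3-0 2880 0/8/2075 W 3.45 77 0 0.0 0.01 91.60 70.91.111.164 trac.osgeo.org GET /gdal/log/ HTTP/1.0
8-0 20939 0/2031/2508 W 390.31 200 0 0.0 25.80 30.36 198.253.49.6 trac.osgeo.org GET /ossim/doxygen/classossimImageData.html HTTP/1.1
48-0 2928 0/0/100 W 8.54 88 0 0.0 0.00 0.36 70.91.111.164 trac.osgeo.org GET /gdal/log/ HTTP/1.0
49-0 2929 0/2/90 W 0.87 76 0 0.0 0.01 0.96 70.91.111.164 trac.osgeo.org GET /gdal/log/ HTTP/1.0
"""


def connections_per_client(report: str) -> Counter:
    """Count worker slots per client address.

    Assumed column order (from the report header):
    Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
    """
    counts = Counter()
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 11:          # skip blank or truncated rows
            counts[fields[10]] += 1    # 11th column is the client address
    return counts


counts = connections_per_client(ROWS)
for client, n in counts.most_common():
    print(client, n)
```

Run against the full report, the same tally would show how many of the 50
slots a single address occupies.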

It is proposed that we put in place "maximum connections per IP" limits on
trac.osgeo.org, similar to what we did on download.osgeo.org for #216.

--
Ticket URL: <http://trac.osgeo.org/osgeo/ticket/277>
OSGeo <http://www.osgeo.org/>
OSGeo committee and general foundation issue tracker.

#277: Robots Are Attacking!
------------------------+---------------------------------------------------
  Reporter: warmerdam | Owner: sac@lists.osgeo.org
      Type: task | Status: new
  Priority: normal | Component: SAC
Resolution: | Keywords: trac
------------------------+---------------------------------------------------
Comment (by crschmidt):

1. I've turned on "Combined" (instead of 'common') logging for Trac, so
that if this happens again we can see whether bots are sending user-agents
that include contact information.

2. I've installed httpd-devel so that I can get the apxs binary. (up2date
-i httpd-devel)

3. I've downloaded and installed limitipconn:

{{{
wget http://dominia.org/djao/limit/mod_limitipconn-0.23.tar.bz2
tar xjf mod_limitipconn-0.23.tar.bz2   # unpack the tarball before entering it
cd mod_limitipconn-0.23
sudo make install
}}}

4. Set MaxConnPerIP to 1, restarted Apache, and confirmed that reloading the
GDAL Trac page resulted in a couple of 503s. Set MaxConnPerIP to 8,
reloaded, and confirmed no 503s.

This matches the default of '8' max server connections in Firefox
about:config on my mac.
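
The effect of the two settings can be pictured with a toy model of a
per-client cap on concurrent connections. This is not mod_limitipconn's
code, only a sketch of the idea: the class and function names are made up,
192.0.2.1 is a documentation address, and it assumes the browser opens 8
connections at once and none of them finishes before the last one starts.

```python
from collections import Counter


class PerIPLimiter:
    """Toy model of a per-client cap on concurrent connections."""

    def __init__(self, max_conn_per_ip: int) -> None:
        self.max_conn_per_ip = max_conn_per_ip
        self.active = Counter()  # client IP -> connections in progress

    def open(self, ip: str) -> int:
        """Return the HTTP status a new connection from `ip` would get."""
        if self.active[ip] >= self.max_conn_per_ip:
            return 503  # over the cap: refuse rather than tie up a worker
        self.active[ip] += 1
        return 200

    def close(self, ip: str) -> None:
        """Record that one connection from `ip` has finished."""
        if self.active[ip] > 0:
            self.active[ip] -= 1


def simulate(limit: int, parallel: int) -> list:
    """Open `parallel` simultaneous connections from one client."""
    limiter = PerIPLimiter(limit)
    return [limiter.open("192.0.2.1") for _ in range(parallel)]


statuses_1 = simulate(limit=1, parallel=8)
statuses_8 = simulate(limit=8, parallel=8)
print(statuses_1.count(503), statuses_8.count(503))
```

In this model a cap of 1 refuses every connection after the first, while a
cap of 8 lets a browser with 8 parallel connections through untouched. A
client opening dozens of connections at once would still be cut off at 8.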

We may want to apply this to other services if we see other problems like
this occurring. For now, I'd like to leave it on trac only and see what
happens.
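
As a sketch of what the switch to "Combined" logging in step 1 buys us: the
combined format appends the Referer and User-agent headers to each line, so
the user-agent can be pulled out with a regular expression. The log line
below is invented for illustration (documentation address, made-up bot
name), and the pattern assumes the stock combined format with no escaped
quotes inside the quoted fields.

```python
import re

# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical log line, not taken from the real logs.
LINE = (
    '192.0.2.1 - - [01/Jan/2000:00:00:00 +0000] '
    '"GET /gdal/log/trunk HTTP/1.0" 200 5120 '
    '"-" "ExampleBot/1.0 (+http://example.org/bot; bot@example.org)"'
)

match = COMBINED.match(LINE)
host = match.group("host")
agent = match.group("agent")
print(host, agent)
```

With 'common' logging the last two quoted fields are simply absent, so
there would be nothing to match for the user-agent.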

--
Ticket URL: <http://trac.osgeo.org/osgeo/ticket/277#comment:1>

#277: Robots Are Attacking!
------------------------+---------------------------------------------------
  Reporter: warmerdam | Owner: sac@lists.osgeo.org
      Type: task | Status: new
  Priority: normal | Component: SAC
Resolution: | Keywords: trac
------------------------+---------------------------------------------------
Comment (by jbirch):

I wonder if it would be worth setting crawl-delay for the major spiders?

Yahoo and Microsoft support this directive in robots.txt, while for Google
you have to set up a Webmasters Tools account and tell it to slow down in
there.
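
For illustration, here is roughly what such a robots.txt could look like,
checked with Python's standard robots.txt parser. The user-agent tokens
(Slurp for Yahoo, msnbot for Microsoft) and the 10-second delay are
assumptions picked for the example, not values anyone has proposed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow down two named spiders, leave the rest alone.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

slurp_delay = parser.crawl_delay("Slurp")          # delay for a named spider
other_delay = parser.crawl_delay("SomeOtherBot")   # falls back to the * entry
print(slurp_delay, other_delay)
```

The directive is only a request: a crawler has to read robots.txt and choose
to honour it.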

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/277#comment:2>

#277: Robots Are Attacking!
------------------------+---------------------------------------------------
  Reporter: warmerdam | Owner: sac@lists.osgeo.org
      Type: task | Status: new
  Priority: normal | Component: SAC
Resolution: | Keywords: trac
------------------------+---------------------------------------------------
Comment (by crschmidt):

Jason:

What problem are you trying to solve? The 'crawler' causing problems in
this case was crawling from a comcast internet connection: clearly not one
of the 'big 3' search spiders, which are typically well behaved, according
to all of my log-reading and observations.

Anything that opens 45 different connections to your server at once is
simply a broken crawler, in my mind, no questions asked.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/277#comment:3>

#277: Robots Are Attacking!
------------------------+---------------------------------------------------
  Reporter: warmerdam | Owner: sac@lists.osgeo.org
      Type: task | Status: new
  Priority: normal | Component: SAC
Resolution: | Keywords: trac
------------------------+---------------------------------------------------
Comment (by jbirch):

I guess that answers my question :)

I wasn't trying to solve a particular problem; you have dealt with that
nicely. Just wondering if setting those values would help conserve server
resources in general.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/277#comment:4>

#277: Robots Are Attacking!
------------------------+---------------------------------------------------
  Reporter: warmerdam | Owner: sac@lists.osgeo.org
      Type: task | Status: new
  Priority: normal | Component: SAC
Resolution: | Keywords: trac
------------------------+---------------------------------------------------
Comment (by crschmidt):

Yeah. In general, well-behaved bots are not a problem (so far as I can
observe) -- only poorly behaved bots which would ignore our "please be
polite" requests anyway.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/277#comment:5>

#277: Robots Are Attacking!
---------------------------+---------------------
Reporter: warmerdam | Owner: sac@…
     Type: task | Status: closed
Priority: normal | Milestone:
Component: Systems Admin | Resolution: fixed
Keywords: trac |
---------------------------+---------------------
Changes (by neteler):

* status: new => closed
* resolution: => fixed

Comment:

Since we are even more or less surviving the current spam storm, closing.

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/277#comment:6>