[SAC] Poorly Behaved Spiders, and a Dangerous Trap


We have had serious problems in recent days with load on www.osgeo.org which
I believe relates to our old friend - spiders pulling *huge* subversion
changesets out through Trac. This is already forbidden by the /robots.txt
so only poorly behaved spiders are doing this.

Per the suggestions at:

I have put a "spider trap" into place that should capture the IPs of
spiders ignoring the robots.txt and then use those IPs to forbid further
access to the trac.osgeo.org domain. Details are in the bug report at:


The IPs are recorded in:


Should trac.osgeo.org suddenly stop working for anyone, we should take a
peek in there to see if that is why.
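
For anyone curious how such a trap can work, here is a minimal sketch in
Python. It is not the code running on the server; the details of that are in
the bug report. The idea is that a URL disallowed in robots.txt calls
something like record_trap_hit() for every client that requests it anyway,
and the front end refuses any client for which is_blocked() is true. The
function names, the one-IP-per-line file format, and the file name used in
the usage note are all assumptions made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def load_blocked(blocklist: Path) -> set[str]:
    """Return the set of IPs listed in the blocklist file (one per line)."""
    if not blocklist.exists():
        return set()
    lines = blocklist.read_text(encoding="utf-8").splitlines()
    # Skip blank lines and '#' comments so the file can be annotated by hand.
    return {
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }


def record_trap_hit(ip: str, blocklist: Path) -> bool:
    """Record an IP that requested the trap URL.

    Returns True if the IP was newly added, False if it was already listed.
    """
    if ip in load_blocked(blocklist):
        return False
    with blocklist.open("a", encoding="utf-8") as fh:
        fh.write(ip + "\n")
    return True


def is_blocked(ip: str, blocklist: Path) -> bool:
    """Check whether an IP has been caught by the trap."""
    return ip in load_blocked(blocklist)
```

To check whether a particular user has been caught by mistake, something
like is_blocked("192.0.2.10", Path("blocked_ips.txt")) would do, where both
the address and the file name are placeholders. Removing the matching line
from the file would lift the block in this sketch.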

Best regards,
I set the clouds in motion - turn up | Frank Warmerdam, warmerdam@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush | President OSGeo, http://osgeo.org


That's great. I also opened a ticket on this, but in relation to a
specific spider, Twiceler. I'm leaving this ticket open for a few days
and will close it if I don't see Twiceler returning.



