MSN can’t take no for an answer

Earlier this week I banned MSN’s msnbot from spidering my website. I did this with an entry in the robots.txt file:

User-Agent: msnbot
Disallow: /

I checked with MSN’s robots.txt verifier to make sure this would keep msnbot from spidering my site. The only problem was that I had also blocked the MSN IP addresses, so msnbot couldn’t fetch robots.txt to learn it was no longer wanted.
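The same check can be done locally without MSN’s verifier. What follows is only a rough sketch using Python’s standard `urllib.robotparser`; the helper name `is_allowed` and the `SomeOtherBot` agent are made up for illustration, and the standard-library parser is not guaranteed to match the rules exactly the way msnbot does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the post, held in a string so nothing is fetched.
ROBOTS_TXT = """\
User-Agent: msnbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt lets user_agent fetch path (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    full_agent = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
    print(is_allowed(ROBOTS_TXT, "msnbot", "/"))                # False
    print(is_allowed(ROBOTS_TXT, full_agent, "/tag/training/"))  # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "/"))           # True
```

This only says what a well-behaved crawler should do with the file. It can’t tell you whether the crawler actually managed to fetch it, which was exactly my problem with the IP block.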

So, I unblocked the IPs and allowed msnbot to grab the robots.txt file, which it did repeatedly (this is a small sample):

65.55.210.72 - - [28/Oct/2009:12:26:46 -0400] "GET /robots.txt HTTP/1.1" 200 923 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.210.65 - - [28/Oct/2009:12:28:04 -0400] "GET /robots.txt HTTP/1.1" 200 923 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.210.70 - - [28/Oct/2009:13:45:10 -0400] "GET /robots.txt HTTP/1.1" 200 850 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.207.123 - - [28/Oct/2009:13:45:23 -0400] "GET /robots.txt HTTP/1.1" 200 850 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.37.203 - - [28/Oct/2009:14:04:58 -0400] "GET /robots.txt HTTP/1.1" 200 850 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

Yet even after all of this, Microsoft’s msnbot still insists on spidering my site. WTF?

65.55.51.112 - - [29/Oct/2009:10:47:45 -0400] "GET /2009/10/09/imagine-2/ HTTP/1.1" 403 307 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.210.65 - - [29/Oct/2009:10:55:51 -0400] "GET /2009/10/15/big-names-in-sources-of-suspicious-traffic/comment-page-1/ HTTP/1.1" 403 355 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.37.202 - - [29/Oct/2009:11:03:53 -0400] "GET /robots.txt HTTP/1.1" 403 296 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.210.65 - - [29/Oct/2009:11:04:57 -0400] "GET /2008/02/28/oh-lately-its-so-quiet/ HTTP/1.1" 403 320 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.37.202 - - [29/Oct/2009:11:05:52 -0400] "GET /tag/training/ HTTP/1.1" 403 299 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
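Picking these requests out of a busy access log by eye gets tedious. Here is a minimal sketch of how it might be automated, assuming the raw log is in Apache’s combined format with plain ASCII quotes and hyphens; the pattern `LOG_PATTERN` and the function `crawled_paths` are names I made up for illustration, and the sample lines are copied from the excerpt above.

```python
import re

# Rough pattern for the Apache combined log format (an assumption about the log).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

SAMPLE_LOG = [
    '65.55.51.112 - - [29/Oct/2009:10:47:45 -0400] "GET /2009/10/09/imagine-2/ HTTP/1.1" 403 307 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.37.202 - - [29/Oct/2009:11:03:53 -0400] "GET /robots.txt HTTP/1.1" 403 296 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '65.55.37.202 - - [29/Oct/2009:11:05:52 -0400] "GET /tag/training/ HTTP/1.1" 403 299 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]


def crawled_paths(lines, bot="msnbot"):
    """List (ip, path, status) for the bot's requests other than /robots.txt."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        if bot not in match["agent"].lower():
            continue  # some other client
        if match["path"] == "/robots.txt":
            continue  # fetching robots.txt is what we want the bot to do
        hits.append((match["ip"], match["path"], int(match["status"])))
    return hits


if __name__ == "__main__":
    for ip, path, status in crawled_paths(SAMPLE_LOG):
        print(ip, status, path)
```

Anything this prints is a page request the bot made even though robots.txt told it to stay away. The user-agent string can be spoofed, so a match only shows that the client called itself msnbot.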

I’ve blocked it again, and I’ll think twice before reenabling it, at least until Microsoft gets its act together.