We have had BecomeBot in our robots.txt for quite some time now. Quite apart from the fact that they don't make it clear who is using the data for what, we simply are not a shopping site, nor in any way related to one. No grudge against them, but it just plain doesn't make sense for them to spider us.
Then I noticed that BecomeBot was still spidering us. It looks like it, too, does not understand the "short" robots.txt format. So now our robots.txt lists each and every blocked bot separately:
User-agent: Gigabot
Disallow: /

User-agent: BecomeBot
Disallow: /
That seemed to help: it now only requests the robots.txt and leaves everything else alone.
Then I saw this other bot, Miva. Again, something driven by a marketing company (or whatever). Nothing that really needs to spider our site (and increase our load). So into the robots.txt it went.
And today I noticed that it was happily reading the robots.txt - and then ignoring it. So I happily changed our .htaccess to block it. Goodbye, Miva ...
Btw, my preferred way to block a bot is now to send it a 410 ("Gone"), since the really stupid bots usually don't take 403 or 404 for an answer.
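For anyone who wants to do the same, here is a minimal sketch of what such a 410 block could look like in an .htaccess file. It is not necessarily the exact rule we use. It assumes that mod_rewrite is enabled, that .htaccess overrides are allowed for the directory, and that the bot's User-Agent header contains the string "Miva" (an assumption, so check your access log for the real token first).

```apache
# Sketch only: requires mod_rewrite and AllowOverride FileInfo (or All).
RewriteEngine On

# Assumption: the bot identifies itself with "Miva" somewhere in its
# User-Agent header. [NC] makes the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} Miva [NC]

# "-" leaves the URL untouched; [G] answers the request with 410 Gone.
RewriteRule ^ - [G]
```

Further bots can be added by repeating the RewriteCond line with an [OR] flag on every condition but the last. Note that this rule also answers requests for robots.txt itself with a 410.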
Damn Spam!
http://spam.tinyweb.net/article.php/more-stupid-bots