Damn Spam!

More stupid bots: BecomeBot and Miva

   

We have had BecomeBot in our robots.txt for quite some time now. Quite apart from the fact that they don't make it clear who is using the data for what, we are simply not a shopping site, nor in any way related to one. No grudge against them, but it just plain doesn't make sense for them to spider us.

Then I noticed that BecomeBot was still spidering us nonetheless. It looks like it, too, does not understand the "short" robots.txt format. So now our robots.txt lists each and every blocked bot separately:

User-agent: Gigabot
Disallow: /

User-agent: BecomeBot
Disallow: /

That seemed to help: it now only requests the robots.txt and leaves everything else alone.
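For anyone who wants to double-check their own file, here is a minimal sketch using Python's standard urllib.robotparser. It assumes that the "short" format means several User-agent lines sharing a single Disallow (that is my reading, not something the bots' operators document), and SomeOtherBot is just a made-up name for an agent that isn't listed.

```python
from urllib.robotparser import RobotFileParser

# The "long" form from the post: one record per blocked bot.
LONG_FORM = """\
User-agent: Gigabot
Disallow: /

User-agent: BecomeBot
Disallow: /
"""

# ASSUMPTION: the "short" form groups several User-agent lines
# in front of one shared Disallow line.
SHORT_FORM = """\
User-agent: Gigabot
User-agent: BecomeBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if a standards-following parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "SomeOtherBot" is a hypothetical agent that is not listed in either file.
    for agent in ("Gigabot", "BecomeBot", "SomeOtherBot"):
        print(agent, is_allowed(LONG_FORM, agent), is_allowed(SHORT_FORM, agent))
```

A parser that follows the rules treats both forms the same way, so if a bot only respects the long form, the problem is on the bot's side. Spelling out each bot separately costs nothing, though.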

Then I saw this other bot, Miva. Again, something driven by a marketing company (or whatever). Nothing that really needs to spider our site (and increase our load). So into the robots.txt it went.

And today I noticed that it was happily reading the robots.txt - and then ignoring it. So I happily changed our .htaccess to block it. Goodbye, Miva ...

Btw, my preferred way to block a bot is now to send them a 410 ("Gone"), since the really stupid bots usually don't take 403 or 404 for an answer.
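To illustrate the idea outside of Apache, here is a rough sketch of the same 410 trick as a small Python WSGI wrapper. This is not our actual .htaccess rule, just a stand-in for it. The names block_bots, hello_app and status_for are made up for this example, and matching on the substring "miva" in the User-Agent header is an assumption about how the bot identifies itself.

```python
# Hypothetical list of lowercase substrings to look for in the User-Agent header.
BLOCKED_AGENTS = ("miva",)


def block_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so that blocked user agents get "410 Gone"."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in blocked):
            # 410 tells the client the resource is gone for good.
            start_response("410 Gone", [("Content-Type", "text/plain"),
                                        ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Trivial placeholder application standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def status_for(app, user_agent):
    """Call a WSGI app with a fake request and return the status line it sends."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/",
               "HTTP_USER_AGENT": user_agent}
    b"".join(app(environ, start_response))
    return captured["status"]


if __name__ == "__main__":
    site = block_bots(hello_app)
    print(status_for(site, "Miva (hypothetical agent string)"))
    print(status_for(site, "Mozilla/5.0"))
```

Whether a given bot actually gives up after a 410 is something you can only find out by watching your logs.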

More stupid bots: BecomeBot and Miva | 3 comments
Fake BecomeBot

Just noticed what is apparently a faked BecomeBot coming from 69.41.173.145 (1-800-HOSTING, Inc., Dallas, Tx). It's not following the robots.txt (doesn't even touch it, actually).

Note that Become, Inc. have their own IP range (64.124.85.0 - 64.124.85.255), so the two are easy to tell apart. The real BecomeBot is behaving properly, as described above.
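A quick way to tell the two apart in a script is to check the client address against that range. The sketch below uses Python's standard ipaddress module; the range is the one quoted above (it may have changed since), and the function name is_genuine_becomebot is made up for this example.

```python
import ipaddress

# 64.124.85.0 - 64.124.85.255, as quoted above, written as a /24 network.
BECOME_NETWORK = ipaddress.ip_network("64.124.85.0/24")


def is_genuine_becomebot(remote_addr: str, user_agent: str) -> bool:
    """True only if the agent claims to be BecomeBot AND comes from Become's range."""
    claims_becomebot = "becomebot" in user_agent.lower()
    in_range = ipaddress.ip_address(remote_addr) in BECOME_NETWORK
    return claims_becomebot and in_range


if __name__ == "__main__":
    # The faked bot seen from the hosting company's address.
    print(is_genuine_becomebot("69.41.173.145", "BecomeBot"))
    # An address inside Become's own range.
    print(is_genuine_becomebot("64.124.85.74", "BecomeBot"))
```

Anything that calls itself BecomeBot but fails this check can be blocked by address without affecting the real crawler.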

Authored by: Dirk on Monday, January 23 2006 @ 16:21 CET
BecomeBot DOES ignore robots.txt
64.124.85.74 - 64.124.85.74.become.com - and I have them in my robots.txt file...

User-agent: BecomeBot
Disallow:

Off to block their IPs
Authored by: Anonymous on Monday, August 28 2006 @ 00:22 CEST
BecomeBot DOES ignore robots.txt

There should be a slash after the "Disallow:" (as quoted in the article above). Otherwise, you're inviting them to spider your site.

As I wrote above, the real BecomeBot seems to be behaving properly as long as you use the "long" form of the robots.txt, i.e. with one Disallow: statement for each bot you want to block.
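The difference that single slash makes can be checked with Python's standard urllib.robotparser. This is only a sketch of how a rule-following parser reads the two variants; it says nothing about what BecomeBot's own parser does, and the helper name may_fetch is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Empty Disallow: nothing is disallowed, so the bot may fetch everything.
WITHOUT_SLASH = """\
User-agent: BecomeBot
Disallow:
"""

# Disallow with a slash: the whole site is off limits for this bot.
WITH_SLASH = """\
User-agent: BecomeBot
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("without slash:", may_fetch(WITHOUT_SLASH, "BecomeBot"))
    print("with slash:   ", may_fetch(WITH_SLASH, "BecomeBot"))
```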

Authored by: Dirk on Monday, August 28 2006 @ 09:37 CEST

Copyright © 2008 Damn Spam! Powered By Geeklog