On one of my sites, I stumbled across yet another bot that ignores robots.txt: CupriBOT. The twist is that its operators openly admit to ignoring it. Their FAQ is in German, so here's a rough translation:
Why does the robot ignore my robots.txt?
The webspider would like to see the pages as any user sees them when they call up the URL. The robots.txt file is not mandatory and not relevant for the content it's looking for. The effort for interpreting the robots.txt would be high and would have an impact on the performance of the programming.
Excuse me? Apart from that nonsense (yes, it really says "performance of the programming"), who are you to tell me which parts of my site you should be allowed to access? Besides, what content are you looking for, anyway?
For which purpose does the webspider exist?
The webspider is scanning the contents of the pages for .de domains similar to what other search engines are doing. This will be used to build an index of .de domains.
And what, exactly, would be the purpose of a list of .de domains? I can think of a lot of reasons and I don't like any of them. There's a reason why such a list does not exist. And I'm certainly not helping you build one.
And into the .htaccess it went. Please do the same. Thanks.
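If you want to do the same, something along these lines should work. It's only a sketch: I'm assuming the bot actually sends "CupriBOT" somewhere in its User-Agent header (check your access logs for the exact string) and that mod_rewrite is enabled on your server.

```apache
# Assumption: the bot identifies itself with "CupriBOT" in its User-Agent.
# Requires mod_rewrite to be enabled.
RewriteEngine On

# Match the User-Agent case-insensitively ([NC]).
RewriteCond %{HTTP_USER_AGENT} CupriBOT [NC]

# Answer every matching request with 403 Forbidden ([F]).
RewriteRule .* - [F]
```

Of course, this only helps as long as the bot is honest about its User-Agent; if it starts pretending to be a browser, you'd have to block it by IP address instead.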
Damn Spam!
http://spam.tinyweb.net/article.php/cupribot