Damn Spam!

CupriBOT

   

On one of my sites, I stumbled across yet another bot that ignores robots.txt: CupriBOT. The twist is that its operators openly admit it. Their FAQ is in German, so here's a rough translation:

Why does the robot ignore my robots.txt?
The webspider would like to see the pages as any user sees them when they call up the URL. The robots.txt file is not mandatory and not relevant for the content it's looking for. The effort of interpreting the robots.txt would be high and would have an impact on the performance of the programming.

Excuse me? Apart from that nonsense (yes, it really says "performance of the programming"), who are you to tell me which parts of my site you should be allowed to access? Besides, what content are you looking for, anyway?

For which purpose does the webspider exist?
The webspider scans the contents of pages for .de domains, similar to what other search engines do. This will be used to build an index of .de domains.

And what, exactly, would be the purpose of a list of .de domains? I can think of a lot of reasons and I don't like any of them. There's a reason why such a list does not exist. And I'm certainly not helping you build one.

And into the .htaccess it went. Please do the same. Thanks.
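
For reference, here's a minimal sketch of what such a block could look like. It assumes Apache with mod_rewrite enabled and matches on the "cupribot" substring of the user agent quoted in the comments; it's an illustration, not necessarily the exact rule used on this site.

```apache
# Assumption: mod_rewrite is available and .htaccess overrides are allowed.
RewriteEngine On

# Match any User-Agent containing "cupribot", case-insensitively ([NC]).
RewriteCond %{HTTP_USER_AGENT} cupribot [NC]

# Answer every matching request with 403 Forbidden ([F]); "-" means no rewrite.
RewriteRule ^ - [F]
```

Matching on a substring rather than the full string keeps the rule working if the bot changes the URL in its user agent, at the cost of also blocking anything else that happens to contain "cupribot".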

2 comments
CupriBOT

Please include the full user-agent string so we can add it to our software.

Authored by: Anonymous on Wednesday, July 05 2006 @ 21:57 CEST
CupriBOT

The UA string was

"cupriBOT [http://www.cfmx.de/webspider]"
Authored by: Dirk on Wednesday, July 05 2006 @ 22:07 CEST

Copyright © 2012 Damn Spam! Powered by Geeklog