Last week a Hungarian guy asked me whether I could develop an effective solution against bad robots. A few days ago I ran a quick survey of the topic and found plenty of solutions, but most of them relied on rejecting specific hosts via .htaccess rules, and none of them was automatic, so the challenge was set.

The most useful site I found was this resource, which taught me the basic attitude of bad robots toward robots.txt files: they ignore the specified restrictions.

1. Open your existing robots.txt file, or upload one, and place the following lines in it:

User-agent: *
Disallow: /core

The name of the restricted folder is not important, but it helps if human attackers find it attractive too, since this folder will be the live bait. Well-behaved robots obey the Disallow rule and never request the folder, so whoever does request it has ignored robots.txt.

2. On your hosting space, create the folder named in the robots.txt file (in my example it is called core) and upload an index.php file with the following content:

<?php
// Address of the client that requested the bait folder
$ip = $_SERVER['REMOTE_ADDR'];
$logfile = dirname(__FILE__) . '/bannolnilog.txt';
// Append the IP address to the logfile, one address per line
$fp = fopen($logfile, 'a');
fputs($fp, $ip . "\n");
fclose($fp);
echo 'Your IP was logged for security reasons and your visit is now over.';
?>

3. As you can see in the code, I defined a $logfile where the IP addresses are collected and stored, so we need to upload a blank text file called bannolnilog.txt (chmod 644) to the same (core) folder.
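
As a possible refinement of the trap script, here is a minimal sketch that records an address only once and holds an exclusive lock while writing, so two simultaneous hits cannot interleave their lines. It assumes PHP 7 or newer, and the function name log_bad_ip is hypothetical, not part of the scripts in this post.

```php
<?php
// Hypothetical helper (not part of the original scripts): records $ip in
// $logfile once, one address per line. Returns true if a new line was added.
function log_bad_ip(string $logfile, string $ip): bool
{
    // Ignore anything that is not a syntactically valid IPv4/IPv6 address
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // 'c+' opens for reading and writing, creates the file if it is
    // missing, and does not truncate existing content
    $fp = fopen($logfile, 'c+');
    if ($fp === false) {
        return false;
    }
    flock($fp, LOCK_EX);
    // Scan the existing lines for an exact match
    $known = false;
    while (($line = fgets($fp)) !== false) {
        if (trim($line) === $ip) {
            $known = true;
            break;
        }
    }
    // Append the address only if it was not listed yet
    if (!$known) {
        fseek($fp, 0, SEEK_END);
        fwrite($fp, $ip . "\n");
    }
    flock($fp, LOCK_UN);
    fclose($fp);
    return !$known;
}
```

In core/index.php it could be called as log_bad_ip(dirname(__FILE__) . '/bannolnilog.txt', $_SERVER['REMOTE_ADDR']); in place of the fopen/fputs/fclose lines.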

4. We need to upload one more PHP file, which checks whether the visitor is banned whenever a page is requested. I named this file validator.php, and its content is the following:

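<?php
$ip = $_SERVER['REMOTE_ADDR'];
// Logfile written by core/index.php, one banned IP address per line
$logfile = dirname(__FILE__) . '/core/bannolnilog.txt';
$target = file($logfile);
foreach ($target as $item) {
    $item = trim($item);
    // Compare whole addresses, so 1.2.3.4 does not also match 11.2.3.4
    if ($item === $ip) {
        header('HTTP/1.0 403 Forbidden');
        exit;
    }
}
?>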
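
The lookup can also be wrapped in a small function, which makes it easier to test on its own. This is only a sketch, assuming PHP 7 or newer and one address per line in the logfile. It compares whole addresses rather than substrings, so a logged 1.2.3.4 does not also block 11.2.3.4, and blank lines are ignored. The name is_banned is hypothetical.

```php
<?php
// Hypothetical helper: true if $ip appears as a whole line in $logfile.
function is_banned(string $logfile, string $ip): bool
{
    if (!is_readable($logfile)) {
        return false; // no list yet means nobody is banned
    }
    // Read the file into an array of lines, dropping newlines and empty lines
    $lines = file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return false;
    }
    // Trim stray whitespace, then require an exact, type-strict match
    return in_array($ip, array_map('trim', $lines), true);
}
```

In validator.php this could be called as is_banned(dirname(__FILE__) . '/core/bannolnilog.txt', $_SERVER['REMOTE_ADDR']) to decide whether to send the 403 header and exit.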

5. As the final step, insert this line at the very top of your script's header or index file, so that the validator runs first whenever a page is requested:

<?php require '/you/need/to/insert/the/path/here/validator.php'; ?>

Note: You may truncate the logfile to delete the collected IPs. Please also bear in mind that WordPress displays quotation marks a bit oddly, so you may want to double-check the syntax of the code.
I guarantee nothing, but it works very well on one of my sites.
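
If you would rather lift a single ban than empty the whole logfile, something along these lines might do. It is a sketch only, assuming PHP 7 or newer and one address per line in the logfile. The name unban_ip is hypothetical.

```php
<?php
// Hypothetical helper: removes every line equal to $ip from $logfile.
// Returns the number of lines removed.
function unban_ip(string $logfile, string $ip): int
{
    $lines = file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return 0;
    }
    $kept = array();
    $removed = 0;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === $ip) {
            $removed++;        // drop the matching address
        } elseif ($line !== '') {
            $kept[] = $line;   // keep every other non-blank line
        }
    }
    // Rewrite the whole file under an exclusive lock, one address per line
    $body = $kept ? implode("\n", $kept) . "\n" : '';
    file_put_contents($logfile, $body, LOCK_EX);
    return $removed;
}
```

For example, unban_ip('/path/to/core/bannolnilog.txt', '203.0.113.7') would remove that one address and leave the rest of the list untouched. Both the path and the address here are placeholders.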

Have a nice day!