PHPrbl
The background
Ever had those annoying referrer spammers ruin your website's statistics?
If you run a big website, or a weblog, you must be getting fed up with this
blasted referrer spam.
You're not alone. I have this problem as well and got fed up with the whole
thing. After weeks of playing with the idea of blocking referrer spammers
using rewrite rules, I got tired of the amount of energy it would
demand of me just to block IP addresses that seemed to be open proxies
anyway.
Referrer spam
Referrer spam is, simply put, a client leaving a fake referrer on
your website. The statistics program that generates the referrer reports
treats this referrer like any other referrer and lists it
in your website's referrer list. These referrer spammers don't just visit
your site once; they visit it several times an hour, resulting in something
like this.
These referrers have nothing to do with my site whatsoever.
RBL
Upon digging a bit deeper into the muck that was created by this referrer
spam, I discovered that most of the originating IP addresses were open
proxies. A lot of these open proxies are listed by RBLs (Real-time Blackhole
Lists) that are primarily used by mail servers to block spammers.
This is where PHPrbl comes in: it checks the IP address of the visiting
client against one or more RBLs and blocks the client if it is
indeed listed.
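To make the lookup concrete, here is a minimal sketch of how a DNS-based RBL check generally works. It is written in Python for illustration and is not PHPrbl's actual PHP code. By the usual RBL convention, the IPv4 octets are reversed, the list's zone is appended, and the resulting name is resolved: any answer means the address is listed. The function names and the zone `dnsbl.example.org` are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in an RBL zone."""
    # Validate the address, then reverse its octets: 192.0.2.99 -> 99.2.0.192
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zones, resolve=socket.gethostbyname):
    """Return True if any of the given RBL zones has a record for this IP."""
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
            return True  # any answer means the address is listed
        except OSError:
            continue  # no record (or the lookup failed): try the next zone
    return False


if __name__ == "__main__":
    # Hypothetical zone name, used only to show the query format.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The `resolve` parameter exists only so the check can be exercised without touching the network. Note that this sketch treats a failed lookup the same as "not listed", which lets a client through when DNS is down; a stricter setup might choose otherwise.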
The results
Here's the realtime generated Top 10 of blocked IP addresses:
| # | IP | hits | service(*) | lastseen |
|---|---|---|---|---|
| 1 | 12.178.240.115 | 1 | K | Aug 21 2006 - 17:33:51 |
| 2 | 164.124.116.43 | 1 | K | Aug 21 2006 - 17:53:57 |
| 3 | 201.17.166.32 | 1 | K | Aug 21 2006 - 17:54:42 |
| 4 | 24.221.191.219 | 1 | K | Aug 21 2006 - 17:54:46 |
| 5 | 82.210.131.55 | 1 | K | Aug 21 2006 - 17:57:24 |
| 6 | 212.27.33.240 | 1 | K | Aug 21 2006 - 18:07:19 |
| 7 | 203.133.27.133 | 1 | K | Aug 21 2006 - 18:03:33 |
| 8 | 66.208.198.22 | 1 | K | Aug 21 2006 - 18:05:45 |
| 9 | 201.17.212.169 | 1 | K | Aug 21 2006 - 18:08:41 |
| 10 | 72.35.75.158 | 1 | K | Aug 21 2006 - 18:12:02 |

(*) 1=rbl.init1.nl; M=localmysql; K=keyword; A=AHBL; S=Spamhaus
And here is the top 10 of blocked keywords:
| # | keyword | hits | added on |
|---|---|---|---|
| 1 | 567483 | 4285 | Mar 01 2006 - 10:17:17 |
| 2 | gall | 2642 | Jan 20 2006 - 09:41:43 |
| 3 | asstraffic | 1389 | Dec 22 2006 - 11:37:16 |
| 4 | allinternal | 1321 | Dec 22 2006 - 11:37:38 |
| 5 | gay | 1090 | Feb 07 2006 - 15:52:00 |
| 6 | texas | 1088 | Feb 23 2006 - 13:59:52 |
| 7 | holdem | 1087 | Jan 20 2006 - 23:14:19 |
| 8 | poker | 1041 | Jan 20 2006 - 10:46:54 |
| 9 | fuck | 441 | Jan 19 2006 - 16:28:56 |
| 10 | blowjob | 157 | Jan 19 2006 - 23:30:32 |

These statistics have been accumulated on this page, on eol.init1.nl and
on rudeboy.init1.nl. Counters last reset on July 10 2006.
I want it!
Okay, you can download version 0.4 of PHPrbl here.
I want to contact you
No problem, you can use this form on my website. Please feel free to
drop a line if you use PHPrbl. Comments and suggestions are welcome too!
Zee grand TODO list
- Add whitelisting functionality.
- Add more administrative possibilities.
- Find a bigger list of RBL services that list open proxies. SORBS was nice,
but it also lists dynamic IP ranges. We'd be blocking too big an audience
if we used it.
- The logging to MySQL can definitely be rewritten to be more efficient and
more modular, so you can add your own RBL service without needing to rewrite
stuff. (Done in version 0.2)
- If logging to MySQL is enabled, use the timestamp as a 'lastseen' option,
allowing us to block the IP address even before we do the DNS lookup. This
could speed things up, especially on servers that have DNS lookup problems. (Done in version 0.3)
- Create my own RBL to block some IP addresses that are not listed as open
proxies, when bored on a rainy afternoon. (Had plenty of afternoons to do this now.)
Are there drawbacks?
Yes. Because PHPrbl needs to do a DNS lookup on a hostname, loading the page
might be a bit slower, depending on the server the site is running on. On my own
sites, however, I have not noticed any slowdown.
The prechecking option should improve performance a bit.
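As a rough illustration of why prechecking helps, the sketch below consults a local record of already-blocked addresses before doing any DNS lookup. This is Python for illustration under stated assumptions, not PHPrbl's PHP code: the dictionary stands in for the MySQL table, and `check_client` and `dns_lookup` are hypothetical names.

```python
import time


def check_client(ip, blocked, dns_lookup, now=None):
    """Return True if the client should be blocked.

    blocked    -- dict mapping IP address to its 'lastseen' timestamp
                  (a stand-in for the MySQL table)
    dns_lookup -- callable doing the slow RBL lookup, returning True if listed
    """
    now = time.time() if now is None else now

    # Precheck: an address we blocked before is blocked again right away,
    # without paying for a DNS lookup.
    if ip in blocked:
        blocked[ip] = now
        return True

    # Not known locally: we must still ask the RBL. Skipping this step
    # would let every address that is not yet in the local data through.
    if dns_lookup(ip):
        blocked[ip] = now
        return True

    return False
```

Only the first visit from a listed address costs a DNS lookup; repeat visits are answered from the local data.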
Related links
- AHBL - The AHBL is a database of hosts that have been known to cause various
forms of abuse on the Internet which includes UCE/UBE/spam, Denial Of Service attacks,
cracking attempts, and much more.
- Spamhaus - Spamhaus tracks the Internet's Spammers, Spam Gangs and Spam
Services, provides dependable realtime anti-spam protection for Internet networks,
and works with Law Enforcement to identify and pursue spammers worldwide.
- mod_access_rbl2 - An Apache 1.3.x module which does the same thing as PHPrbl
but at the Apache level. This way, all files are protected, not just your PHP files.
- Spamhuntress - Weblog of the spamhuntress, a large section of the blog
is dedicated to referrer spam. Probably a good source of blockable IP
addresses for my new blocklist.
Helping out
While I work on a way for people to submit IP addresses for my own RBL, you can help
out by donating some money through PayPal, or, if you are a World of Warcraft player on
the Shadow Council server, send a few gold coins to my warrior, Puntee :)
Changelog
Version 0.4 - October 26 2005
After being away for too long (my humble apologies), I've finally had some time
to do some more coding.
- Streamlined the code based on input given by Steven Lynn. Fewer queries are
now used to do the same thing. Thanks!
- New feature: keywords checking; referrers are now matched to keywords given
by the site admin. If a match occurs, the client will be blocked. (For myself this
feature has already proven to be very, very effective)
- First start of an admin area for PHPrbl. For now, only the ability to add
and remove keywords is present. Whitelisting and local IP blocklist management
will be added soon.
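The keyword check from this release can be pictured with a small sketch. It is Python for illustration only, and it assumes a case-insensitive substring match; the page does not say how PHPrbl actually compares referrers to keywords. The function name is hypothetical.

```python
def matching_keyword(referrer, keywords):
    """Return the first keyword found in the referrer, or None if none match.

    Assumption: a keyword matches when it occurs anywhere in the referrer
    string, ignoring case.
    """
    haystack = referrer.lower()
    for keyword in keywords:
        if keyword.lower() in haystack:
            return keyword
    return None


if __name__ == "__main__":
    admin_keywords = ["poker", "holdem"]
    print(matching_keyword("http://texas-holdem.example.com/", admin_keywords))
```

Plain substring matching is cheap but blunt: a short keyword also matches legitimate referrers that merely contain it, which is one reason a whitelist is useful alongside it.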
Version 0.3.1 - May 17th 2005
- Fixed bug: If prechecking was enabled, it would only check against the local
database and skip the DNS lookups if no positives were detected. This allowed IP
addresses not in the database to access the site and still leave false referrers.
Version 0.3 - May 16th 2005
- Prechecking using the data in MySQL, no more DNS lookups when they're not needed
- Fix of bug, discovered by Steven Lynn, that could result in false positives.
Version 0.2 - May 10th 2005
- Logging of the last referrer given by an IP
- Rewrite of logging to MySQL which implies:
  - Previously gathered data is useless
  - IP addresses are now unique in database
  - Hits of IP addresses in the same row
- Logging of the given referrer for review
- Added exit code telling the site is protected by PHPrbl
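The one-row-per-IP logging described for this release could look roughly like the sketch below. It is Python with an in-memory SQLite database standing in for MySQL, purely for illustration; the table name, column names, and function names are assumptions, not PHPrbl's actual schema.

```python
import sqlite3
import time


def open_log():
    """Create an in-memory log table with one row per IP address."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE blocked ("
        " ip TEXT PRIMARY KEY,"      # IP addresses are unique
        " hits INTEGER NOT NULL,"    # hit counter kept in the same row
        " referrer TEXT,"            # last referrer given, for review
        " lastseen REAL NOT NULL)"
    )
    return db


def log_block(db, ip, referrer, now=None):
    """Record a blocked request: bump the existing row or create a new one."""
    now = time.time() if now is None else now
    cur = db.execute(
        "UPDATE blocked SET hits = hits + 1, referrer = ?, lastseen = ?"
        " WHERE ip = ?",
        (referrer, now, ip),
    )
    if cur.rowcount == 0:
        # First time this address is seen: start its counter at 1.
        db.execute(
            "INSERT INTO blocked (ip, hits, referrer, lastseen)"
            " VALUES (?, 1, ?, ?)",
            (ip, referrer, now),
        )
    db.commit()
```

Keeping the counter, the last referrer, and the timestamp in a single row per address means a top 10 can be produced with one query ordered by `hits`.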
Version 0.1 - May 5th 2005
Mumbo Jumbo
PHPrbl - © Eelco Wesemann, 2005
This is free software, released under the GNU/GPL License.