Method 1: Moderate Comments
With this method, you read through every comment you receive before it gets posted to your site or blog. Most blogging systems have a moderation page where you manually approve each comment, but people often end up disabling it or allowing only registered users to post comments. Bots can create fake user accounts and spam from those, and simply disabling moderation will lead to even more spam. When a spam bot notices that a site it spammed previously displayed its comment, it adds that site to a list to spam again in the future, because it works.
The only impact users will notice is that their comments don’t appear right away. This can be somewhat frustrating, but most users are aware of moderation by now since nearly every site uses it. To reduce the impact on your users even further, you could mark specific users as “Approved” and let all of their comments skip moderation by default. I suggest approving users who have 5 to 10 good comments and not a single bad one. Also make sure the user accounts are not brand new: set a waiting period of a month before users can post without being moderated.
The success rate for this method is 100%, unless you click “Approved” by mistake. Your moderators and administrators read through all the comments manually and will know whether a comment is genuine or spam.
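The “Approved” user rule described above can be sketched in a few lines. This is only an illustration under assumptions: the `Commenter` fields, the function name, and the exact thresholds (5 good comments, 30 days standing in for “a month”) are hypothetical and would need to be mapped onto whatever your blogging system actually stores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed thresholds: the lower end of the 5 to 10 range, and a month
# approximated as 30 days.
MIN_GOOD_COMMENTS = 5
MIN_ACCOUNT_AGE = timedelta(days=30)


@dataclass
class Commenter:
    """Hypothetical per-user record; field names are illustrative."""
    created_at: datetime
    approved_comments: int
    rejected_comments: int


def can_skip_moderation(user: Commenter, now: datetime) -> bool:
    """Return True only for established users with a clean history."""
    old_enough = now - user.created_at >= MIN_ACCOUNT_AGE
    clean_record = user.rejected_comments == 0
    enough_history = user.approved_comments >= MIN_GOOD_COMMENTS
    return old_enough and clean_record and enough_history
```

Comments from everyone else would still go to the moderation queue as usual.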
Method 2: Hidden Field - Honey-pot
This method, which only sometimes works, adds an input field of type “text” to the form and hides it with the style display:none; so that visitors never see it. Most bots fill in every field rather than risk leaving a required one empty, which would force them to submit the comment again. Because bots fill the field in and real users never see it, you can treat any submission where the field is not empty as almost certainly coming from a spam bot.
Spam bots are getting better at detecting this method, so it’s not 100% spam proof, but I did notice it cutting my spam by more than half on a few of my sites. You can make it harder for bots to detect the trap by putting the style in an external style sheet instead of inline.
This detection is great for real visitors, since it doesn’t affect them at all. They won’t see any extra inputs, their browsers need nothing extra (other than CSS, which all browsers support unless they are text only), and they will see less spam on your site.
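On the server side, the honey-pot check comes down to a single test on the submitted form data. The sketch below assumes the form arrives as a dictionary of strings; the field name `website_url` is a made-up example, not something any particular blogging system uses.

```python
from typing import Mapping

# Hypothetical name of the hidden field. Pick something a bot would
# consider worth filling in.
HONEYPOT_FIELD = "website_url"


def is_honeypot_triggered(form: Mapping[str, str]) -> bool:
    """Return True when the hidden field contains anything but whitespace.

    Real visitors never see the field, so it should arrive empty
    (or not at all, depending on the browser).
    """
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A comment that triggers the check could be dropped silently or placed in a spam folder for review.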
Method 4: Cookie Timeout
Spam bots don’t care about your content, only about how they can spread their links all over your site to make an extra dollar or two. Because of this, they don’t spend much time on your site, just enough to parse the comment form and submit the spam. We combat this by setting a cookie when the page loads, with the current time (epoch or Unix time-stamp) as its value. When your code processes the comment, make sure the cookie is set and that at least 3 seconds have passed since the time stored in it. Also set a timeout period, so that a comment whose cookie is more than 2 hours old is marked as spam as well.
Using this method alone has resulted in huge success. I don’t suggest relying on just one method, but if you had to pick a single automated method to combat spam, this is the one I would choose for now. It’s fairly simple to add to existing systems and works really well.
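The timing check described above might look like the following sketch. The function names are hypothetical, and how the cookie is actually set and read depends on your web framework, so only the value handling is shown. The 3 second and 2 hour limits are the ones given in the text.

```python
import time
from typing import Optional

MIN_AGE_SECONDS = 3            # faster than this: assumed to be a bot
MAX_AGE_SECONDS = 2 * 60 * 60  # older than 2 hours: also marked as spam


def cookie_value_for_page_load(now: Optional[float] = None) -> str:
    """Value to store in the cookie when the comment page is served."""
    if now is None:
        now = time.time()
    return str(int(now))


def is_spam_by_cookie(cookie_value: Optional[str],
                      now: Optional[float] = None) -> bool:
    """Return True if the cookie is missing, malformed, too new or too old."""
    if now is None:
        now = time.time()
    if not cookie_value:
        return True  # no cookie was sent at all
    try:
        loaded_at = int(cookie_value)
    except ValueError:
        return True  # not a Unix time-stamp
    age = now - loaded_at
    return age < MIN_AGE_SECONDS or age > MAX_AGE_SECONDS
```

Keep in mind that a client can send any cookie value it likes, so a plain time-stamp only stops bots that don’t bother to fake it.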
Method 5: External Cookie
This method has all the same benefits and downsides as the other cookie method: no user interaction is required and visitors don’t notice it, but it does require a browser that saves cookies.
Method 6: IFrame Comment Form
If you have your comment form located inside an <iframe>, the chances of a spam bot finding the form are lower. This method won’t work for WordPress sites and other sites running popular Content Management Systems, since the spam bots don’t need to see your comment form when its layout is already well known.
This method is less desirable, since incorporating <iframe>s into a site is usually bad practice, so only use it as a last resort. Thankfully, I have never had to use it, thanks to the other methods available.
Users are hardly impacted by this change on your site, but browsers can have <iframe>s disabled. Adapting your code to an <iframe>-based comment form may be difficult, depending on how your comment system is currently set up.
Method 7: External Comment System
Another downside is that search engines won’t pick up your comments when they crawl your site. This may or may not affect your rankings much, but most people prefer that comments be visible to search engines so the page looks active.
Method 8: Registered Users Only
Allowing visitors to post only if they have an account is an old method, and a pretty good one, if you don’t mind taking a huge hit to the discussion on your site. Spammers don’t want to spend the time creating an account, checking for the activation email, logging in and then finally spamming. Most real users don’t want to go through those steps just to post a comment either.
The success rate for this method is around 60% for well-known CMSs and about 90% for lesser-known website scripts. That is lower than you probably want, given the huge hit to your site’s discussion and how much is required from a user to post a comment. I don’t recommend this method at all, but it may be something you have to enable.
Method 9: Captcha
The well-known Captcha method should be mentioned in this post, but it is another method I like to avoid at all costs. Making users and visitors take extra steps to join the discussion on your site is exactly what you want to avoid.
Captchas are images that display random text in a funky and groovy font. In front of and behind the text are random shapes, lines, colors and more. Most OCR programs are unable to process the image, so the spammers have to send it to a Captcha solving service where thousands of people solve Captchas as their daily job.
You will remove a lot of spam from your site by adding Captchas, but as computers get more powerful and better OCR techniques are created, Captchas will soon be too difficult for humans to read and easy for bots. Even in their current state, I have trouble reading Captchas, and about a third of the time I enter them incorrectly.
Method 10: Comment Rules
Creating a set of rules for comment text and marking any comment that matches them as spam is very worthwhile. I mark as spam all comments that include multiple links, random text, a mix of characters from different languages, and more. A lot of the spam on my site contains both Chinese and English characters along with multiple links. It was easy enough to make a rule for this, and it cut back on spam. The likelihood of a real comment matching is not very high, but I still keep these comments, flagged as spam, so I can review them later without having to deal with them daily.
Users won’t have to enter any extra data into your site, but their comment may get placed into a spam folder for future review.
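Two of the rules mentioned above, multiple links and mixed Chinese and English characters, can be sketched as follows. The function names, the reason strings, and the choice to treat more than one link as “multiple” are assumptions for illustration. The script check relies on Unicode character names, which is a simple heuristic rather than proper language detection.

```python
import re
import unicodedata
from typing import List

MAX_LINKS = 1  # assumed threshold: more than one link counts as "multiple"
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def count_links(text: str) -> int:
    """Count occurrences of http:// or https:// in the comment."""
    return len(LINK_RE.findall(text))


def mixes_cjk_and_latin(text: str) -> bool:
    """Return True if the text has both CJK and Latin letters."""
    has_cjk = has_latin = False
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("CJK"):
            has_cjk = True
        elif name.startswith("LATIN"):
            has_latin = True
    return has_cjk and has_latin


def spam_reasons(text: str) -> List[str]:
    """Return the rules a comment matched; an empty list means no flag."""
    reasons = []
    if count_links(text) > MAX_LINKS:
        reasons.append("multiple links")
    if mixes_cjk_and_latin(text):
        reasons.append("mixed scripts")
    return reasons
```

A comment with a non-empty result would go to the spam folder rather than being deleted, so it can still be reviewed later.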
Method 11: IP Banning and IP Databases
This method is suggested by some, but it probably won’t help much unless a single IP address has some sort of vendetta against your site. When a spam comment appears on your site, you ban its IP address from posting future comments. Since getting access to proxies and spoofing IP addresses is simple, you will end up having to ban thousands of addresses. And if one of your real users happens to share a banned IP address, they won’t be able to comment.
There are also services and sites that list known spam IP addresses, where you can look up a specific address and decide whether to allow it. These services are better than building your own list because they are used by many people. People report IP addresses as spammers, so when one of them visits your site, you know it spammed x sites in the last x hours or so.
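Checking a commenter against a ban list is straightforward with Python’s standard `ipaddress` module. In this sketch the function names are hypothetical and the blocked entries are reserved documentation addresses used as placeholders. In practice the list would come from your own bans or from one of the services described above.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_blocklist(entries: Iterable[str]) -> List[Network]:
    """Turn strings such as '203.0.113.0/24' or a single IP into networks."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(ip: str, blocked: Iterable[Network]) -> bool:
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked)


# Placeholder entries; replace with real banned addresses or ranges.
BLOCKED = parse_blocklist(["203.0.113.0/24", "198.51.100.7"])
```

Banning whole ranges catches more spam but also raises the chance of locking out a real user who shares an address with a spammer.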
Now, get to work on fixing up your comment section so you can work on blogging instead of hitting delete on thousands of comments.