As you will have gathered from the heading, I am not writing about wine today. Instead I deal with an unpleasant topic that most bloggers struggle with: comment or blog spam and how to fight it. Like most blogs, the Wine Rambler is targeted by spammers who aim to get as many links to their websites as possible distributed across the Internet so that they can make more money from selling rubbish products. Comment forms on blogs are an easy target as they were designed to make it easy for people to leave comments with links attached to them. To make their dirty work easier, spammers use more or less sophisticated software, the so-called spambots (spam robots), to trawl the Internet for any comment or mail form they can find and then bombard it with spam. We get dozens, sometimes hundreds, of these visitors per day and eventually decided it was time to do something about it.
For the first few months of the Wine Rambler's existence spam was just a mild nuisance, requiring us to occasionally delete a comment. Eventually the volume kept growing, so we introduced comment moderation. Most blogging software and content management systems support this function, where comments on postings are only shown after a moderator has approved them. While this effectively keeps comment spam out of your blog, it does not stop spammers sending it to you in the first place. In addition, it can be annoying for visitors who want to comment to have to wait for hours or even a day before they see their comment appear. Obviously, if you are one of those people who are online 24/7 this is not really an issue, but you will still get many emails from the comment moderation queue, most of which just alert you to spam.
Disable comments, at least on older posts
One way of dealing with comment spam is to disable comments. Obviously, this defeats the purpose of a blog to a certain extent, but it can be useful under two circumstances: a) you are under heavy attack from comment spammers who have found a way of bypassing your security, or b) as a selectively used means of stopping comment spam. For instance, you could disable comments on older pages. On the Wine Rambler, comment spammers targeted one particular page almost exclusively (perhaps a human spammer had found it and made it the default go-to page for their spambots?), and as this was just an announcement that was outdated a few days after posting, we disabled comments on that one page. That instantly took care of at least 50 comment spam attacks per day (!) on that page. Disabling comments does sound like admitting defeat, but in this particular instance it was a marvellous success. Obviously, at some point the spammers might realise what we did and target another page.
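For the technically minded, here is a minimal sketch of how such a rule might look in Python. It is not how our blogging software does it; the function name `comments_open`, the 90-day cut-off and the path in `CLOSED_PAGES` are all made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical settings, not taken from any particular blogging platform.
MAX_COMMENT_AGE = timedelta(days=90)   # close comments on posts older than this
CLOSED_PAGES = {"/announcement"}       # pages closed by hand, e.g. a spam magnet


def comments_open(path, published, now):
    """Return True if the comment form should still be offered for this page."""
    if path in CLOSED_PAGES:
        # Manually closed, regardless of how old the page is.
        return False
    # Otherwise only the age of the post decides.
    return now - published <= MAX_COMMENT_AGE
```

A page template would call `comments_open(...)` before rendering the comment form, and the form handler should call it again so that a spambot cannot simply post to the old form address.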
Since neither moderating nor disabling comments is much fun, we decided to implement a technology called CAPTCHA. CAPTCHA is one of those lovely acronyms we geeks like to use and it stands for: 'Completely Automated Public Turing test to tell Computers and Humans Apart'. In short, it is a test attached to a comment form that, in theory, only a human can pass. It can be very effective because most spammers use spambots for distributing blog spam. CAPTCHAs are actually a form of reverse Turing test. Alan Turing, a 20th century mathematician, was looking for a way to test machines for intelligence through tests applied by humans. If you have seen the movie 'Blade Runner' you will be familiar with such a test, as Harrison Ford administers something like it to Sean Young. The CAPTCHA, by contrast, is a test applied by a machine, your web server, to whoever submits the form, be it another machine (the spambot) or a human visitor.
Initially, we used a math CAPTCHA in the form of a very basic formula, such as '4+11=X'. As many spambots are really stupid creatures, they cannot do the mathematics, so the blog does not allow them to post the comment. Another form is a word CAPTCHA where the visitor is asked to select, for instance, the third word from the group 'Riesling, Silvaner, Chardonnay, Cabernet'. CAPTCHA plugins exist for pretty much all blogging systems and are usually easy to set up.
Unfortunately though, more and more spambots are now clever enough, if you want to call it that, to solve these CAPTCHAs, so it was not long until we needed another solution.
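To give an idea of how little is behind a math or word CAPTCHA, here is a rough Python sketch. It is not the plugin we used; the function names and the wording of the questions are my own invention. The important assumption is that the expected answer is stored on the server (for example in the visitor's session) and never sent along with the form.

```python
import random

WORDS = ["Riesling", "Silvaner", "Chardonnay", "Cabernet"]
ORDINALS = ["first", "second", "third", "fourth"]


def make_math_captcha(rng):
    """Return (question, expected_answer) for a simple addition."""
    a, b = rng.randint(1, 20), rng.randint(1, 20)
    return f"{a}+{b}=X", str(a + b)


def make_word_captcha(rng):
    """Return (question, expected_answer) asking for the n-th word of a list."""
    words = rng.sample(WORDS, k=len(WORDS))   # shuffled copy of the word list
    index = rng.randrange(len(words))
    question = f"Type the {ORDINALS[index]} word from: {', '.join(words)}"
    return question, words[index]


def check_answer(expected, submitted):
    """Compare the visitor's answer, ignoring case and surrounding spaces."""
    return submitted.strip().lower() == expected.lower()
```

When the form is rendered, the blog would call one of the `make_..._captcha` functions with `random.Random()`, show the question and remember the answer; when the comment arrives, `check_answer` decides whether it is accepted.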
Block anonymous links
One of my favourite solutions to fight spam is to stop visitors from using links in their comments. Spammers usually want to get you to their website, so if you block all comments with links in them almost all spam automatically gets filtered out. Sadly, many human visitors also want to point you to interesting resources, so this was not an option for us. It can work for you though if most of the people commenting on your site are registered users, in which case you allow them to post links and disallow it for anonymous visitors. It is still not ideal though, so we kept looking for another solution.
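As an illustration only, the rule "registered users may post links, anonymous visitors may not" could be sketched in Python like this. The pattern is deliberately crude and the names `contains_link` and `accept_comment` are hypothetical; a real plugin will have its own, more thorough, link detection.

```python
import re

# Crude link detector: matches http(s) URLs, bare "www." addresses and HTML
# anchor tags. This is an assumption for the sketch, not a complete URL grammar.
LINK_PATTERN = re.compile(r"(https?://|www\.|<a\s)", re.IGNORECASE)


def contains_link(text):
    """Return True if the comment text appears to contain a link."""
    return LINK_PATTERN.search(text) is not None


def accept_comment(text, is_registered):
    """Registered users may post links; anonymous visitors may not."""
    if is_registered:
        return True
    return not contains_link(text)
```

The trade-off is the one described above: an anonymous visitor who wants to share a genuinely useful link gets rejected along with the spammers.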
That search brought us to a special form of CAPTCHA that is widely used today: the image CAPTCHA. Usually the comment form displays an image with about five different letters in different colours and font sizes and with lots of distortion in the background. The idea is that many spambots are not sophisticated enough to perform OCR, optical character recognition, so they cannot tell which letters are shown in the image.
While this works very well for us, it has a few disadvantages. First of all, visually impaired and blind people cannot see image CAPTCHAs, so they are effectively blocked from leaving comments. Also, the better spambots become at OCR, the harder to read your image CAPTCHA has to be, which can cause problems for less eagle-eyed humans too. Most of us will have come across a CAPTCHA that we could not read, no matter how hard we struggled. Some CAPTCHAs get around this by also offering an audio clip that is played to human visitors. reCAPTCHA is the most prominent one these days. Unfortunately though, it seemed to run somewhat slowly on our site and, more importantly, some of the audio clips are really hard to understand if you are not a native English speaker.
Image CAPTCHAs are less than ideal for another reason, and this is performance. If your website gets hit by a lot of spammers it may slow down noticeably, depending on the server resources at your disposal. This can get worse if the web server has to generate a new CAPTCHA image for every attempt the spambots make to leave a comment. If you need lots of distortion in the images to make them hard to decipher for spambots, this will increase CPU load, which is an issue if you use shared web hosting. Despite these issues, the CAPTCHA is working very well for us and has reduced the time we spend dealing with spam to almost zero. Over the past few months, it has blocked around 2500 spam form submissions. Imagine having to deal with those manually...
Content analysis - Akismet
What other ways are there to fight comment spam? One solution is a service called Akismet. Akismet comes in the form of a plug-in, for instance for the WordPress blogging software. It collects comments from many thousands of blogs, analyses them and blocks those it identifies as spam. This can be very useful as spammers often leave the same messages, or at least the same links, across lots of websites at the same time, so a broad perspective makes it very easy to identify them. If by mistake a legitimate comment from a human is identified as spam ('false positive'), you can correct that error manually.
This, by the way, is the reason we do not use Akismet: in the end you still have to check all spam comments just in case the service got it wrong. It seems to be pretty reliable, but as we would not want to lose a single legitimate comment it just is not an option. Judging from a few older posts at least, it also seems that Akismet can have performance implications for your site if it is hit by lots of spam.
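To show why a broad view across many blogs helps, here is a toy Python sketch of the underlying idea that the same link turning up on many different sites is suspicious. To be clear, this is not Akismet's actual method or API, which I have no inside knowledge of; the class `SharedLinkTracker`, its threshold and the example addresses are all invented for illustration.

```python
import re
from collections import defaultdict

# Simplified URL matcher, an assumption made for this sketch.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


class SharedLinkTracker:
    """Toy illustration of cross-site analysis; not how Akismet works internally."""

    def __init__(self, threshold=3):
        self.threshold = threshold              # distinct sites before a link is suspect
        self.sites_by_link = defaultdict(set)   # link -> set of sites that reported it

    def report(self, site, comment):
        """Record every link found in a comment submitted to the given site."""
        for link in URL_PATTERN.findall(comment):
            self.sites_by_link[link.lower()].add(site)

    def looks_like_spam(self, comment):
        """Flag a comment if any of its links was seen on enough different sites."""
        return any(
            len(self.sites_by_link.get(link.lower(), ())) >= self.threshold
            for link in URL_PATTERN.findall(comment)
        )
```

A single blog sees each spam link only once or twice, which is why it cannot make this judgement on its own. A false positive in this sketch would be a popular, legitimate link that many blogs happen to receive, which is exactly the kind of error you would have to correct by hand.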
There is another way to fight spam though, and it will also make your website respond faster rather than slower. Unfortunately though, it is more for the techies amongst us. Even so, you might find it at least interesting to read about it in the second instalment of this series on comment spam.