Drupal, Mollom, and the Future of Blog Spam

Is it just me, or has anyone else been struck by the lack of attention being paid to blog comment spam?

No one needs a reminder of how severe the spam problem is with e-mail. But e-mail spam is just one piece of the spam pie. (Oh man, talk about a hard-to-swallow metaphor...) Somewhere between 80 and 90 percent of comments posted to blogs and wikis come from spambots or their human surrogates. Bear in mind that, as technologies go, blogging is still fairly new compared with e-mail. We're still near the beginning of the blog-spam curve.

To the extent that Social Software and Web CMS vendors sell, bundle, or pre-integrate blog and wiki solutions for you to employ beyond the firewall, they're selling you spam magnets as part of the deal. But they're not necessarily helping you with spam filtration.

You'd expect Social Software purveyors to be pioneers in this area, and some of them have decent services. But surprisingly, many of the vendors covered in our just-published Enterprise Social Software Report 2008: Networking & Collaboration Within and Beyond the Enterprise scored rather poorly on anti-spam capabilities.

Typical remedies for blog spam include comment moderation, challenge-response techniques, and automated filtering based on some combination of reputation assessment and AI-based text analysis. There are problems with all three approaches.

Moderation is tantamount to hand-processing. This is impractical in many cases and will only become more so over time.

A more practical deterrent is the CAPTCHA (a common challenge-response technique). The idea is that if you can correctly identify the letters in a deformed GIF image of a word, you're human, not a spambot, and therefore can be trusted not to post garbage. The CAPTCHA deters robots remarkably well (so far, at least), but it also deters legitimate posters to some extent. (Not everyone wants to play a word game in order to leave a comment.) And it will not deter a malicious human: offshore boiler rooms of paid CAPTCHA-breakers can (and do) still break through.

Filtering based on AI-driven text analysis can be effective for blog comments as well as e-mail. The problem with text analysis is that unless misclassification errors can be kept to just a couple of percent, you're still letting a lot of junk through. Consider a blog that receives 100 comments. Typically, 80 will be spam. An AI-based spam filter that's 90 percent accurate will let 8 bogus comments through. Since you had just 20 legitimate comments to begin with, you're left with a situation where over a quarter of your published comments (8 of 28) are bogus.
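The arithmetic above generalizes. A minimal sketch (the function name is mine, and it assumes the filter's accuracy applies to spam detection while all legitimate comments pass through untouched):

```python
# Illustrative arithmetic only: how filter accuracy affects the share of
# spam among *published* comments. Numbers match the example in the text.

def published_spam_fraction(total_comments, spam_rate, filter_accuracy):
    """Fraction of published comments that are spam, assuming the filter
    catches `filter_accuracy` of the spam and passes all legitimate
    comments unchanged."""
    spam = total_comments * spam_rate
    ham = total_comments - spam
    spam_let_through = spam * (1 - filter_accuracy)
    return spam_let_through / (ham + spam_let_through)

# 100 comments, 80% spam, 90%-accurate filter: 8 of 28 published are spam.
print(round(published_spam_fraction(100, 0.80, 0.90), 3))  # prints 0.286
```

The takeaway: when spam dominates the incoming stream, even a 90-percent-accurate filter leaves a visibly polluted comment section, which is why accuracy in the high-ninety-percents matters so much.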

Comment spam mitigation technology is obviously a work in progress. Some interesting new work in this area is being pursued by none other than Dries Buytaert (creator of Drupal). Buytaert, along with university classmate Benjamin Schrauwen, recently introduced Mollom, a comment-filtering SaaS offering (free for non-commercial users). Buytaert and Schrauwen hold doctorates in computer science. Schrauwen's is in machine learning.

Mollom relies mostly on proprietary text analysis techniques, but takes a multi-tiered approach. When a comment arrives for analysis, it is classified as ham (good), spam (bad), or uncertain. When the content's quality is uncertain, Mollom issues a CAPTCHA challenge to the submitter. If the submitter passes the CAPTCHA test, the content is marked as good. Buytaert and Schrauwen claim that Mollom (currently used by 1459 websites) is 99.78 percent effective.
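The tiered flow described above can be sketched as follows. This is a hypothetical illustration, not Mollom's actual API: `classify` and `issue_captcha` are stand-in callables for the proprietary text analyzer and the CAPTCHA challenge.

```python
# Hypothetical sketch of a tiered comment filter in the Mollom style.
# `classify` and `issue_captcha` are stand-ins, not Mollom's real API.

def handle_comment(text, classify, issue_captcha):
    """Return 'publish' or 'reject' for an incoming comment.

    classify(text) -> 'ham', 'spam', or 'unsure'
    issue_captcha() -> True if the submitter solves the challenge
    """
    verdict = classify(text)
    if verdict == 'ham':
        return 'publish'
    if verdict == 'spam':
        return 'reject'
    # Uncertain cases fall back to a CAPTCHA challenge: the text analyzer
    # only has to be decisive where it is confident.
    return 'publish' if issue_captcha() else 'reject'
```

The design point is that the CAPTCHA, with its usability cost, is paid only by the uncertain middle tier rather than by every commenter.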

What makes Mollom better than, say, Akismet? It's hard to know, at this point. Mollom's algorithms are a closely guarded secret (but are likely to be the original work of Schrauwen). Akismet says only that it runs "hundreds of tests" on every incoming comment (which sounds more than a bit Rube Goldberg-ish).

Mollom's most important differentiator may ultimately be its ability to act as an OpenID reputation service. For every incoming request associated with an OpenID value, Mollom updates the reputation of that ID based on the scoring of the associated comment(s). Over time, the trustworthiness of any user who has an OpenID becomes a simple table lookup rather than an elaborate exercise in artificial intelligence.
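A toy sketch of that idea (my own illustration; Mollom's actual reputation model is not public): each scored comment nudges the submitting OpenID's running reputation, and the trust decision reduces to a dictionary lookup.

```python
# Hypothetical OpenID reputation store. Each scored comment nudges the
# submitter's reputation; trust checks become a simple table lookup.

class ReputationStore:
    def __init__(self):
        self.scores = {}  # openid -> running reputation in [0, 1]

    def record(self, openid, is_ham, weight=0.1):
        """Update reputation with an exponential moving average toward
        1.0 (ham) or 0.0 (spam)."""
        prev = self.scores.get(openid, 0.5)  # unknown IDs start neutral
        target = 1.0 if is_ham else 0.0
        self.scores[openid] = prev + weight * (target - prev)

    def is_trusted(self, openid, threshold=0.8):
        # The expensive text analysis is replaced by a dict lookup.
        return self.scores.get(openid, 0.5) >= threshold
```

With enough consistently good comments, an ID crosses the trust threshold and its future submissions can skip the heavyweight analysis entirely.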

If you're in the process of selecting a Web CMS and/or Social Software vendor, and you plan to deploy public-facing blogs or wikis, be sure to take comment spam mitigation into account. Moderation of comments (by humans) is inherently costly. A SaaS service like Mollom or Akismet may not completely eliminate the need for moderation, but could be money well spent. One thing is certain: spam is something you need to budget for and architect around. Ask your vendors what kind of help you can expect from them. And don't settle for the sound of crickets chirping.
