Fighting Spam in WordPress

This month’s WPToronto East meetup focused on security and privacy. Don Tai shared an in-depth presentation on combatting spam in WordPress, which sparked lively conversation on a range of related topics.

Years ago, Don was contacted by his hosting provider. He was told that his site was using too many resources, despite it being a simple personal blog. They even threatened to shut his site down if the problem wasn’t fixed.

That’s where his crusade against comment spam began.

You can download the slides or find the takeaways below.

Why do spammers target your site?

It depends. You can’t tell what their intent is. But we do know that it affects all kinds of websites. Personal blogs, online stores, corporate websites… nobody is exempt from being hit by spammers.

What types of spam are there?

We looked at three types of spam in the presentation: comment spam, referrer spam, and ghost spam.

Comment spam is the most obvious. Don called out “SEO marketing organizations” as particularly bad offenders. They leave comments with nonsense text and links pointing back to whatever site they’re targeting.

If you’re using a plugin like Akismet, these comments will be flagged and filtered into the Spam folder on your Comments screen in WordPress.

Referrer spam is a sneakier form of SEO spamming. The spammers send requests to your site with a fake URL set as the referrer. If your access logs or stats pages are publicly reachable and search engines crawl them, they’ll also crawl the fake URLs.

The goal with referrer spam is to (somehow) increase a site’s search ranking by having backlinks from the targeted site’s access logs.
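
One way to spot this in your own data is to tally the referrers that point somewhere other than your own site. The snippet below is a minimal sketch, not something shown in the presentation: the function name `external_referrers` and the sample hostnames are made up for illustration, and it assumes you have already pulled the referrer values out of your logs.

```python
from collections import Counter
from urllib.parse import urlparse


def external_referrers(referrers, own_host):
    """Count referrer hostnames that are not the site's own hostname."""
    # urlparse("-").hostname is None, so empty referrers ("-") are skipped.
    hosts = (urlparse(url).hostname for url in referrers)
    return Counter(host for host in hosts if host and host != own_host)


# Hypothetical sample data; "example.com" stands in for your own site.
referrers = [
    "https://example.com/about/",
    "http://cheap-seo.example/",
    "http://cheap-seo.example/",
    "-",
]
counts = external_referrers(referrers, "example.com")
for host, hits in counts.most_common():
    print(host, hits)
```

A hostname you have never heard of that shows up again and again is worth a closer look, though a high count alone doesn’t prove it is spam.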

Ghost spam is even sneakier. With ghost spam, the spammers never touch your site at all. They send fake hits straight to the Google Analytics servers using your Google Analytics ID, which they often find simply by generating IDs at random. Google correlates the spammer’s provided URL with your Google Analytics ID, so the URL shows up in your Google Analytics reports, even though the spammer never actually visited your site. (Hence it being a ghost!)

The goal is to pique your curiosity and have you click through to these suspicious URLs. And while that’s bad enough, the fact that these “visits” artificially inflate your traffic numbers adds a bit of insult to the injury.

Why should we care about spam?

Your site could be penalized by Google or other search engines for linking to bad websites.

Spam can artificially inflate your analytics reports, affecting important decisions that rely on the analytics data.

Spammers can put unnecessary strain on your hosting, which affects your site’s performance (e.g. load times) and costs you money (e.g. paying for a more expensive hosting plan to handle the extra load).

In his presentation, Don mentioned that approximately 50% of all web traffic comes from bots. Of that, approximately 30% is malicious. What would the performance improvements be if you could identify and stop that 30% of malicious traffic?

“Spam is like the gateway drug for moving up the chain, testing security of your plugins, and breaking into your site.”

In the beginning, spam might seem like a trivial issue. But as spammers poke and prod your site, they may also be looking for other security holes to exploit.

That leads to problems like malware infections or hijacked search results. These are significantly more harmful than spam alone. But spam is where it starts.

WordPress is wildly popular for building websites. Which means it’s also a wildly popular target for spammers and hackers.

A couple of years ago, WordPress websites were targeted through an exploit in XML-RPC. Prior to that, timthumb.php was a popular target.

The methods used for detecting these security holes — crawling and scanning websites — are the same methods that spammers use. So what we do to combat spammers can also be helpful in combatting more aggressive attacks.

Don’t forget about raw access logs.

Raw access logs provide a complete transcript of all traffic that hits your website. It’s far more comprehensive than anything you’ll find from the likes of Google Analytics.

If you’re on shared hosting, you can usually view your access logs via cPanel. (If you don’t know where to look for access logs for your site, check with your hosting provider.)

Don suggests downloading your raw access log and then opening it up in a spreadsheet tool like Microsoft Excel or Google Sheets. You’ll see columns that correspond to:

IP or hostname accessing your site
Time and date they connected
The resource that they tried to upload or download
The success or failure of their connection attempt
How much data they used
What they downloaded
Their referrer, i.e. the URL they claim to have come from (spoofable)
Their user agent, e.g. browser used (also spoofable)
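
If a spreadsheet gets unwieldy, the same columns can be pulled out with a short script. This is a minimal sketch that assumes your host writes logs in the common Apache/Nginx “combined” format; the sample line, the pattern name, and the function name are invented for illustration, and your host’s layout may differ.

```python
import re
from typing import Optional

# Assumed layout: Apache/Nginx "combined" log format.
# Quoted fields that contain escaped quotes are not handled by this sketch.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one access-log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample line using documentation-reserved addresses and domains.
sample = (
    '203.0.113.7 - - [12/Mar/2018:06:25:24 -0400] '
    '"POST /wp-comments-post.php HTTP/1.1" 200 512 '
    '"http://spam.example/" "Mozilla/5.0"'
)
fields = parse_log_line(sample)
print(fields["host"], fields["status"], fields["referrer"])
```

Each named group lines up with one of the columns listed above: `host`, `time`, `request`, `status`, `size`, `referrer`, and `user_agent`.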

By correlating this data against other information, like what you can find in your comment spam folder in WordPress, you’ll get a better understanding of what spammers are doing on your site.

For example, they may be crawling your site from one IP address, and then posting a spam comment from another IP address.
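
Here is a rough sketch of that kind of correlation. It assumes you already have log entries as dictionaries with a `host` key, plus a set of IP addresses copied by hand from your spam folder. The function name and all of the sample data are hypothetical.

```python
from collections import Counter


def summarise_by_ip(entries, spam_comment_ips):
    """Count requests per IP and mark IPs that also appear in the spam folder."""
    requests_per_ip = Counter(entry["host"] for entry in entries)
    return [
        (ip, count, ip in spam_comment_ips)
        for ip, count in requests_per_ip.most_common()
    ]


# Made-up entries: one address crawls posts, another posts a comment.
entries = [
    {"host": "198.51.100.4", "request": "GET /2018/01/post-a/ HTTP/1.1"},
    {"host": "198.51.100.4", "request": "GET /2018/02/post-b/ HTTP/1.1"},
    {"host": "198.51.100.4", "request": "GET /2018/03/post-c/ HTTP/1.1"},
    {"host": "203.0.113.7", "request": "POST /wp-comments-post.php HTTP/1.1"},
]
spam_comment_ips = {"203.0.113.7"}

summary = summarise_by_ip(entries, spam_comment_ips)
for ip, count, posted_spam in summary:
    print(ip, count, "posted spam" if posted_spam else "")
```

A busy IP that never shows up in the spam folder, sitting next to a quiet IP that does, is the kind of pattern worth investigating further. On its own it doesn’t prove the two are working together.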

How can you stop spammers?

The first step is to install a security plugin. iThemes Security and Wordfence are two recommended options. Jetpack also includes security features (“Jetpack Protect”).

Note: Read the documentation for security plugins! You can accidentally lock yourself out of your own site. (For example, you could end up on the plugin’s blacklist and need a direct database change to get back in.)

Next, make sure you’ve installed and activated Akismet. This’ll help filter comment spam from your site.

Consider disabling XML-RPC altogether. You’ll lose some functionality from apps that rely on XML-RPC, but as support for the REST API grows, that’s becoming less of a concern.

Lastly, GM Block Bots is a WordPress plugin that filters known ghost/referrer spam from your Google Analytics reports.

From there, you can start escalating your security with more aggressive tactics.

One method is to ban IP addresses that are known to be malicious. Both iThemes Security and Wordfence have lists of “bad neighbourhoods” that they rely on when determining which IPs to block.
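
To give a feel for what a ban list does under the hood, here is a minimal sketch using Python’s standard `ipaddress` module. It is not how iThemes Security or Wordfence implement their blocking; the `BLOCKED_NETWORKS` list and the `is_blocked` function are hypothetical, and the ranges are documentation-reserved placeholders rather than real “bad neighbourhoods”.

```python
import ipaddress

# Hypothetical blocklist; these ranges are reserved for documentation.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check this way.
    return any(ip in network for network in BLOCKED_NETWORKS)


for candidate in ["203.0.113.7", "198.51.100.4", "2001:db8::1"]:
    print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

Blocking whole ranges like this is what makes IP banning blunt: everyone inside the range is turned away, not just the spammer.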

Unfortunately, IP banning may affect innocent users. If you’re getting hit with a lot of spam traffic from Russia, for example, you may be tempted to block all Russian IP addresses. If you have no interest in people from Russia being able to visit your site, this shouldn’t be a problem. But what if you’re blogging in Russian?

Also, IPv6 addresses tend to be harder to map reliably to geography, so the idea of “blocking out” specific regions works less well there.

Another popular method is to monitor and filter incoming traffic. Cloudflare is a popular third-party service that does this. If Cloudflare detects suspicious behaviour, it may ask the site visitor to take specific steps, such as passing a challenge, before they can reach the site.

Unfortunately, these types of security checks aren’t flawless, and may lead to innocent users getting blocked from your site.

Stay vigilant. The war against spam will never end.

There’s no way to stop spammers entirely. They will continue to devise new ways of hitting our sites. The best we can do is take precautions, stay alert, and use new solutions as they become available.

Thanks again to Don Tai for presenting this month! Have any tips of your own for combatting spam? Leave your recommendations in the comments below.

Image credit: Mike Mozart via Flickr