Over the past month, we’ve seen an increase in what people are calling “Ghost Spam.” Ghost Spam is traffic that looks real at first glance but, on closer inspection, turns out to come from bots.
Two types are showing up in Google Analytics profiles everywhere. The first is standard referral spam. You’ll probably recognize the highly publicized semalt.com, which is especially invasive in Analytics reports.
The second type is a little more mischievous. It uses a separate hostname, like darodar.com, together with your GA number (your property’s tracking ID) to push spam into your data. We’ve written this post to help you get rid of both types.
Method 1 - Exclude Campaign Source
In Google Analytics, you can set a filter that excludes every campaign source known to deliver spam traffic to your site. We’ve compiled a full list as a regular expression, but a single filter pattern can only hold roughly 250 characters, so you’ll need several filters to cover everything.
There are shorter ways to write this, but this version should be easy for people without regex training to copy and paste:
Feel free to add others you find to your own list. Your filter will look something like this when you enter everything:
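If you keep your own list of spam sources, a short script can split it into filter-sized patterns. The following is a minimal Python sketch, not the list from this post. The helper name build_exclude_patterns and the domain spamexample.invalid are made up for illustration, and the 250-character limit is the figure quoted above, so check it against your own account.

```python
import re

# Limit quoted in this post; treat it as an assumption and verify it yourself.
MAX_PATTERN_LENGTH = 250


def build_exclude_patterns(domains, max_length=MAX_PATTERN_LENGTH):
    """Group escaped domains into '|'-joined patterns of at most max_length chars."""
    patterns = []
    current = ""
    for domain in domains:
        escaped = re.escape(domain)  # turns "semalt.com" into "semalt\.com"
        if len(escaped) > max_length:
            raise ValueError(f"{domain!r} does not fit in a single filter")
        candidate = f"{current}|{escaped}" if current else escaped
        if len(candidate) > max_length:
            # The current pattern is full: store it and start a new one.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


# semalt.com and darodar.com come from this post; the last entry is a placeholder.
spam_sources = ["semalt.com", "darodar.com", "spamexample.invalid"]
patterns = build_exclude_patterns(spam_sources)

for number, pattern in enumerate(patterns, start=1):
    print(f"Filter {number} ({len(pattern)} chars): {pattern}")
```

Each printed pattern would go into its own exclude filter on Campaign Source. Escaping the dots keeps a pattern from matching lookalike strings where the dot is any other character.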
Method 2 - Include Only Hostname
This next method sets your profile to report only on traffic to your own hostname. (Replace client.com with your own site domain.) Make sure you include only hostnames that you control and want to report traffic from.
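Before saving an include filter, it can help to check the pattern against a few hostnames you expect to keep or drop. Here is a minimal Python sketch under a few assumptions: client.com is the placeholder domain from this post, the function names are hypothetical, and the filter is assumed to behave like a partial (unanchored) regex match.

```python
import re


def build_hostname_pattern(hostnames):
    """Return one regex matching any of the given hostnames, with dots escaped."""
    return "|".join(re.escape(name) for name in hostnames)


# Placeholder from this post; list every hostname you control here.
own_hostnames = ["client.com"]
include_pattern = build_hostname_pattern(own_hostnames)


def is_own_hostname(hostname, pattern=include_pattern):
    """True if the hostname would pass the include filter (partial match assumed)."""
    return re.search(pattern, hostname) is not None


for hostname in ["client.com", "www.client.com", "darodar.com", "(not set)"]:
    verdict = "keep" if is_own_hostname(hostname) else "drop"
    print(f"{hostname} -> {verdict}")
```

Because the match is partial, subdomains such as www.client.com pass as well. The same is true of any hostname that merely contains client.com, so you may prefer to anchor the pattern with ^ and $ if you want a stricter filter.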
Why Exclude for Referral and Include for Hostname?
The answer comes down to what you know in advance about each data set.
For hostnames, you always know exactly which domains you want to track, so you use an include filter. For referrals, you only learn which domains are spam by finding them in your reporting, so you use an exclude filter. An include filter there could accidentally drop real traffic from affiliates or partner sites.
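To make the difference concrete, here is a small Python sketch that applies both filters to a handful of invented hits. All hostnames and sources ending in .example are made up, the patterns are simplified versions of the two methods, and the logic is only an approximation of how filtering behaves.

```python
import re

# Include filter: hostnames you control (client.com is this post's placeholder).
INCLUDE_HOSTNAME = re.compile(r"client\.com")
# Exclude filter: spam sources you have already spotted in your reports.
EXCLUDE_SOURCE = re.compile(r"semalt\.com|darodar\.com")

# Invented sample hits for illustration only.
hits = [
    {"hostname": "www.client.com", "source": "partner-site.example"},
    {"hostname": "www.client.com", "source": "semalt.com"},
    {"hostname": "darodar.com", "source": "darodar.com"},
    {"hostname": "unknown-host.example", "source": "new-spammer.example"},
]


def keep_hit(hit):
    """Keep a hit only if it passes the include filter and avoids the exclude filter."""
    if not INCLUDE_HOSTNAME.search(hit["hostname"]):
        return False  # not sent to one of your own hostnames
    if EXCLUDE_SOURCE.search(hit["source"]):
        return False  # known spam source
    return True


kept = [hit for hit in hits if keep_hit(hit)]
for hit in kept:
    print(hit["hostname"], "from", hit["source"])
```

In this sample, the partner referral survives even though it appears on no list, the semalt.com hit is removed by the exclude filter, and the last hit is dropped by the hostname filter although its source has never been seen before.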
1. There’s no need to be afraid of ghosts.
2. Two filters will help make sure you capture as much Ghost Spam as possible.
3. Exclusion filters can get a little messy, so you may need a few to capture all the sources you find.