How to remove fake traffic from your site

 

When analyzing Google Analytics data, we often notice a rise in website traffic. It is tempting to read this as growing interest in our site, and naturally such growth is pleasing.

But it is not always a cause for celebration. On closer inspection, most of this referral traffic may turn out to come from spammers. Referral spam has recently become a significant problem.

Referral spam happens when your website receives fake traffic from spam bots. This fake traffic is recorded in Google Analytics. If you notice traffic from spam sources in your Analytics reports, you need to take specific steps to remove it from your statistics.

 

 

What is a Bot?

 

Bots are software programs whose primary task is to execute repetitive tasks as quickly and precisely as possible.

Traditionally, bots are used to crawl and index web content, and search engines use them regularly for this purpose. However, bots can also be used for malicious purposes, such as:

- click fraud;

- harvesting e-mail addresses;

- scraping website content;

- distributing malware;

- artificially inflating a site’s traffic.

Depending on the tasks they are used for, bots can be divided into safe and malicious ones.

 

 

Safe and Malicious Bots

 

Googlebot is one example of a “good” bot, which is used by Google for scanning and indexing webpages on the internet.

Most bots (whether safe or malicious) do not execute JavaScript, although some do.

Bots that execute JavaScript (and with it the Google Analytics tracking code) show up in Google Analytics reports and distort traffic data (direct traffic, referral traffic) and other session-based metrics (bounce rate, conversion rate, etc.).

Bots that do not execute JavaScript (such as Googlebot) do not alter this data. However, their visits are still recorded in server logs. They also consume server resources and bandwidth, and can slow down your site.

Safe bots, unlike malicious ones, obey the directives in robots.txt. Malicious bots ignore them, and they can also create fake user accounts, distribute spam, harvest e-mail addresses, and bypass CAPTCHAs.
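
To make "complying with robots.txt" concrete, here is a minimal Python sketch of the check a well-behaved bot performs before requesting a page. The robots.txt content, the bot name ExampleBot and the example.com URLs are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite bot runs this check first; a malicious bot simply skips it.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/blog/"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/admin/users"))
```

Nothing enforces this check on the server side, which is why robots.txt only keeps out bots that choose to respect it.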

Malicious bots use various methods to make their detection more difficult. They can pose as a regular web browser (such as Chrome or Internet Explorer) and disguise their requests as traffic arriving from a regular website.

It is impossible to say for sure which malicious bots can alter Google Analytics data and which cannot. Therefore, all malicious bots should be treated as a danger to data integrity.

 

 

Spam Bots

 

As is clear from their name, the main purpose of such bots is spam.

They visit numerous websites every day, sending HTTP requests with a fake referer header. This allows them to avoid being detected as bots.

The fake referer header contains the address of the website the spammer wishes to promote or build backlinks to.

When your website receives an HTTP request from a spam bot with a fake referer header, it is immediately recorded in the server log. If your server log is publicly accessible, it can be crawled and indexed by Google. Google then treats the referer value in the server log as a backlink, which affects the ranking of the website the spammer promotes.

More recently, Google’s indexing algorithms have been designed to ignore log data, which undermines the efforts of the creators of such bots.

Spam bots that can execute JavaScript can get past the filtering used by Google Analytics. As a result, their traffic appears in Google Analytics reports.

 

 

Botnet

 

When a spam bot uses a botnet (a network of infected computers, located in one place or spread around the world), it can access a website from hundreds of different IP-addresses. In this case, IP blacklists and rate limiting (capping the rate of traffic sent and received) become mostly useless.

A spam bot’s ability to distort your website traffic grows with the size of the botnet it uses.

If the botnet is large and spans many different IP-addresses, the spam bot can access your website without being blocked by a firewall or other traditional security tools.

Not all spam bots send referer headers.

Traffic from bots that send no referer does not appear as referral traffic in Google Analytics reports. It looks like direct traffic, which makes it even harder to detect.

In other words, whenever no referer is passed, Google Analytics treats the traffic as direct.

A spam bot can generate dozens of fake referer headers.

If you block one referer, the spam bot will send another fake one to your website. Therefore, spam filters in Google Analytics or .htaccess do not guarantee full protection against spam bots.

You now know that not all spam bots are equally dangerous. But some of them are extremely dangerous.

 

 

Highly Malicious Spam Bots

 

The purpose of highly malicious spam bots is not only to distort your website traffic, scrape its content or harvest e-mail addresses. They also aim to infect your computer with malware in order to make it part of a botnet.

As soon as your computer joins the botnet, it is used to send spam, viruses and other malicious software to other computers online.

Hundreds of thousands of computers around the world are used by real people while also being part of a botnet.

It is quite possible that your own computer is part of a botnet without you having a clue.

If you decide to block the botnet, you’ll probably also block incoming traffic from actual users.

It is possible that your PC gets infected with malware as soon as you visit a suspicious website from your referral traffic report.

Therefore, you should avoid visiting suspicious websites from analytics reports unless you have proper protection (antivirus software) installed on your computer. It is best to use a separate PC dedicated to visiting such websites. Alternatively, you can ask your system administrator to handle the problem.

 

 

Smart Spam Bots

 

Some spam bots (such as darodar.com) can send fake traffic without ever visiting your website. They do so by reproducing the HTTP requests that the Google Analytics tracking code sends, using your web property ID. They can send not only fake traffic but also fake referers, for example bbc.co.uk. Since the BBC is a legitimate website, when you see this referer in your report it does not even occur to you that traffic from such a reputable website could be fake. In reality, no one from the BBC has visited your website.

These smart and dangerous bots do not need to visit your website or execute JavaScript. Since they never actually visit your site, nothing is recorded in the server log.

And because the data is sent directly to Google Analytics rather than through your site, you cannot block these bots at the server level (by IP-address, user agent, referer, etc.).

Smart spam bots scan your site looking for web property IDs. Sites that do not use Google Tag Manager have the Google Analytics tracking code directly in their pages.

The Google Analytics tracking code contains your web property ID. A smart spam bot can steal this ID and pass it on to other bots. There is no guarantee that the bot that stole your ID and the bot that sends you fake traffic are one and the same.

You can solve this problem by using Google Tag Manager (GTM).

Use GTM to deploy Google Analytics tracking on your website. If your web property ID has already been “borrowed”, it is probably too late for this to help. All you can do then is switch to a different ID or wait for Google to solve the issue.

Not Every Website is Subject to Spam-Bot Attacks

The initial purpose of spam bots is to find and exploit vulnerabilities in your website.

They attack poorly protected websites. Therefore, if your site is hosted with a “cheap” hosting provider or runs on a custom-made CMS, it is more likely to be attacked.

Sometimes it is enough to simply change the web hosting of a site that is often attacked by malicious bots. This simple measure can really help.

 

 

Follow the instructions below to detect spam sources:

 

Go to the referral traffic report in your Google Analytics account and sort the report by bounce rate in descending order:

 

Report on referral traffic in Google Analytics

 

Look at referers with a 100% or 0% bounce rate and 10 or more sessions. They are most likely spammers.
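
If you export the referral report, the same check can be run in a few lines of Python. This is only a sketch: it reads the rule as "bounce rate of exactly 0% or 100% and at least 10 sessions", and the ReferrerRow type and the sample rows are made up for illustration.

```python
from typing import NamedTuple


class ReferrerRow(NamedTuple):
    source: str
    sessions: int
    bounce_rate: float  # percent, from 0 to 100


def suspicious_referrers(rows, min_sessions=10):
    """Return sources with a 0% or 100% bounce rate and enough sessions."""
    return [
        row.source
        for row in rows
        if row.sessions >= min_sessions and row.bounce_rate in (0.0, 100.0)
    ]


# Made-up sample rows standing in for an exported referral report.
rows = [
    ReferrerRow("semalt.semalt.com", 42, 100.0),
    ReferrerRow("example-blog.com", 15, 47.3),
    ReferrerRow("buttons-for-website.com", 12, 0.0),
    ReferrerRow("partner.example.org", 3, 100.0),
]

print(suspicious_referrers(rows))
```

The result is a list of candidates to check, not a verdict: a legitimate referer can also have an extreme bounce rate.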

 

If one of your suspicious referers is in the list below, it is referral spam, and you can skip checking it manually:

 

semalt.com
semalt.semalt.com
buttons-for-website.com
7makemoneyonline.com
ilovevitaly.ru
resellerclub.com
vodkoved.ru
cenokos.ru
76brighton.co.uk
sharebutton.net
simple-share-buttons.com
forum20.smailik.org
social-buttons.com
forum.topic39398713.darodar.com

 

You can download an extensive list of spam sources here.
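
Checking a referer against such a list is easy to script. The sketch below assumes you keep the list as a set of base domains (only a few entries from the list above are shown) and treats any subdomain of a listed domain as spam too. The function name is hypothetical.

```python
# A few base domains taken from the list above; extend as needed.
KNOWN_SPAM_DOMAINS = {
    "semalt.com",
    "buttons-for-website.com",
    "ilovevitaly.ru",
    "darodar.com",
}


def is_known_spam(referrer_host: str, spam_domains=KNOWN_SPAM_DOMAINS) -> bool:
    """Return True if the host is a listed domain or one of its subdomains."""
    host = referrer_host.lower().rstrip(".")
    return any(
        host == domain or host.endswith("." + domain)
        for domain in spam_domains
    )


for host in ["forum.topic39398713.darodar.com", "semalt.semalt.com", "bbc.co.uk"]:
    print(host, is_known_spam(host))
```

Matching on "." plus the domain, rather than on a plain suffix, keeps an unrelated host such as notsemalt.com from being flagged by mistake.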

 

If you cannot confirm the identity of a suspicious referer from such lists, you can take the risk and visit the website; it might actually be a normal site. Make sure you have antivirus software installed before visiting such sources, as they can infect your computer the moment you access their page.

 

After confirming the identity of malicious bots, the next step is to block them from visiting your website again.

 

 

How Can You Protect your Site from Spam Bots?

 

Create an annotation on your graph explaining the reason for the unusual traffic spike.

You can then disregard this traffic for analysis purposes.

 


annotation in Google Analytics

 

 

Block the referer used by the spam bot.

 

Add the code below to your .htaccess file (or the equivalent rule to your web.config, if you use IIS):

# Return 403 Forbidden when the referer is semalt.com or any of its subdomains
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
RewriteRule .* - [F]

This code blocks all HTTP and HTTPS requests referred from semalt.com, including its subdomains.
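
When you need to block several referers, writing the conditions by hand gets error-prone: every condition except the last needs the OR flag, and the dots in each domain must be escaped. The Python sketch below generates the block from a list of domains. The function name and the sample domains are chosen for illustration, and the output should be reviewed before it goes into a live .htaccess file.

```python
def referrer_block_rules(domains):
    """Build mod_rewrite directives that answer 403 for the given referer domains."""
    domains = list(domains)
    if not domains:
        return ""
    lines = ["RewriteEngine On"]
    for index, domain in enumerate(domains):
        # Conditions are chained with OR; the last one must not carry the flag.
        flags = "[NC]" if index == len(domains) - 1 else "[NC,OR]"
        # Escape dots so that they match literally in the regular expression.
        pattern = r"^https?://([^.]+\.)*" + domain.replace(".", r"\.")
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


print(referrer_block_rules(["semalt.com", "buttons-for-website.com"]))
```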

 

 

 

Block the IP-address used by the spam bot.

 

Add the code below to your .htaccess file:

# Apache 2.2-style access control (needs mod_access_compat on Apache 2.4)
RewriteEngine On
Options +FollowSymlinks
Order Deny,Allow
Deny from 234.45.12.33

Note: do not copy this code into your .htaccess file as it is. The IP-address is only a placeholder that shows how an address is blocked in the .htaccess file.

Spam bots can use different IP-addresses, so you should regularly update the list of spam-bot IP-addresses found in your logs.

Only block those IP-addresses that affect your site.

It is pointless to try to block every known IP-address. Your .htaccess file would become enormous and hard to manage, and it would slow down your web server.

Have you noticed that your IP-address blacklist is growing fast? That is a clear sign of security issues. Contact your web hosting provider or your system administrator. You can also search Google for published blacklists of IP-addresses to block.

Automate this task by writing a script that finds and bans IP-addresses that are undoubtedly malicious on its own.
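
One possible shape for such a script is sketched below in Python. It assumes the server writes the common "combined" access log format, treats an IP-address as malicious only if it repeatedly sends a referer from a known spam domain, and prints the matching Deny lines. The domain set, the min_hits threshold and the sample log lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches one line of the "combined" access log format (an assumption).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)

SPAM_DOMAINS = {"semalt.com", "darodar.com"}  # illustrative subset


def spam_ips(log_lines, spam_domains=SPAM_DOMAINS, min_hits=2):
    """Return sorted IPs that sent at least min_hits requests with a spam referer."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in an unexpected format
        host = (urlparse(match.group("referer")).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in spam_domains):
            hits[match.group("ip")] += 1
    return sorted(ip for ip, count in hits.items() if count >= min_hits)


# Made-up log lines using documentation-only IP ranges.
sample_log = [
    '203.0.113.7 - - [10/Feb/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "http://semalt.semalt.com/crawler.php" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Feb/2015:10:00:05 +0000] "GET /about HTTP/1.1" 200 734 "http://semalt.semalt.com/crawler.php" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Feb/2015:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "https://www.google.com/" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Feb/2015:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "http://darodar.com/" "Mozilla/5.0"',
]

for ip in spam_ips(sample_log):
    print(f"Deny from {ip}")
```

Requiring several hits before banning is a deliberate choice: since botnets run on the computers of real users, a single request is weak evidence.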

 

 

Block the range of IP-addresses used by spam bots.

 

If you’re confident that a spam bot uses a specific range of IP-addresses, you can block the entire range of IP addresses in one action, as shown below:

# Allow everyone except the listed range (with Order Allow,Deny a matching Deny wins)
RewriteEngine On
Options +FollowSymlinks
Order Allow,Deny
Allow from all
Deny from 76.149.24.0/24

Here 76.149.24.0/24 is a CIDR range (CIDR is a compact notation for specifying IP ranges). The /24 suffix covers the 256 addresses from 76.149.24.0 to 76.149.24.255.

Blocking a CIDR range is more efficient than blocking separate IPs, as it keeps the configuration short.

Note: You can convert a set of IP-addresses into a CIDR range or, conversely, expand a CIDR range into its addresses using the following tool: http://www.ipaddressguide.com/cidr
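
The same conversions can be done offline with Python’s standard ipaddress module. In this short sketch the two helper function names are made up for illustration.

```python
import ipaddress


def range_to_cidrs(first: str, last: str):
    """Convert an inclusive IP range into the shortest list of CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(network) for network in networks]


def cidr_bounds(cidr: str):
    """Return the first address, last address and size of a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return (
        str(network.network_address),
        str(network.broadcast_address),
        network.num_addresses,
    )


print(range_to_cidrs("76.149.24.0", "76.149.24.255"))
print(cidr_bounds("76.149.24.0/24"))
```

A range that does not align with a single block comes back as several CIDR entries, each of which needs its own Deny line.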

 

 

Block the user agents used by spam bots.

Analyze your server log files weekly to detect and block the malicious user agents that spam bots use. Once blocked, they will not be able to access your website. You can do this as follows:

# Return 403 Forbidden to any request whose User-Agent contains "Baiduspider"
RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]

A Google search will turn up many sites that maintain lists of known bad user agents. Use this information to detect such user agents on your site.

The simplest way is to write a script that automates the entire process. Create a database of all known bad user agents, and use a script that automatically identifies and blocks them based on that database. Update the database regularly, as new bad user agents appear all the time.

Only block user agents that actually affect your site. It is pointless to try to block every known user agent, as it will make your .htaccess file too large and hard to manage, and server performance will suffer.
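
The detection half of such a script could look like the Python sketch below. It assumes the user agent is the last quoted field of each log line (as in the "combined" log format) and reports only the banned agents that actually show up in your log, so you block only those. The banned list, including ExampleSpamBot, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# The user agent is assumed to be the last quoted field on the line.
USER_AGENT = re.compile(r'"(?P<agent>[^"]*)"\s*$')

BANNED_AGENTS = ["Baiduspider", "ExampleSpamBot"]  # hypothetical database


def banned_agents_seen(log_lines, banned=BANNED_AGENTS):
    """Count requests per banned user agent actually present in the log."""
    seen = Counter()
    for line in log_lines:
        match = USER_AGENT.search(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        for name in banned:
            if name.lower() in agent:  # case-insensitive, like the [NC] flag
                seen[name] += 1
    return dict(seen)


# Made-up log lines using documentation-only IP ranges.
sample_log = [
    '203.0.113.7 - - [10/Feb/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '203.0.113.8 - - [10/Feb/2015:10:00:09 +0000] "GET /a HTTP/1.1" 200 128 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
    '198.51.100.4 - - [10/Feb/2015:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1) Chrome/40.0"',
]

print(banned_agents_seen(sample_log))
```

Each name it reports can then be turned into a RewriteCond line on %{HTTP_USER_AGENT}.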

 

 

Use the “Bot Filtering” feature available in Google Analytics – “Exclude hits from known bots and spiders”.

 

Using bot filter google analytics

 

 

Monitor your server logs at least once a week.

 

The fight against malicious bots can start at the server level. Until you manage to “scare” spam bots away from your website, do not exclude them from your Google Analytics reports.

 

 

Use a firewall

 

A firewall acts as a reliable filter between your computer (server) and the internet. It can also protect your website from malicious bots.

 

 

Get professional assistance from a system administrator.

 

Protecting client websites from malicious activity around the clock is the administrator’s main duty. The person responsible for network security has far more tools to repel spam bot attacks than a site owner does. If you discover a new bot that threatens your website, inform your system administrator immediately.

 

 

Use the ItSAlive code.

 

This small snippet protects Google and Yandex metrics from referral spam bots. ItSALive prevents the analytics counters’ code from being executed directly by spam bots. The counter code is launched in only two cases: when the mouse has been moved, or when the user agent belongs to a robot.

 

 

Use Google Chrome for web surfing.

 

If you do not have a firewall protecting you while you browse, it is best to use Google Chrome.

Chrome can also detect malware, and it scans webpages for malware while still opening them faster than other browsers.

If you use Chrome, the risk of “catching” malware is lower, even when you visit suspicious sources from Google Analytics referral traffic reports.

 

 

Use custom alerts to monitor unexpected traffic spikes.

 

Custom alerts in Google Analytics let you quickly detect and neutralize malicious bot requests, thus minimizing their harmful effect on your site.
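
The idea behind such an alert can be illustrated on exported data. The Python sketch below is not how Google Analytics implements its alerts. It flags a day as a spike when its sessions exceed a multiple of the median of the preceding days. The window, the factor and the sample numbers are assumptions chosen for illustration.

```python
from statistics import median


def traffic_spikes(daily_sessions, window=7, factor=3.0):
    """Return indexes of days whose sessions exceed factor times the
    median of the preceding `window` days."""
    spikes = []
    for day in range(window, len(daily_sessions)):
        baseline = median(daily_sessions[day - window:day])
        if baseline > 0 and daily_sessions[day] > factor * baseline:
            spikes.append(day)
    return spikes


# Made-up daily session counts; the ninth day (index 8) has a sudden burst.
sessions = [120, 130, 110, 125, 140, 135, 128, 131, 560, 127]

print(traffic_spikes(sessions))
```

Using the median rather than the mean keeps one earlier spike from inflating the baseline and hiding the next one.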

 

 

Use filters available in Google Analytics.

 

To do so, select “Filters” in the “View” column of the “Admin” tab and create a new filter.

 

 

create a filter in Google Analytics

 

 

Setting up filters is pretty easy. You just have to know how to do that.

 

 

installing a filter in Google Analytics

 

 

You can use the “Bot Filtering” checkbox located in the “View Settings” section of the “Admin” tab. This won’t do any harm.

 

 

installing a filter in Google Analytics

 

 

 

Despite the simplicity of using Google Analytics filters, we don’t suggest you actually use them in practice.

 

do not use GA

 

There are three good reasons for this:

  • There are hundreds of thousands of malicious bots, and new ones appear every day. How many filters would you have to create and apply to your reports?
  • The more filters you apply, the harder it becomes to analyze your Google Analytics reports.
  • Blocking spam traffic in Google Analytics merely hides the problem rather than solving it.

 

Similarly, you should not block referral traffic using the “Referral exclusion list”, as it will not solve your problem. On the contrary, this traffic will be counted as direct from then on, and as a result you will lose the ability to see how spam affects your website traffic.

Once spam bot traffic has been recorded in Google Analytics statistics, the traffic data is altered permanently. You will not be able to correct it afterwards.

 

 

Conclusion

 

We hope that the recommendations above will help you get rid of spam sources on your website. There are numerous ways to do so, and we have described only those that have helped many websites protect their data in Google Analytics.