
How to Exclude Bot Visits/Pageloads in Statcounter Web Analytics


For those who prefer Statcounter for monitoring the day-to-day activity of their blog, there are some things that can help narrow down your stats so that they reflect the actual visits and pageloads your site is getting, and not a rough estimate.

By default, Statcounter does not only track human visitors to a webpage; it also tracks search engine bots and other bots, and counts them alongside site visits and pageloads in the dashboard. To show how much this can affect your tracking, we'll share some insights.

We've been Statcounter users for a while, and in our years of using the platform on most of our blogs we have learned that each time we publish a post, at least two Google bots visit the published page immediately to crawl it (this does not necessarily mean the post has been or will be indexed), each bot bearing a different IP address and sometimes a different domain extension. That is for a single publication on a site receiving about 400 visits a day.

The more traffic a site gets, the more bots it attracts; this holds for virtually every website. So if you are a blogger getting a thousand visits and publishing about 10 articles a day, you can expect no fewer than 40 Google bot visits a day (excluding other bots), all of which are added to your visits and pageload stats in Statcounter.

For now, we will focus on Google bots only. There is one other thing you need to know. Google has hundreds (if not thousands) of crawlers (or bots) roaming the web, and each crawler has its own IP address. While Statcounter can identify and track a single visitor regardless of changes in their IP address or the browser they use at a given time, using algorithms based on cache and cookies, this does not seem to be the case with bot/crawler visits.

A bot can be described as a lifeless being in a living body. Ghost would have been the perfect word for it, except that it leaves a footprint (IP, region/location, etc.). Bots, though not all of them, are able to avoid being screened as thoroughly as real human visitors.

In short, each Google bot visit is counted as a real visitor in Statcounter by default. So if Statcounter records 950 visits and 1,450 pageloads in a day, the real figures might be 800 visits and 1,200 pageloads. Google Analytics is not exempt from this either, but that is another topic which we won't discuss here (Google Analytics has bot filtering, though you need to enable it).
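To put the example figures above in perspective, here is a small Python sketch that works out what share of the recorded numbers would be bot traffic. The function name `bot_share` is our own for illustration, and the numbers are the hypothetical ones from the paragraph above, not measurements.

```python
def bot_share(recorded: int, actual: int) -> float:
    """Return the fraction of the recorded figure that is not real traffic."""
    if recorded <= 0:
        raise ValueError("recorded must be a positive number")
    return (recorded - actual) / recorded


# Hypothetical figures from the example above.
visits_share = bot_share(recorded=950, actual=800)
pageloads_share = bot_share(recorded=1450, actual=1200)

print(f"Visits inflated by bots: {visits_share:.1%}")        # 15.8%
print(f"Pageloads inflated by bots: {pageloads_share:.1%}")  # 17.2%
```

In this example, roughly one in six recorded visits would not be a real visitor.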

We believe all webmasters know that the only way to know how successful a site will be is through its analytics. It is crucial and should never be underrated or overlooked, and ignoring this little (but significant) miscalculation can lead to one's doom. So, how do we solve the problem?

As said earlier, we will focus on Google bots only. After monitoring Google bot activity (via Statcounter) for a long time, we were able to identify the IP patterns of the Google bots that usually crawl our sites, and that gave us a way to exclude them from our site analytics. Statcounter has an option that lets you filter out IPs you don't want it to track, and this option is how you can reduce the miscalculation in your site analytics.

Having identified these Google bot IP ranges, we will explain how to apply them in your Statcounter settings. Follow the steps below.


  1. Log in to your Statcounter dashboard on PC or mobile.
  2. Scroll down and click on Project Settings.
  3. You will see IP Blocking among the list of options.
  4. Enter the following IP patterns in the box (exactly as written below):

64.233.*.*
66.102.*.*
66.249.*.*

The asterisks (*) here function as wildcards. Each one stands for any value in that position of the address, so a single pattern covers every IP in that range. There are thousands of Google bots patrolling the internet, each with its own IP following nearly the same pattern, and the only way to capture them all is to use wildcards. When we say "all", we do not mean the entire Google IP space, only the most common ranges. Please note that different Google bot IPs apply to different sites and regions, so you may need to figure out yours.
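To illustrate how such wildcard patterns cover a range, here is a minimal Python sketch that checks an IP against the three patterns listed above. It assumes each asterisk stands for exactly one of the four parts of an IPv4 address; how Statcounter evaluates the patterns internally is not something we know, and the names `matches_pattern` and `is_blocked` are our own.

```python
from typing import Sequence

# Patterns copied from the list above.
BLOCK_PATTERNS = ["64.233.*.*", "66.102.*.*", "66.249.*.*"]


def matches_pattern(ip: str, pattern: str) -> bool:
    """Compare an IPv4 address to a pattern part by part; '*' matches any part."""
    ip_parts = ip.split(".")
    pattern_parts = pattern.split(".")
    if len(ip_parts) != 4 or len(pattern_parts) != 4:
        return False
    return all(p == "*" or p == part for part, p in zip(ip_parts, pattern_parts))


def is_blocked(ip: str, patterns: Sequence[str] = BLOCK_PATTERNS) -> bool:
    """Return True if the address falls under any of the block patterns."""
    return any(matches_pattern(ip, p) for p in patterns)


print(is_blocked("66.249.66.1"))  # True: covered by 66.249.*.*
print(is_blocked("66.250.0.1"))   # False: second part differs
```

You could run the IPs you see in your own visitor log through a check like this to see which of them the three patterns would already catch before adding more.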

Doing the above will give you a more accurate result. If you notice other bots crawling your site and distorting your Statcounter analytics, just note their IPs and add them to the block list. Also, make sure you create a Blocking Cookie (under Installation and configuration settings) so that your own visits are not counted.

Basic Ways to Identify a Bot

Some might ask, how do we identify a bot? Identifying a bot is usually simple, and only on rare occasions complex. You might be wondering why we did not mention Yandex and Bing bots. It is because, in our experience, these bots appear to work in stealth and mostly do not reveal their true identity.

It is easy to identify Google bots just by glancing at the label names in Statcounter (and other analytics tools), because they all wear a Google badge and are traceable. Bing bots don't wear a Bing badge; they wear something else that is not easily identifiable. And unlike Google bots, they rarely crawl a site multiple times a day, so they pose little threat of inflating your website statistics.

Do not forget there are hundreds of other search engines on the planet, but thankfully many of them depend on (or should we say use) Google's own search results API, meaning they do not send out bots to harvest the web but rely on Google Search to deliver their results. Also, some search engines are region-specific in gathering their information, which means they mainly cater for websites in a specific zone, e.g. yandex.ru.

How to identify a bot

1.) It tends to land on the homepage rather than on individual pages of your site.

2.) Bots are persistent, very persistent (especially the dangerous ones).

3.) It gives a 100% bounce rate in analytics.

4.) Most of the time it is anonymous.


Any visit that shows these characteristics should be examined.
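The four traits above can be turned into a rough checklist. The Python sketch below is only an illustration: the `Visit` fields, the `bot_signals` function and the persistence threshold of 5 visits a day are all our own assumptions, not something Statcounter provides, and a visit that matches is a candidate for closer examination, not a confirmed bot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    """Hypothetical summary of one visitor, filled in by hand from your stats."""
    entry_page: str      # path of the first page viewed; "/" is the homepage
    visits_today: int    # how many times this visitor came back today
    bounce_rate: float   # 0.0 to 1.0
    has_identity: bool   # False when no label, referrer or other detail is shown


def bot_signals(visit: Visit, persistence_threshold: int = 5) -> List[str]:
    """List which of the four traits a visit shows (the threshold is an assumption)."""
    signals = []
    if visit.entry_page == "/":
        signals.append("lands on homepage")
    if visit.visits_today >= persistence_threshold:
        signals.append("persistent")
    if visit.bounce_rate >= 1.0:
        signals.append("100% bounce rate")
    if not visit.has_identity:
        signals.append("anonymous")
    return signals


suspect = Visit(entry_page="/", visits_today=12, bounce_rate=1.0, has_identity=False)
reader = Visit(entry_page="/some-post", visits_today=1, bounce_rate=0.0, has_identity=True)

print(bot_signals(suspect))  # all four traits are listed
print(bot_signals(reader))   # []
```

The more traits a visit shows, the stronger the case for looking up its IP and adding it to the block list.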



About CCN World Tech

CCN World Tech is a platform dedicated to providing the latest tech-related news and articles from around the world. We have existed since 2011 and have tutored and helped countless people get the best out of their handheld and other tech devices.

Disclaimer: Information provided on CCN World Tech was verified and deemed accurate at the time of writing; nevertheless, it is subject to being edited, rewritten, or modified at any time.