Recently, one of my clients had a scare where they thought that a large amount of traffic was not being tracked by Google Analytics. This led me to run a test using a second, third-party tracker, StatCounter, to confirm whether the reported data was accurate.
The data
Below is a subset of the raw data captured during the course of this study. The study is still in progress, since not enough data has been captured yet to explain why the reported numbers differ more on some days and less on others. I will provide an update later on trends if any are present.
| Day of month | Google Analytics (pageviews) | StatCounter (pageviews) | ∆ (StatCounter − Google Analytics) | ∆% (relative to StatCounter) |
| Sat 22 | 2984 | 2976 | -8 | -0.2688% |
| Sun 23 | 2657 | 2591 | -66 | -2.5473% |
| Mon 24 | 2394 | 2360 | -34 | -1.4407% |
| Tue 25 | 2476 | 2434 | -42 | -1.7256% |
| Wed 26 | 2636 | 2601 | -35 | -1.3456% |
| Thu 27 | 3398 | 3385 | -13 | -0.3840% |
| Fri 28 | 3003 | 2981 | -22 | -0.7380% |
| Sat 29 | 3121 | 3136 | 15 | 0.4783% |
| Sun 30 | 3033 | 2979 | -54 | -1.8127% |
| Mon 31 | 2745 | 2737 | -8 | -0.2923% |
| Total | 28447 | 28180 | -267 | -0.9475% |
* Since I was simply performing a sanity check against Google Analytics, I only used one additional tracker. If there had been serious discrepancies between the two trackers, a third would have been added to determine which of the two was reporting incorrectly.
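For reference, the ∆ and ∆% columns can be reproduced from the two pageview columns. The sketch below assumes that ∆ is StatCounter minus Google Analytics and that ∆% is taken relative to the StatCounter count, which is what the published figures match. The function and variable names are illustrative only.

```python
# Daily pageviews copied from the table above: (day, Google Analytics, StatCounter).
ROWS = [
    ("Sat 22", 2984, 2976),
    ("Sun 23", 2657, 2591),
    ("Mon 24", 2394, 2360),
    ("Tue 25", 2476, 2434),
    ("Wed 26", 2636, 2601),
    ("Thu 27", 3398, 3385),
    ("Fri 28", 3003, 2981),
    ("Sat 29", 3121, 3136),
    ("Sun 30", 3033, 2979),
    ("Mon 31", 2745, 2737),
]


def delta(ga, sc):
    """Absolute difference, assumed to be StatCounter minus Google Analytics."""
    return sc - ga


def delta_pct(ga, sc):
    """Difference as a percentage of the StatCounter count (assumed base)."""
    return 100.0 * (sc - ga) / sc


def totals(rows):
    """Sum each tracker's pageviews over all rows."""
    ga_total = sum(ga for _, ga, _ in rows)
    sc_total = sum(sc for _, _, sc in rows)
    return ga_total, sc_total


if __name__ == "__main__":
    for day, ga, sc in ROWS:
        print(f"{day}: {delta(ga, sc):+d} ({delta_pct(ga, sc):+.4f}%)")
    ga_total, sc_total = totals(ROWS)
    print(f"Total: {delta(ga_total, sc_total):+d} "
          f"({delta_pct(ga_total, sc_total):+.4f}%)")
```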
Initial conclusions
Although collection is ongoing, the data so far seems to indicate that there is no real problem with the pageview data that Google Analytics collects. The discrepancy between the two sets is minimal (under 1% in total over these ten days), and there is no obvious correlation between the size of the daily gap and the amount of traffic going to the site on a particular day. As a result, for anyone with moderate-to-high levels of site traffic, the difference should be unnoticeable. For sites with low traffic levels, the small loss in precision becomes exaggerated, but should nevertheless not pose a serious problem.
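One way to check the correlation claim is to compute Pearson's r between each day's Google Analytics pageviews and the gap between the two trackers. This is a minimal sketch using the ten daily rows from the table; the names are hypothetical, and with only ten data points any coefficient should be read with caution.

```python
import math

# Daily pageviews from the table, in date order (Sat 22 through Mon 31).
GA = [2984, 2657, 2394, 2476, 2636, 3398, 3003, 3121, 3033, 2745]
SC = [2976, 2591, 2360, 2434, 2601, 3385, 2981, 3136, 2979, 2737]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Pageviews seen by Google Analytics but not by StatCounter, per day.
gaps = [ga - sc for ga, sc in zip(GA, SC)]

if __name__ == "__main__":
    print(f"Pearson r (traffic vs. gap): {pearson_r(GA, gaps):.3f}")
```

A coefficient close to zero would support the claim; a larger value in either direction would be a reason to keep collecting data before drawing conclusions.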
As to why this is happening, I can only hypothesise. Without knowledge of specific instances where a visit was counted by StatCounter but not by Google Analytics, it will be impossible to actually determine what is occurring. That said, here are some of the most plausible explanations I can come up with:
- Users are more aware of Google Analytics and may choose to block transmission specifically to Google, whereas StatCounter is less well-known and may not be blocked by these individuals.
- Google receives orders of magnitude more data than StatCounter and may have more periods where requests are rejected due to processing limitations.
What about good old-fashioned server log-based analytics?
For fun, I also looked up the data from our available log-based statistics systems, AWStats and Webalizer. Since they both run against the same logs, their numbers are nearly identical (though, surprisingly, not exactly identical, likely indicating another issue with the way they are instantiated by cPanel). The direct log-based data shows very different numbers from Google Analytics and StatCounter: four times the number of pageviews, in fact. However, it is safe to conclude these log-based numbers are bogus, for several reasons:
- Visits from bots and content scrapers are included.
While AWStats tries to avoid lumping robot traffic in with visitor traffic, it does not do a particularly good job. Additionally, certain anti-virus software and spambots will often masquerade as IE6 visitors, even though they are not. (This is a particular sore point for me: given how problematic supporting IE6 is, fake IE6 traffic makes it harder to determine when support can be dropped completely.) A JavaScript-based analytics system like Google Analytics or StatCounter will always provide more accurate data on real visitors because, for the most part, only real browsers execute JavaScript. (I know there are some experimental robots that will try to handle JavaScript, but they are not complete enough to trigger an analytics hit as far as I am aware.)
- AWStats counts anything with an extension not on its “NotPageList” as a pageview.
By default, the NotPageList is spartan and includes only CSS, JavaScript, and common image file extensions. SWF files, ZIP files, FLVs, and MP4s are all counted by AWStats as pageviews, even though they are not, which causes artificial inflation such as we are seeing here.
- Sites with pages that include dynamic content may hit the server multiple times for what should be counted as a single pageview.
A server log-based analytics tool will count each server request separately, even though they are all from a single page load. A JavaScript-based tool will only count pageviews when you tell it to, with the added benefit of being able to track outgoing links as well as the usage of third-party services on your site.
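To illustrate how an extension exclusion list changes the count, here is a minimal sketch of the idea. It is not AWStats's actual implementation: the extension sets and the sample request paths are made up for the example, and the stricter list simply adds the file types mentioned above.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative stand-in for a sparse exclusion list (an assumption,
# not a copy of any real AWStats default).
SPARSE_NOT_PAGE = {"css", "js", "gif", "jpg", "jpeg", "png"}

# The same list extended with the media and archive types discussed above.
STRICTER_NOT_PAGE = SPARSE_NOT_PAGE | {"swf", "zip", "flv", "mp4"}


def is_pageview(request_path, not_page_exts):
    """Count a request as a pageview unless its extension is excluded."""
    path = urlsplit(request_path).path  # drop any query string
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext not in not_page_exts


def count_pageviews(request_paths, not_page_exts):
    """Number of requests that would be counted as pageviews."""
    return sum(1 for p in request_paths if is_pageview(p, not_page_exts))


# Hypothetical requests as they might appear in a server log.
SAMPLE_REQUESTS = [
    "/index.html",
    "/style.css",
    "/video/intro.flv",
    "/downloads/archive.zip",
    "/banner.swf",
    "/about/",
    "/clip.mp4?start=10",
]

if __name__ == "__main__":
    print(count_pageviews(SAMPLE_REQUESTS, SPARSE_NOT_PAGE))    # 6
    print(count_pageviews(SAMPLE_REQUESTS, STRICTER_NOT_PAGE))  # 2
```

With the sparse list, six of the seven sample requests count as pageviews; with the stricter list only the two real pages do, which is the kind of inflation described above.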
Overall, Google Analytics continues to be an excellent choice when looking for a free analytics tool. It gives users a huge amount of power to pull all sorts of useful metrics and other interesting information from their visitor data. There are, of course, a plethora of other options out there, and through researching this issue I now have a short list of other products that I will be evaluating in the future. For the time being, though, I don’t see any particular reason to avoid Google Analytics for fear of unreliability.