Google Analytics collects your website’s full traffic data. If your Google Analytics setup is implemented on a large site, such as one with hundreds of millions of monthly visitors, it may take a subset of that data and present sampled data in your reports.
Wikipedia defines data sampling:
Sampling is that part of statistical practice concerned with the selection of an unbiased or random subset of individual observations within a population of individuals intended to yield some knowledge about the population of concern, especially for the purposes of making predictions based on statistical inference. Sampling is an important aspect of data collection.
Google Analytics Sampled Data
Google Analytics and most top web analytics tools automatically use data sampling to ensure performance, because it is much easier to deliver fast answers from sampled data than from comprehensive data. Google Analytics samples data when your requested data size meets one of the following conditions:
- Any ad hoc query that is not pre-computed and reaches 500,000 visits (sessions).
- Any query that exceeds 1,000,000 unique dimension combinations.
For example, Google Analytics may automatically display sampled data with any of the following reports and/or segmenting methods:
- Keyword Report
- Top Content Report
- Top Landing Page Report
- Custom Reports
- Advanced Segments
Data sampling can lead to inaccurate results when you run Google Analytics reports to segment conversions or do long tail analysis for keywords or landing pages. Your reports will show an estimated range of error (with +/- percentages) for each Google Analytics metric.
To prevent Google Analytics from providing sampled data reports, keep the time periods of your report queries short enough that each covers fewer than 500,000 visits (sessions).
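One way to apply this advice is to split a long reporting period into shorter ranges, each below the threshold. The sketch below assumes you already know the number of sessions per day; the function name `splitIntoUnsampledRanges` and the sample figures are hypothetical and not part of any Google Analytics API.

```javascript
// Sampling threshold in sessions, taken from the condition described above.
const SAMPLING_THRESHOLD = 500000;

// Hypothetical helper: groups consecutive days (0-based indexes) into ranges
// whose session totals stay below the threshold.
function splitIntoUnsampledRanges(dailySessions, threshold = SAMPLING_THRESHOLD) {
  const ranges = [];
  let start = 0;
  let total = 0;
  dailySessions.forEach((count, day) => {
    // Close the current range if adding this day would reach the threshold.
    if (day > start && total + count >= threshold) {
      ranges.push({ startDay: start, endDay: day - 1, sessions: total });
      start = day;
      total = 0;
    }
    total += count;
  });
  if (dailySessions.length > 0) {
    ranges.push({ startDay: start, endDay: dailySessions.length - 1, sessions: total });
  }
  return ranges;
}

// Made-up daily session counts for a five-day period.
const ranges = splitIntoUnsampledRanges([200000, 200000, 200000, 100000, 450000]);
console.log(ranges);
```

A single day that reaches the threshold on its own still ends up in its own range, so a query for that day could still be sampled.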
Google Analytics Customized Data Sampling
Google Analytics allows client-side sampling by collecting a percentage of your site’s traffic rather than all the traffic:
- Client-side sampling occurs consistently across unique visitors to ensure your site’s traffic trending is correct.
- For large websites that experience heavy traffic spikes and/or receive hundreds of millions of monthly visitors, Google Analytics client-side sampling ensures uninterrupted report tracking.
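To illustrate what consistent sampling across unique visitors means, the sketch below decides from a hash of the visitor ID whether a visitor is tracked, so the same visitor always gets the same answer. This is an illustration only and not Google Analytics' actual algorithm; the function names and the FNV-1a-style hash are assumptions chosen for the example.

```javascript
// Illustrative only: NOT Google Analytics' real implementation.
// FNV-1a-style 32-bit hash over the UTF-16 code units of the ID.
function hashVisitorId(visitorId) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < visitorId.length; i++) {
    hash ^= visitorId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Hypothetical helper: true if this visitor falls inside the sample.
// Each visitor maps to a fixed bucket from 0 to 99, so the decision
// does not change between page views or visits.
function isVisitorSampled(visitorId, sampleRatePercent) {
  return hashVisitorId(visitorId) % 100 < sampleRatePercent;
}

console.log(isVisitorSampled("visitor-12345", 25));
```

Because the decision depends only on the visitor ID, a visitor is either always tracked or never tracked, which keeps per-visitor trends intact within the sample.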
To enable Google Analytics client-side sampling, include the _setSampleRate() method in your Google Analytics tracking code snippet (asynchronous):
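A minimal sketch of such a snippet, assuming the classic asynchronous ga.js command queue (`_gaq`) and a placeholder property ID `UA-XXXXX-X`, might look like this. The standard ga.js loader script is omitted.

```javascript
// Command queue for the classic asynchronous ga.js tracker.
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder property ID (assumption)
_gaq.push(['_setSampleRate', '25']);      // track 25% of visitors
_gaq.push(['_trackPageview']);            // queued after the sample rate is set
// The usual ga.js loader script would follow here (omitted in this sketch).
```

The `_setSampleRate` call is queued before `_trackPageview` so that the sample rate is in place before the first hit is sent.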
The number you provide in _setSampleRate() is the percentage of visitors, by unique ID, that will be tracked and included in your sample data. For example, with the sampling rate set to 25, Google Analytics collects 25% of your website’s traffic. To estimate your site’s full “visits” and “pageviews” numbers at that rate, compute:
- Total Visits = Sampled Visits / 0.25
- Total Pageviews = Sampled Pageviews / 0.25
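The same scaling can be written as a small helper. This is a sketch: `estimateTotal` is a hypothetical function name, and the sampled figures are made-up values used only to show the arithmetic.

```javascript
// Hypothetical helper: scales a sampled count up to an estimated total.
// sampleRatePercent is the value passed to _setSampleRate(), e.g. 25.
function estimateTotal(sampledCount, sampleRatePercent) {
  if (sampleRatePercent <= 0 || sampleRatePercent > 100) {
    throw new RangeError("sampleRatePercent must be in the range (0, 100]");
  }
  return sampledCount / (sampleRatePercent / 100);
}

// Made-up sampled figures at a 25% sample rate.
const totalVisits = estimateTotal(12500, 25);    // 12500 / 0.25 = 50000
const totalPageviews = estimateTotal(40000, 25); // 40000 / 0.25 = 160000
console.log(totalVisits, totalPageviews);
```

The results are estimates, because the sampled visitors only approximate the behavior of all visitors.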