In the picture below you have made a landing page report in Google Analytics. You use a segment you created called “Returning Customers”.
The report is sampled, based on only 16% of the actual data:
Using the Unsampler, the same query could be exported to a CSV, giving these results:
The difference is stunning:
In this simple example, the actual numbers are lower than the sampled report suggested! Are you making decisions based on numbers like these?
How it works
The Roy App Unsampler uses a well-known technique: it iterates over your Google Analytics data day by day, making the most of the 500,000-session sampling limit that applies to each request. This allows even the narrowest segment to be extracted unsampled.
Once all days of data have been imported, the Unsampler keeps polling for each new day’s data, so you can always make fresh exports from the Unsampler.
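The day-by-day iteration described above can be sketched roughly like this. This is not the Unsampler’s actual code, just a minimal illustration: the helper names `daily_ranges` and `days_to_fetch` are made up for this example, and the API request itself is left out. Each pair would be passed as the `start-date` and `end-date` of a single request.

```python
from datetime import date, timedelta


def daily_ranges(start: date, end: date):
    """Yield one (start-date, end-date) pair per day, both ends inclusive."""
    day = start
    while day <= end:
        iso = day.isoformat()  # e.g. "2015-01-30"
        yield iso, iso         # a one-day query: start and end are the same day
        day += timedelta(days=1)


def days_to_fetch(imported: set, start: date, end: date):
    """Return the days in [start, end] that have not been imported yet."""
    return [d for d, _ in daily_ranges(start, end) if d not in imported]


# Hypothetical example: four days, two of which were imported earlier.
ranges = list(daily_ranges(date(2015, 1, 30), date(2015, 2, 2)))
pending = days_to_fetch({"2015-01-30", "2015-01-31"},
                        date(2015, 1, 30), date(2015, 2, 2))
print(pending)
```

Splitting the range this way means each request only has to cover one day of sessions, and the second helper shows how a later run could pick up only the days that are still missing.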
Made to look like what you’re used to
You’ve probably come across different tools that use the Google Analytics API. We’ve tried to make the Unsampler look very similar to the Google Analytics Query Explorer. Set up your queries, let the Unsampler work, and then export the results in the format of your choice.
For Google Analytics to be able to serve data to all its users (remember, Google Analytics is used across half the known web), it needs to limit the requests. Sampling is a way of choosing a smaller set of sessions and then extrapolating the data from that small set.
When do I get sampling in Google Analytics?
When the dreaded yellow box pops up, you know you’ve reached into sampled data.
In the Google Analytics user interface, the box will pop up in the top right corner, just below the selected date range:
In the Query Explorer, a yellow box pops up just above the results table, next to the “Get data” button:
In a Google Analytics API response, it will have the containsSampledData field set to true:
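As a rough sketch, a script could check for sampling like this. The response below is a hand-written, illustrative fragment, not real data. It assumes the Core Reporting API v3 response shape, where `sampleSize` and `sampleSpace` accompany `containsSampledData` as strings. The function name `sampling_rate` is made up for this example.

```python
import json

# Illustrative response fragment (values invented for this example).
raw = """
{
  "containsSampledData": true,
  "sampleSize": "16000",
  "sampleSpace": "100000"
}
"""


def sampling_rate(response: dict):
    """Return the fraction of sessions sampled, or None if the data is unsampled."""
    if not response.get("containsSampledData", False):
        return None
    # Assumption: both fields are present (as strings) whenever data is sampled.
    return int(response["sampleSize"]) / int(response["sampleSpace"])


rate = sampling_rate(json.loads(raw))
if rate is not None:
    print(f"Sampled: based on {rate:.0%} of sessions")
```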
Can I trust sampled data? Not really.
For some larger aggregates, when the sample covers 95% of the real dataset, you can probably still do your analysis. But when you’re looking at narrow drilldowns and the sample set is small, you can’t trust it.
Above is an example of a typical landing page report with a segment applied to it. The sampling is based on 16% of sessions. Notice that many of the top rows have the same number of sessions. This is due to sampling: Google Analytics guesses the visits rather than counting them. Once you see patterns like this, with many rows showing similar metric values (in this case 12s and 6s), you should be very careful in your analysis.
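A little arithmetic shows where repeated values like 6 and 12 can come from. The sketch below assumes the simplest possible extrapolation, where each sampled session is scaled up by the inverse of the sampling rate. How Google Analytics actually extrapolates and rounds is not documented here, so treat this as an illustration only.

```python
SAMPLE_PERCENT = 16  # sampling rate from the report above


def scaled_estimate(sampled_sessions: int, sample_percent: int = SAMPLE_PERCENT) -> float:
    """Scale a count seen in the sample up to an estimate for the full dataset."""
    return sampled_sessions * 100 / sample_percent


# A row with only a handful of sampled sessions can only take a few values.
for n in range(1, 5):
    print(n, "sampled ->", scaled_estimate(n), "estimated")
```

With a 16% sample, one sampled session stands for about 6 real ones and two stand for about 12, so rows with very few sampled sessions end up sharing the same handful of estimates.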
Sampling is bad because you can’t trust the data any longer. Don’t worry, the data is still there, in the Google Analytics data warehouse. It is just your means of retrieving it that samples the data.
What about Google Analytics Premium?
With Google Analytics Premium, you won’t have any issues with sampling. By default, the user interface will still sample (albeit with a much larger sample size), but you can create special unsampled reports. It is also possible to export data to Google BigQuery for more advanced processing.
Google Analytics Premium is one of the best deals out there, but it is rather pricey. Depending on whether you make a deal with a Google Analytics Certified Partner or buy directly from Google, your cost will end up at around 150,000 USD/year.