It is almost two years since ‘(not provided)’ began to pollute the keyword data in Google Analytics organic search reports.
Since that time, the proportion of searches that are logged as ‘(not provided)’ rather than with their actual search keywords has grown.
This blog post looks at the rate of that growth and wonders when it will stop.
The data sources
To gather the data for this we examined 5 websites across different sectors.
Firstly let’s look at the proportion of searches that return ‘(not provided)’. To give a smoother, easier to view visualisation, we show a 30 day moving average for this proportion.
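For readers who want to reproduce the smoothing, a 30 day moving average can be sketched as below. This is a minimal illustration in Python, not the tool used to produce the graphs. The function name `moving_average` and the sample values are made up for the example, and it assumes one ‘(not provided)’ proportion per day.

```python
def moving_average(daily_values, window=30):
    """Trailing moving average: one value per day once a full window exists."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_total = 0.0
    for i, value in enumerate(daily_values):
        running_total += value
        if i >= window:
            # Drop the day that has just fallen out of the window.
            running_total -= daily_values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


# Hypothetical daily proportions of '(not provided)' searches.
sample = [0.10, 0.20, 0.30, 0.40]
smoothed = moving_average(sample, window=2)  # short window just for the demo
```

Because each point averages the previous 30 days, a sudden rise in the daily figures takes a few weeks to show fully in the smoothed line, which is why individual days can sit well above the average.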
Site 1 is in the Youth sector.
This shows, as you’d expect, the first showing of ‘(not provided)’ in October 2011. It is touching 50% by the end of August 2013. In actual fact, some individual days at the end of August are above 60%, but this is yet to emerge in the moving average.
Site 2 is in the Health and Beauty sector.
This graph shows that, for this site at least, the impact of ‘(not provided)’, while still clear, is less pronounced than for site 1. The 30 day average doesn’t reach 40% by the end of August 2013. There are individual days in the data at the end of August pushing 50%, however, so the average will rise as time goes on.
Site 3 is in the Tourism and Culture sector.
Similar to site 2, the impact of ‘(not provided)’ is evident, but less pronounced. However, there is strong growth in its impact late in the time period. Like site 2, there are a few individual days pushing the 50% barrier at the end of August that will affect the 30 day average moving forward.
Site 4 is in the Government sector.
Again, this site demonstrates a relatively modest impact from ‘(not provided)’ until recent growth. The moving average has yet to reach 40%. However, there are many recent individual days that are above 50%, which will drive the moving average up in the coming weeks.
Site 5 is in the Gambling sector.
This site demonstrates a similar pattern to sites 2 to 4, in that the impact was relatively modest until the last few months but has grown considerably during August 2013.
Taking 30 day average values across all 5 sites combined seems a reasonable sample to consider. Each 30 day data point represents over 675,000 searches in the sample.
Statistics describes how confident one can be in a set of results by determining a confidence interval. This interval is a pair of values between which the true value is expected to lie, with a stated level of confidence (typically 95%), based on the observations.
In other words, for the time period we investigate, the confidence interval indicates how reliably the result from our sample can be scaled up to the population of all searches.
Over the last 30 days of the sample, 40.59% of all searches were logged as ‘(not provided)’. The confidence interval is 40.47% to 40.72%, which is reasonably narrow: effectively plus or minus 0.12 percentage points.
| Period | Lower Interval Point | Higher Interval Point |
|---|---|---|
| 2nd to 31st August, 2013 | 40.47% | 40.72% |
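As a rough check on the 30 day interval, here is a minimal sketch of a 95% confidence interval for a proportion. It assumes the standard normal-approximation formula and a sample of exactly 675,000 searches. The post does not state which method or exact sample size was used, so both are assumptions, and the function name is hypothetical.

```python
import math


def proportion_confidence_interval(proportion, sample_size, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 gives 95%)."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    margin = z * standard_error
    return proportion - margin, proportion + margin


# Assumed inputs: 40.59% '(not provided)' across roughly 675,000 searches.
lower, upper = proportion_confidence_interval(0.4059, 675_000)
print(f"{lower:.2%} to {upper:.2%}")
```

With these inputs the margin comes out at roughly 0.12 percentage points either side, close to the interval quoted above. A smaller sample widens the interval, since the margin shrinks only with the square root of the sample size.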
However, we can also compare the last 7 days of July 2013 with the last 7 days of August 2013. The sample sizes are smaller, so our confidence intervals are wider (about plus or minus 0.26%).
| Period | Lower Interval Point | Higher Interval Point |
|---|---|---|
| 25th to 31st July, 2013 | 30.68% | 30.95% |
| 25th to 31st August, 2013 | 46.65% | 46.90% |
This shows some significant growth. While our 30 day sample gives a more confident estimate (i.e. a narrower confidence interval), our two 7 day samples show growth of about 50% in relative terms. The number of searches coming through as ‘(not provided)’ is about half as big again as it was one month previously.
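The relative growth can be checked with a few lines of Python. This sketch takes the midpoint of each 7 day interval as the estimate for that week, which is an assumption made for illustration. The helper names are hypothetical.

```python
def midpoint(lower, upper):
    """Centre of a confidence interval, used here as the point estimate."""
    return (lower + upper) / 2


def relative_growth(before, after):
    """Growth expressed as a fraction of the starting value."""
    return (after - before) / before


july_share = midpoint(30.68, 30.95)    # percent, 25th to 31st July 2013
august_share = midpoint(46.65, 46.90)  # percent, 25th to 31st August 2013
growth = relative_growth(july_share, august_share)
print(f"{growth:.1%}")
```

The result is a little over 50%, which is the ‘half as big again’ figure described above. Note that this is relative growth. In absolute terms the share rose by about 16 percentage points.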
ComScore reports that in 2012, the average number of searches per day in Google was just over 5.13 billion. There are not yet figures for 2013, so as a best guess we assume that the level of search in July and August 2013 is broadly the same. On that basis, we can estimate how far the number of searches returning ‘(not provided)’ across the globe has grown between the two 7 day periods.
| Period | Global ‘(not provided)’ searches (approx.) |
|---|---|
| 25th to 31st July, 2013 | 11.0 billion |
| 25th to 31st August, 2013 | 16.7 billion |
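The scaling behind these global figures can be sketched as follows. The sketch assumes the 2012 daily average of 5.13 billion searches applies to each of the 7 days, and uses the interval midpoints as the ‘(not provided)’ share. The function and constant names are hypothetical.

```python
DAILY_GOOGLE_SEARCHES_2012 = 5.13e9  # 2012 daily average quoted in the post


def estimated_not_provided(share, days=7,
                           daily_searches=DAILY_GOOGLE_SEARCHES_2012):
    """Scale a sample share up to an estimated global count of searches."""
    return share * days * daily_searches


# Shares are interval midpoints (an assumption for illustration).
july_estimate = estimated_not_provided(0.30815)
august_estimate = estimated_not_provided(0.46775)
print(f"{july_estimate / 1e9:.1f} billion, {august_estimate / 1e9:.1f} billion")
```

This gives roughly 11 billion and just under 17 billion searches for the two weeks, in line with the table. The exact figures depend on which point within each confidence interval is taken as the share, and on the daily search volume passed in as `daily_searches`.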
There are a number of assumptions we have to make to get to these figures.
The ComScore figure for daily searches in Google is an average for 2012. This figure is growing year on year (it was 4.7 billion per day in 2011, and 3.6 billion per day in 2010). Therefore, it is not directly accurate for July and August 2013, but it is the best known figure available to me at this time. The likelihood is that searches in Google are closer to 5.5 billion per day for the time period we examined. This might increase our estimates by about 7% if it were true.
Additionally, the sites we have obtained data for are primarily UK-focused, and therefore the behaviour characteristics of users are skewed to UK behaviour. This might not be representative of global search behaviour.
Furthermore, we have looked only at 5 sectors, albeit 4 of these showed very similar data.
Strictly speaking, therefore, we cannot be truly certain of the ‘randomness’ of the data, which is a key concept in being able to compute probability and confidence. By limiting our study to 5 websites, we haven’t allowed the data to be truly representative of the real world, although it may be reasonably so. Therefore it is somewhat of a leap of faith to apply statistical analysis.
However, we have to work with data to which we have access. Here it is, with acknowledgement of the limitations.
EDIT: I previously described some methods for using (not provided) data to analyse traffic