If you're going to be data-driven, make sure your data knows where it's going

After months of watching government coronavirus briefings, people today are more aware than ever of the importance of data-driven decision-making. Even outside those life-and-death situations, now that data is so readily available about almost every facet of day-to-day life, making major choices based solely on gut instinct seems increasingly reckless.

The coronavirus crisis has made something else abundantly clear, too: data-driven decisions are only as good as the quality of your data. There's no point making decisions based on data if that data is flawed to begin with.

But even good data can lead you astray if you don't query it properly, and this is a nuance that even the best-intentioned decision makers sometimes miss. At this point we're no longer talking about coronavirus, of course; the experts at SAGE know more about interpreting data and statistics than we ever could. But we'd like to think we have them beat when it comes to applying the basics of that kind of statistical thinking to digital strategy.

When is an increase not an increase?

Let's take a classic scenario: you've made some changes to your website, and want to know whether those changes have made people spend more time browsing.

This is where even very basic statistics can be useful. For example, let's say that you update your "Visit Us" page, and discover that the average time on that page goes from 60 seconds to 90 seconds. This might sound like a clear win — your new page is more engaging, so people are spending longer — but it's worth examining in more detail.

At a minimum, you might want to look at the "standard deviation" around the average time spent on the page. In this case, you'd be hoping for a standard deviation close to 0, which would suggest that most visitors really are spending an extra 30 seconds. A large standard deviation, on the other hand, would suggest that the apparent increase is being skewed by a few outliers, and in fact might not be much of an increase at all.

Try to remember how you learned to calculate the average in school: if ten visitors each spend an extra 30 seconds on your page, the average increase will of course be 30 seconds. But if two visitors spend an extra two minutes each and the other eight each spend just an extra 10 seconds, the average increase still comes out at roughly 30 seconds.
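
To see that concretely, here's a quick sketch in Python using the made-up visitor numbers above:

```python
from statistics import mean

# Ten visitors who each spend an extra 30 seconds on the page
uniform_gains = [30] * 10

# Two visitors who spend an extra two minutes (120 seconds) each,
# plus eight who spend just an extra 10 seconds
skewed_gains = [120, 120] + [10] * 8

print(mean(uniform_gains))  # 30
print(mean(skewed_gains))   # 32 -- still "roughly 30 seconds" on average
```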

If this sounds obvious to you, great! You're already thinking like a statistician. Maybe you also look at the median and mode, and so you know that in the example above, most people are actually spending an extra 10 seconds on the new version of the page. That's not as good as everyone spending 30 seconds more, but it's still a good result, right?

Well, that depends, and the standard deviation comes in handy here too. In the first example, where everyone spends an extra 30 seconds, the standard deviation is 0, because none of your users stray from the average at all. But in the second example, where two people spend an extra two minutes each, the standard deviation is about 44 seconds, because your users are scattered much more widely around the average.
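
If you'd rather not do that arithmetic by hand, Python's built-in statistics module will do it for you; these are the same hypothetical numbers as above:

```python
from statistics import median, mode, pstdev

uniform_gains = [30] * 10             # everyone gains 30 seconds
skewed_gains = [120, 120] + [10] * 8  # two big outliers, eight modest gains

print(pstdev(uniform_gains))  # 0.0  -- nobody strays from the average
print(median(skewed_gains))   # 10.0 -- what most visitors actually do
print(mode(skewed_gains))     # 10
print(pstdev(skewed_gains))   # 44.0 -- widely scattered around the mean
```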

This is a much more common value for the standard deviation when looking at large, real-world datasets, and if that's how your users are usually distributed then an average increase of 10 seconds or even 30 seconds might not be anything beyond random noise. In that case, your new page certainly hasn't made people spend any less time on the site, but it probably hasn't made them spend any more, either.

You've lost me

Don't feel bad! This stuff isn't straightforward. (We had to rewrite the last section about a gazillion times — another sophisticated statistical concept.)

But don't worry: you don't need to fully understand it, or any of the other more complicated statistical tools below. Just take our word that they exist and can be useful. Things like:

  • Multiple regression analysis, which lets you measure the "true" effect of a single variable while controlling for other factors. (E.g. if your conversion rate is higher at lunchtime, is that just because the number of users changes dramatically at lunchtime, or is it a genuine increase in its own right?)
  • Chi-square tests of independence, which can tell you whether a difference between two sets of data is significant or just random noise. (There's a short sketch of one after this list.)
  • Tests of statistical significance, which, like the margins of error quoted with opinion polls, tell you how likely your data is to reflect reality rather than chance.
  • Measures of skewness and kurtosis, which describe how lopsided your data is in one direction and how heavy its extreme tails are.
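
To give a flavour of just one of these, here's a minimal sketch of a chi-square test of independence in Python, using the scipy library. The traffic figures are entirely invented; the question being asked is whether a new page converts at a genuinely different rate from the old one:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: conversions vs non-conversions,
# before and after a page change (1,000 visits each)
#               converted  didn't convert
contingency = [[48,        952],   # old page
               [76,        924]]   # new page

chi2, p_value, dof, expected = chi2_contingency(contingency)

# A small p-value (conventionally below 0.05) suggests the difference
# in conversion rates is unlikely to be random noise
print(f"p = {p_value:.3f}")
```

Run on these invented numbers, the test reports p of roughly 0.01, i.e. a difference that chance alone would rarely produce.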

Again, you don't necessarily need to understand all these things yourself, and not all of them will be relevant to everyone. (If you're lucky you'll go your whole life without having to think about kurtosis.)

But if you're considering a major change to your website based on what you think is a clear pattern in the data, it's worth at least asking someone else — or several someone elses — to look at the same numbers and see whether they agree with you.

And if one of those people happens to understand basic statistics, well, so much the better.

Example: Google Ads

One of our clients asked us to look at their Google Ads implementation to see if we could identify any areas for improvement. We suggested drastically narrowing their geographic targeting from a whole region of the country to just a handful of areas that accounted for the majority of their sales.

After a month, we looked at the ad performance to see if our changes had made any difference. They had, and at first, to our horror, that difference looked disastrous: impressions were down, clicks were down, and even our target metric, conversions, was down. On the face of things, the obvious response would have been to reverse all our changes and start again.

But then we looked at the data a little more closely. It turned out that over the same period, organic search traffic was also down by about the same amount, and sometimes more. It wasn't that our ads were performing worse; people were simply searching for our client's key terms less often.

In fact, despite the narrower targeting, paid search traffic was down less than organic search traffic. And although the ads were being served less often, meaning conversions were down in absolute terms, the average conversion rate had gone up and the average cost per conversion had gone down.

Most importantly, though, by looking at the standard deviation of daily conversion rates, we could show that the daily conversion rate was now more than one standard deviation above the previous average on more than half of all days. Other things being equal, and for roughly bell-shaped data, you'd expect that to happen only about one day in six, so we were very confident that our changes had significantly improved ad performance: just as we'd hoped, and, more to the point, contrary to what you might have concluded if you'd only glanced at the data.
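
If you're curious what that sanity check looks like in practice, here's a rough sketch with invented daily figures (the real client data isn't ours to share):

```python
from statistics import mean, pstdev

# Hypothetical daily conversion rates (%), before and after the change
before = [2.1, 1.8, 2.4, 2.0, 1.9, 2.3, 2.2, 1.7, 2.0, 2.1]
after  = [2.6, 3.1, 2.4, 2.9, 3.3, 2.2, 2.8, 3.0, 2.5, 2.7]

# One standard deviation above the old average
threshold = mean(before) + pstdev(before)

days_above = sum(rate > threshold for rate in after)
print(f"{days_above} of {len(after)} days beat the old mean + 1 SD")

# For roughly bell-shaped data, chance alone would put only about one
# day in six above that threshold, so a majority of days above it is
# strong evidence of a real improvement.
```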

Taking your analysis to the next level

It's worth emphasising again that statistics aren't easy, and they're not always necessary, either. But if you have access to someone who can bring a bit more sophistication to the analysis of your website data, it can't hurt to ask.
