ANALYTICS

A Refresher on Statistical Significance

by Amy Gallo

FEBRUARY

When you run an experiment or analyze data, you want to know if your findings are significant. But business relevance (i.e., practical significance) isn't always the same thing as confidence that a result isn't due purely to chance (i.e., statistical significance). This is an important distinction; unfortunately, statistical significance is often misunderstood and misused in organizations today. And because more and more companies are relying on data to make critical business decisions, it's an essential concept for managers to understand.
To better understand what statistical significance really means, I talked with Tom Redman, author of Data Driven: Profiting from Your Most Important Business Asset. He also advises organizations on their data and data quality programs.
What is statistical significance?

"Statistical significance helps quantify whether a result is likely due to chance or to some factor of interest," says Redman. "When a finding is significant, it simply means you can feel confident that it's real, not that you just got lucky (or unlucky) in choosing the sample."

COPYRIGHT HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED.

When you run an experiment, conduct a survey, take a poll, or analyze a set of data, you're taking a sample of some population of interest, not looking at every single data point that you possibly can.
Consider the example of a marketing campaign. You've come up with a new concept and you want to see if it works better than your current one. You can't show it to every single target customer, of course, so you choose a sample group.
When you run the results, you find that those who saw the new campaign spent $10.17 on average, more than the $8.41 spent by those who saw the old one. This $1.76 might seem like a big and perhaps important difference. But in reality you may have been unlucky, drawing a sample of people who do not represent the larger population; in fact, maybe there was no difference between the two campaigns and their influence on consumer purchasing behaviors. This is called a sampling error, something you must contend with in any test that does not include the entire population of interest.
Redman notes that there are two main contributors to sampling error: the size of the sample and the variation in the underlying population. Sample size may be intuitive enough. Think about flipping a coin five times versus flipping it 500 times. The more times you flip, the less likely you'll end up with a great majority of heads. The same is true of statistical significance: with bigger sample sizes, you're less likely to get results that reflect randomness. All else being equal, you'll feel more comfortable in the accuracy of the campaign's $1.76 difference if you showed the new one to 1,000 people rather than just 25. Of course, showing the campaign to more people costs more, so you have to balance the need for a larger sample size with your budget.
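The coin-flip intuition is easy to check with a short simulation. The sketch below (an illustration of the point, not anything from the article) estimates how often a fair coin lands heads at least 80% of the time, for 5 flips versus 500:

```python
import random

random.seed(1)

def chance_of_lopsided_result(flips, trials=10_000):
    """Estimate the probability that at least 80% of fair-coin
    flips land heads -- a 'great majority' of heads."""
    lopsided = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(flips))
        if heads >= 0.8 * flips:
            lopsided += 1
    return lopsided / trials

p_small = chance_of_lopsided_result(5)    # 4 or 5 heads out of 5: fairly common
p_large = chance_of_lopsided_result(500)  # 400+ heads out of 500: essentially never
print(p_small, p_large)
```

With 5 flips, a "great majority" of heads shows up roughly one time in five; with 500 flips it essentially never happens. That is exactly why a larger sample leaves less room for luck.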
Variation is a little trickier to understand, but Redman insists that developing a sense for it is critical for all managers who use data. Consider the images below. Each expresses a different possible distribution of customer purchases under Campaign A. In the chart on the left (with less variation), most people spend roughly the same amount of dollars. Some people spend a few dollars more or less, but if you pick a customer at random, chances are pretty good that they'll be pretty close to the average. So it's less likely that you'll select a sample that looks vastly different from the total population, which means you can be relatively confident in your results.
Compare that to the chart on the right (with more variation). Here, people vary more widely in how much they spend. The average is still the same, but quite a few people spend more or less. If you pick a customer at random, chances are higher that they are pretty far from the average. So if you select a sample from a more varied population, you can't be as confident in your results.
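You can put numbers on the two charts with another short simulation (the spending figures are assumptions for illustration): draw repeated samples from two hypothetical populations with the same $10 average spend but different spreads, and compare how much the sample means wander.

```python
import random
import statistics

random.seed(0)

def sampling_error(pop_sd, sample_size=100, trials=2_000):
    """Spread (standard deviation) of the sample means when customer
    spend averages $10 with the given population standard deviation."""
    means = [
        statistics.mean(random.gauss(10.0, pop_sd) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

low_variation = sampling_error(pop_sd=1.0)   # left-hand chart: similar spenders
high_variation = sampling_error(pop_sd=5.0)  # right-hand chart: varied spenders
print(low_variation, high_variation)
```

Same average, same sample size, yet the sample means from the high-variation population bounce around several times more, which is Redman's point: more variation in the population means more sampling error.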
To summarize, the important thing to understand is that the greater the variation in the underlying population, the larger the sampling error.

Redman advises that you should plot your data and make pictures like these when you analyze it. The graphs will help you get a feel for variation, the sampling error, and, in turn, the statistical significance.

No matter what you're studying, the process for evaluating significance is the same. You start by stating a null hypothesis, often a straw man that you're trying to disprove. In the above experiment about the marketing campaign, the null hypothesis might be "On average, customers don't prefer our new campaign to the old one." Before you begin, you should also state an alternative hypothesis, such as "On average, customers prefer the new one," and a target significance level. The significance level is an expression of how rare your results are, under the assumption that the null hypothesis is true. It is usually expressed as a p-value, and the lower the p-value, the less likely the results are due purely to chance.
Setting a target and interpreting p-values can be dauntingly complex. Redman says it depends a lot on what you are analyzing. "If you're searching for the Higgs boson, you probably want an extremely low p-value, maybe 0.00001," he says. "But if you're testing for whether your new marketing concept is better or the new drill bits your engineer designed work faster than your existing bits, then you're probably willing to take a higher value, maybe even as high as 0.25." Note that in many business experiments, managers skip these two initial steps and don't worry about significance until after the results are in. However, it's good scientific practice to do these two things ahead of time.
Then you collect your data, plot the results, and calculate statistics, including the p-value, which incorporates variation and the sample size. If you get a p-value lower than your target, then you reject the null hypothesis in favor of the alternative. Again, this means the probability is small that your results were due solely to chance.

How is it calculated?

As a manager, chances are you won't ever calculate statistical significance yourself. "Most good statistical packages will report the significance along with the results," says Redman. There is also a formula in Microsoft Excel and a number of other online tools that will calculate it for you.
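For readers who want to peek inside the black box, here is a minimal sketch of that calculation for the campaign example. It uses a large-sample z-test as a stand-in for the t-test a statistics package would run, and the simulated spending data is hypothetical (built around the article's $8.41 and $10.17 averages):

```python
import math
import random
import statistics

random.seed(7)

def two_sample_p_value(a, b):
    """Approximate two-sided p-value for a difference in means
    (Welch-style z-test; reasonable when samples are large)."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # Normal-approximation tail probability, doubled for a two-sided test.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical spending data in which the new campaign really does
# lift average spend (illustrative numbers, not the article's data).
old = [random.gauss(8.41, 4.0) for _ in range(1000)]
new = [random.gauss(10.17, 4.0) for _ in range(1000)]

p = two_sample_p_value(new, old)
print(p)  # far below a 0.05 target, so reject the null hypothesis
```

Notice that the p-value flows directly from the two ingredients discussed above: the variation in each group (via the variances) and the sample sizes (the divisors inside the square root).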
Still, it's helpful to know the process described above in order to understand and interpret the results. As Redman advises, "Managers should not trust a model they don't understand."

How do companies use it?

Companies use statistical significance to understand how strongly the results of an experiment, survey, or poll they've conducted should influence the decisions they make. For example, if a manager runs a pricing study to understand how best to price a new product, he will calculate the statistical significance (with the help of an analyst, most likely) so that he knows whether the findings should affect the final price.
Remember that the new marketing campaign above produced a $1.76 boost (more than 20%) in average sales? It's surely of practical significance. If the p-value comes in at 0.03, the result is also statistically significant, and you should adopt the new campaign. If the p-value comes in at 0.2, the result is not statistically significant, but since the boost is so large you'll likely still proceed, though perhaps with a bit more caution.
But what if the difference were only a few cents? If the p-value comes in at 0.2, you'll stick with your current campaign or explore other options. But even if it had a significance level of 0.03, the result is likely real, though quite small. In this case, your decision probably will be based on other factors, such as the cost of implementing the new campaign.
Closely related to the idea of a significance level is the notion of a confidence interval. Let's take the example of a political poll. Say there are two candidates: A and B. The pollsters conduct an experiment with 1,000 likely voters; 49% of the sample say they'll vote for A, and 51% say they'll vote for B. The pollsters also report a margin of error of +/- 3%.
"Technically," says Redman, "49% +/- 3% is a 95% confidence interval for the true proportion of A voters in the population." Unfortunately, he says, most people interpret this as "there's a 95% chance that A's true percentage lies between 46% and 52%," but that isn't correct. Instead, it says that if the pollsters were to repeat the poll many times, 95% of intervals constructed this way would contain the true proportion.
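Where does the "+/- 3%" itself come from? For a sample proportion p from n respondents, the standard 95% margin of error under the normal approximation is 1.96 x sqrt(p(1-p)/n). A minimal sketch (the function name is mine, not a standard API):

```python
import math

def poll_confidence_interval(share, n, z=1.96):
    """95% confidence interval for a polled vote share using the
    normal approximation -- the usual source of a '+/- 3%' margin.
    Note: '95%' describes the long-run behavior of the procedure,
    not the chance that this one interval contains the truth."""
    margin = z * math.sqrt(share * (1 - share) / n)
    return share - margin, share + margin, margin

low, high, margin = poll_confidence_interval(0.49, 1000)
print(round(low, 3), round(high, 3), round(margin, 3))  # 0.459 0.521 0.031
```

With 1,000 voters and a 49% share, the formula reproduces the pollsters' roughly 3-point margin; quadrupling the sample would halve it, since the margin shrinks with the square root of n.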
If your head is spinning at that last sentence, you're not alone. As Redman says, this interpretation is "maddeningly subtle, too subtle for most managers and even many researchers with advanced degrees." He says the more practical interpretation of this would be "Don't get too excited that B has a lock on the election" or "B appears to have a lead, but it's not a statistically significant one." Of course, the practical interpretation would be very different if 70% of the likely voters said they'd vote for B and the margin of error was 3%.
The reason managers bother with statistical significance is that they want to know what findings say about what they should do in the real world. But "confidence intervals and hypothesis tests were designed to support science, where the idea is to learn something that will stand the test of time," says Redman. Even if a finding isn't statistically significant, it may have utility to you and your company. On the other hand, when you're working with large data sets, it's possible to obtain results that are statistically significant but practically meaningless, like that a group of customers is 0.000001% more likely to click on Campaign A over Campaign B. So rather than obsessing about whether your findings are precisely right, think about the implication of each finding for the decision you're hoping to make. What would you do differently if the finding were different?

What mistakes do people make when working with statistical significance?
"Statistical significance is a slippery concept and is often misunderstood," warns Redman. "I don't run into very many situations where managers need to understand it deeply, but they need to know how to not misuse it." Of course, data scientists don't have a monopoly on the word "significant," and often in businesses it's used to mean whether a finding is strategically important. It's good practice to use language that's as clear as possible when talking about data findings. If you want to discuss whether the finding has implications for your strategy or decisions, it's fine to use the word "significant," but if you want to know whether something is statistically significant (and you should want to know that), be precise in your language. Next time you look at results of a survey or experiment, ask about the statistical significance if the analyst hasn't reported it.
Remember that statistical significance tests help you account for potential sampling errors, but Redman says what is often more worrisome is non-sampling error: "Non-sampling error involves things where the experimental and/or measurement protocols didn't happen according to plan, such as people lying on the survey, data getting lost, or mistakes being made in the analysis." This is where Redman sees more troubling results. "There is so much that can happen from the time you plan the survey or experiment to the time you get the results. I'm more worried about whether the raw data is trustworthy than how many people they talked to," he says. Clean data and careful analysis are more important than statistical significance.

Always keep in mind the practical application of the finding. And don't get too hung up on setting a strict confidence interval. Redman says there's a bias in scientific literature that "a result wasn't publishable unless it hit a p = 0.05 (or less)." But for many decisions, like which marketing approach to use, you'll be fine with a much lower confidence level. In business, Redman says, there are often criteria more important than statistical significance. The important question is, "Does the result stand up in the market, if only for a brief period of time?"

As Redman says, the results only give you so much information: "I'm all for using statistics, but always wed it with good judgment."

Amy Gallo is a contributing editor at Harvard Business Review and the author of the HBR Guide to Managing Conflict at Work. She writes and speaks about workplace dynamics. Follow her on Twitter at @amyegallo.

Copyright 2016 Harvard Business Publishing. All Rights Reserved. Additional restrictions may apply, including the use of this content as assigned course material.
Please consult your institution's librarian about any restrictions that might apply under the license with your institution.
For more information and teaching resources from Harvard Business Publishing, including Harvard Business School Cases, eLearning products, and business simulations, please visit hbsp.harvard.edu.