What is power analysis?
A power analysis is the calculation used to estimate the smallest sample size needed for an experiment, given a required significance level, statistical power, and effect size. It helps ensure that an experiment or survey is large enough to distinguish a genuine effect from chance variation.
In order to understand where a power analysis fits in statistical science, a few other terms and processes need to be explained.
Statistics for beginners
A hypothesis is an idea that can be tested. Basically, any claim made that can be tested will have a hypothesis, an idea about the outcome. For instance, a dog owner noted his dog seemed to pay more attention to the morning paper if there was a cat featured in that day’s paper. The owner designs an experiment to see if dogs will read the newspaper more often if there is a cat above the fold line.
A statistical hypothesis test starts from a default assumption, called the null hypothesis. In this example, the null hypothesis (H0) is that the presence of a cat printed on the newspaper will not increase the likelihood that a dog will read the paper. The competing claim is called the alternative hypothesis (H1 or HA); here, it is that the cat does have an effect on the dog.
The experiment then looks for evidence against the null hypothesis. That evidence is summarized by a p-value (probability value): the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true. When the p-value falls below a threshold chosen in advance, called the significance level, the result is described as statistically significant and the null hypothesis is rejected.
The statistical power of a hypothesis test is the probability that the test detects an effect when there really is one; that is, the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true. Power is not the same thing as the p-value. The higher the statistical power, the lower the probability of making a false negative error. The problem arises when an experiment is conducted with low statistical power: a real effect can easily be missed, and any conclusion drawn may be disregarded as unreliable.
Commonly, a statistical power of 80 percent or more is expected before a study is considered adequately designed. With 80 percent power, there is a 20 percent probability of missing a real effect (a false negative).
This brings us to power analysis and how statistical power is assessed:
Power analysis
A power analysis works with four related quantities.
- Effect size: The quantified size of a result present in the population.
- Sample size: The number of things measured. How many dogs do we need to observe?
- Significance: The level of significance used in the experiment, generally 5 percent.
- Statistical power: The probability of accepting the alternative hypothesis when it is true.
All these variables are interlinked: testing more dogs makes an effect easier to detect, and statistical power can also be increased by raising the significance level, at the cost of more false positives.
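The link between these four quantities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two equal-sized groups, a two-sided comparison of means, a standardized effect size, and a normal approximation. The function name `approx_power` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Assumptions (illustrative only): equal group sizes, a standardized
    effect size, and a normal approximation to the test statistic.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # rejection threshold
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic
    # Probability that the statistic lands beyond either threshold.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# More dogs per group: power goes up.
for n in (20, 40, 80):
    print(f"n per group = {n}: power = {approx_power(0.5, n):.3f}")

# A looser significance level: power goes up too.
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha}: power = {approx_power(0.5, 40, alpha):.3f}")
```

Holding the effect size fixed, both loops should show power rising, which is the trade-off described above.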
A power analysis estimates one of these four parameters, given values for the remaining three. Most commonly, it is used to estimate the minimum sample size needed for an experiment. Additionally, multiple power analyses can be used to plot a curve of one parameter against another, for instance, how the smallest detectable effect size changes as the sample size changes. This is useful when designing an experiment: it helps structure the study, achieve adequate power, and give a genuine effect a better chance of producing a statistically significant result.
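As a rough sketch of the most common case, solving for sample size, the snippet below uses the standard normal-approximation formula for comparing two group means. The function name `required_n_per_group` and the example effect sizes are assumptions for illustration; an exact t-test calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Smallest sample size per group under a normal approximation.

    Assumes two equal-sized groups, a two-sided test, and a
    standardized effect size (difference in means / standard deviation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # from the significance level
    z_power = z.inv_cdf(power)          # from the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects need far more observations.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {required_n_per_group(d)} per group")
```

Running the same function over a range of effect sizes or power targets is one simple way to build the kind of curve described above.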
This is important because testing, experiments, and surveys are expensive to conduct. An organization does not want to run an experiment and realize afterwards that the sample size was too small to determine if the outcome was genuine or not.
A priori power analysis
A power analysis can be done either before or after the data is collected. If done before, it is called an a priori power analysis; if done afterwards, it is called a post-hoc or retrospective power analysis. An a priori power analysis is done before research to ensure adequate power, while a post-hoc analysis is done to estimate the power of a study that has already been run. However, post-hoc analysis is not generally recommended because it can lead to the power approach paradox: among non-significant results, the study with the smaller p-value has the higher observed power, so it would be read as stronger support for the null hypothesis even though its smaller p-value is stronger evidence against it.
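The reason post-hoc power adds so little can be seen in a small sketch. Assuming a two-sided z-test and plugging the observed effect back in as if it were the true effect, "observed power" depends only on the p-value. The function name `observed_power` is hypothetical.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a two-sided z-test, treating the observed
    effect as if it were the true effect (an illustrative assumption)."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_value / 2)   # test statistic implied by p
    z_crit = z.inv_cdf(1 - alpha / 2)    # rejection threshold
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)


# Among non-significant results, smaller p-values give higher observed power.
for p in (0.06, 0.20, 0.50, 0.90):
    print(f"p = {p:.2f}: observed power = {observed_power(p):.3f}")
```

Because the observed power is just a re-expression of the p-value, it carries no new information about the study.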
Three considerations for power analysis
There are three things that power analysis takes into account that must be assessed before any study is undertaken:
Simple sampling population
General sample size calculations assume simple random sampling from a normal (Gaussian), bell-curve-shaped population distribution. Complex studies and designs, such as stratified random sampling, must take the variation within subpopulations into account; otherwise, the variability of the population cannot be assumed.
Appropriate sample size
The type of statistical analysis will dictate the sample size needed. Descriptive statistics only require a “reasonable” sample size. However, multiple regression, ANOVA, or a log-linear analysis may require a larger sample size. There may also be a need for a comparative analysis within sub-groups in the testing groups, so a much bigger sample is required.
Compensation for error rate
The sample size must not only meet the calculated requirement; it must also be large enough to absorb the observations that the researcher has to remove from the sample. Removal could be due to extreme outliers, participants not completing the experiment correctly, or errors in recording outcomes. Many researchers add a 25 percent buffer to their sample size to account for this.
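The buffer arithmetic is simple enough to sketch directly. The function name `with_buffer` and the starting figure of 63 participants are assumptions for illustration only.

```python
from math import ceil


def with_buffer(required_n, buffer_rate=0.25):
    """Inflate a calculated sample size by a safety buffer, rounding up."""
    return ceil(required_n * (1 + buffer_rate))


# A study that needs 63 usable participants per group would recruit:
print(with_buffer(63))  # 79
```

Some researchers instead divide by one minus the expected loss rate (63 / 0.75 = 84 here), which is more conservative because it guarantees the target is still met after that share of participants is removed.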
How is power analysis used in business?
Funding agencies, research review boards, and ethics panels will often request a power analysis. For instance, if a skincare company does experiments to ensure the safety of their product, the power analysis is needed to ensure the experiment is informative and achieves its purpose.
Many businesses conduct experiments constantly for their own internal purposes too. For instance, streaming providers are constantly trying new features and testing on live customers. If enough users like the features enough, they are implemented and rolled out to users. This is fantastic for helping to avoid customer churn and making sure customers are happy.
However, it is not as simple as trying the new feature on people and then implementing it if more than half of the tested people like it. You need to take into account a range of factors like geographic distribution, demographics tested, and a range of other differences.
But you also do not want to test a million people if only 100 are needed—it is expensive. Power analysis allows a company to assess the sample size needed and only spend the time and money needed to ensure the correct response.
Why statistical power is needed
Statistical power, together with the significance level, helps researchers manage both type I and type II errors. A type I error is a false positive: rejecting the null hypothesis when it is actually true. For instance, a doctor telling a man he is pregnant would be a type I error. A type II error is a false negative: failing to reject the null hypothesis when it is actually false, such as telling a pregnant woman that she is not pregnant. The significance level sets the type I error rate, while power is one minus the type II error rate. Both errors can be extremely problematic. In business studies, the observed likelihood of type II errors is 92 percent for small effect sizes, and 45 percent for medium effect sizes.
The real-life consequences of a wrong conclusion, for either a streaming provider or a skincare company, could be catastrophic. In the case of a cosmetics company, suppose the null hypothesis is that the formulation is harmful. A type I error is then concluding that the product is safe when it actually damages skin, which means releasing a harmful product that causes skin rashes. A type II error is concluding that the formulation is harmful when it is not, which wastes all the resources, time, and money that went into the research and formulation.
For the streaming service example, a type I error is concluding that customers liked something when they did not. Releasing this new feature into the wrong market, or when it is disliked, will cause customers to end their relationship with their streaming provider and move to a competitor. A type II error is concluding that customers do not like the changes when in fact they do, so the company risks not rolling out a feature that could make more money, make customers happy, and move it ahead of its competitors.
Benefits of power analysis
Statistical significance
The obvious benefit of power analysis is giving the study a good chance of reaching statistical significance when a genuine effect exists, meaning the result is unlikely to be attributable to chance alone. This means the results of the study can be acted upon with more confidence that the outcomes will be positive for the business.
Less potential harm
Especially when the outcomes of a study are about people, knowing that the outcome is likely correct makes product releases safer, reducing the risk of harm to users. High power means the study is unlikely to return a type II error, that is, to miss a real effect; the risk of a type I error is controlled separately by the significance level.
Accounts for error within a sample population
If a company is planning to roll out a new feature, they can run testing and be reasonably assured that the result is correct. But power analysis accounts for populations and subgroups within the main groups, so companies can have more control of the final result. For instance, if a subgroup of a population is not taken into account and the number of surveys or diversity of questioning wasn’t broad enough, the company could make a costly mistake.
Challenges of power analysis
Retrospective power analysis
There is little to no point in conducting a retrospective power analysis. It is analogous to conducting a post-mortem. The project or research has already been done, and the retrospective power analysis simply identifies one of the likely reasons why it failed. At this point, the research cannot be resuscitated, resolved, or repaired. The only way forward is to chalk it up to experience and do an a priori power analysis next time.
Larger sample sizes detect small effects too
The bigger the sample size, the more likely it is that a small effect will be detected, and finding that effect may not be worth the cost. A large sample size is not always the answer and may simply result in a statistically significant but practically unimportant effect being detected.
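To put rough numbers on this, the sketch below computes the smallest standardized effect that a study can reliably detect at a given sample size. It assumes two equal groups, a two-sided test, and a normal approximation; the function name `min_detectable_effect` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with the given power,
    under a two-group, two-sided normal approximation."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_total * sqrt(2 / n_per_group)


# The detectable effect shrinks as the sample grows.
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n}: detectable effect = {min_detectable_effect(n):.4f}")
```

At a million observations per group, the detectable effect is a tiny fraction of a standard deviation, which may be far too small to matter in practice.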
Focus on power neglects other important outcomes
Focusing only on power in a study can mean that other useful data and outcomes are neglected. Estimates and confidence intervals are also important. Thinking more about the inherent value of the information rather than increased power can show a more meaningful array of findings.
False assurances at 80 percent
A claim of having 80 percent power actually provides no real information about the value of a particular study. It is a false narrative to assume that a null hypothesis can be supported or rejected simply because power is over or under 80 percent. For instance, if 40 pregnant women were studied and given vitamin C tablets, but the supplementation only saved one baby’s life, the effect would be deemed not supported. However, one life saved is incredibly valuable.
Researchers need to ascertain whether power is what matters, or whether one or two good outcomes with no harm done actually support the alternative hypothesis despite the low power.
Alternatives to power analysis
Meta-experiment
Instead of running one large study and using a power analysis to attain the “correct” size for statistical power, it is suggested that meta-experiments may yield the same result. For example, three experiments could be carried out in different locations, each using the same fixed sample size. This can help avoid the problems associated with running large trials. However, this strategy will not work if the treatment effect is likely to be very small.
Monte Carlo simulation
Modern computing provides the power to process volumes of data that were previously out of reach. Historically, power analysis relied on formulas to work out the sample size needed to reach significance. Monte Carlo simulations instead generate many scenarios from random variables and model the probabilities of the outcomes. So, rather than testing and calculating based on completed research, computers can predict outcomes in advance. This can reduce risk and uncertainty in predictions, rather than looking backwards and basing future behavior on previous results.
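A minimal sketch of the idea is shown below: simulate many fake experiments, run the test on each, and count how often the null hypothesis is rejected. Everything here is an assumption for illustration, including the function name `simulate_rejection_rate`, normally distributed data with a known standard deviation of 1, and a two-sided z-test on the difference in means.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_rejection_rate(effect_size, n_per_group, alpha=0.05,
                            n_sims=2000, seed=0):
    """Share of simulated experiments in which the null is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sqrt(2 / n_per_group)  # known unit variance in each group
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z_stat = (fmean(treated) - fmean(control)) / std_error
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / n_sims


# With a real effect, the rejection rate estimates statistical power.
estimated_power = simulate_rejection_rate(0.5, 63)
# With no effect, the rejection rate estimates the type I error rate.
false_positive_rate = simulate_rejection_rate(0.0, 63)
print(f"estimated power: {estimated_power:.3f}")
print(f"estimated type I error rate: {false_positive_rate:.3f}")
```

The appeal of simulation is that the data-generating step can be swapped for something closer to the real study, such as skewed data, dropouts, or subgroups, where no simple formula exists.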