Should p-value thresholds be cut to raise data standards?

Reducing the p-value threshold required to claim statistical significance would help cut false positive results and address the lack of reproducibility in scientific studies claiming new discoveries, a Stanford statistician argues. (TeroVesalainen/Pixabay)

For decades, scientific studies have hinged on showing a p-value of less than 0.05 as evidence that a study readout is genuine—but calls are growing for a new approach.

Almost all published studies rely on that threshold for statistical significance but, according to Stanford University statistician John Ioannidis, M.D., D.Sc., “many of the claims that these reports highlight are likely false” and p-values are often “misinterpreted, overtrusted and misused.”

In fact, less-than-rigorous interpretations of studies that pass the p-value threshold could be a primary reason why even well-established studies across scientific disciplines are often hard to reproduce on retesting, he suggested in a Journal of the American Medical Association (JAMA) article.


“Multiple misinterpretations of P values exist, but the most common one is that they represent the ‘probability that the studied hypothesis is true,’” he wrote, adding that basing scientific conclusions or business and policy decisions on that interpretation is a minefield. Most claims that scrape under the 0.05 threshold are “probably false … i.e., the claimed associations and treatment effects do not exist [and] even among those claims that are true, few are worth acting on in medicine and health care.”
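The gap between "p < 0.05" and "probably true" can be made concrete with the standard false-discovery arithmetic that Ioannidis has long used. The sketch below is illustrative only: the prior probability and statistical power are assumed values, not figures from the JAMA article.

```python
def ppv(prior, power, alpha):
    """Positive predictive value: the probability that a 'significant'
    finding reflects a real effect.

    prior: pre-study probability that the hypothesis is true (assumed)
    power: probability of detecting a true effect (assumed)
    alpha: significance threshold (false positive rate under the null)
    """
    true_pos = prior * power          # true effects that reach significance
    false_pos = (1 - prior) * alpha   # null effects that reach it by chance
    return true_pos / (true_pos + false_pos)

# With an assumed 1-in-10 prior and 80% power:
print(round(ppv(0.10, 0.80, 0.05), 3))   # → 0.64  at p < 0.05
print(round(ppv(0.10, 0.80, 0.005), 3))  # → 0.947 at p < 0.005
```

Under these assumptions, roughly a third of "significant" findings at the 0.05 threshold would be false positives, which is why interpreting the p-value itself as the probability the hypothesis is true is so misleading.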

How to fix the problem remains a contentious issue, however. One proposal, to reduce the threshold for significance by a factor of 10 to a p-value of 0.005, with studies meeting only the current 0.05 threshold deemed "suggestive" of an effect, has met with a mixed response.

Proponents, Ioannidis himself among the signatories to that call for a redefinition of significance, argue the change would help reduce false positive results and address the lack of reproducibility in scientific studies claiming new discoveries. Lowered thresholds have already been used with success in studies looking for associations in population genomics datasets.
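The mechanical effect of a stricter threshold on null results is easy to simulate. Under a true null hypothesis, p-values are uniformly distributed on [0, 1], so the fraction of null tests crossing a threshold is simply the threshold itself; the toy simulation below (assumed sample size and seed, for illustration) shows the tenfold drop in false positives.

```python
import random

random.seed(0)

# Simulate p-values for 100,000 hypotheses where no real effect exists.
# Under the null, p-values are uniform on [0, 1].
p_values = [random.random() for _ in range(100_000)]

for alpha in (0.05, 0.005):
    fp = sum(p < alpha for p in p_values)
    print(f"alpha={alpha}: {fp} false positives out of 100,000")
```

The stricter cutoff admits roughly a tenth as many spurious findings, which is the "dam" effect the proposal relies on; it does nothing, of course, about underpowered studies or biased analyses.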

He cautioned, however, that this may be only a short-term fix in other types of biomedical research, working "as a dam that could help gain time and prevent drowning by a flood of statistical significance" while other statistical approaches, such as Bayesian inferential tools, are explored.

That could include abandoning statistical significance thresholds or p-values altogether. With big data increasingly being tapped in healthcare, statistical significance is becoming irrelevant because "extremely low P values are routinely obtained for signals that are too small to be useful even if true."

If p-values continue to be used—which seems likely at least in the near-term—Ioannidis does believe that “lower thresholds are probably preferable for most observational research.”
