A new approach to reporting P values adjusts for multiplicity so that reported statistics more accurately reflect meaningful findings.
To avoid overuse and misinterpretation, the New England Journal of Medicine (NEJM) is changing how it reports P values for both clinical trials and observational studies.
The new statistical guidelines for authors, published last week, include replacing P values with estimates of effect or association and 95% confidence intervals (CIs) when neither the protocol nor the statistical analysis plan has specified methods to adjust for multiplicity.
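To make the contrast concrete, here is a minimal sketch (assuming Python with NumPy and SciPy, and hypothetical trial counts that are not from the article) of reporting a point estimate with a 95% CI rather than a bare P value:

    import numpy as np
    from scipy import stats

    # Hypothetical counts: events out of N in treatment and control arms (illustrative only).
    events_t, n_t = 45, 200
    events_c, n_c = 60, 200

    p_t, p_c = events_t / n_t, events_c / n_c
    risk_diff = p_t - p_c  # point estimate of the effect

    # Standard error of the risk difference under a normal approximation.
    se = np.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = stats.norm.ppf(0.975)  # two-sided 95% critical value (about 1.96)

    ci_low, ci_high = risk_diff - z * se, risk_diff + z * se
    print(f"Risk difference: {risk_diff:.3f}, 95% CI: ({ci_low:.3f}, {ci_high:.3f})")

Reported this way, readers see both the size of the effect and the margin of error around it.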
The new approach to reporting P values rests on 3 premises: analyses should adhere to a prespecified analysis plan whenever possible; statistical thresholds should be used to claim an effect or association only in analyses for which the plan outlines a method to control type I error; and evidence about the benefits and harms of a treatment or exposure should include both point estimates and their margins of error.
“Journal editors and statistical consultants have become increasingly concerned about the overuse and misinterpretation of significance testing and P values in the medical literature,” wrote a team of Boston- and UK-based researchers regarding the editorial decision.
P values indicate how incompatible observed data are with a null hypothesis: P < .05 implies that a treatment effect or exposure association as large as or larger than the one observed would occur less than 5% of the time under a null hypothesis of no effect or association and no confounding.
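As a rough illustration (a simulation sketch in Python, not part of the NEJM guidance), running many trials in which the null hypothesis is actually true shows P falling below .05 in roughly 5% of them:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulate trials with no true treatment effect and count how often P < .05.
    n_trials, n_per_arm = 10_000, 100
    false_positives = 0
    for _ in range(n_trials):
        treatment = rng.normal(0.0, 1.0, n_per_arm)
        control = rng.normal(0.0, 1.0, n_per_arm)
        result = stats.ttest_ind(treatment, control)
        if result.pvalue < 0.05:
            false_positives += 1

    print(f"Proportion of null trials with P < .05: {false_positives / n_trials:.3f}")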
Concluding that the null hypothesis is false when it is in fact true therefore has a probability of less than 5% for a single test, but when P values are reported for multiple outcomes without adjusting for multiplicity, the probability of declaring a treatment difference when none exists can be substantially higher than 5%.
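The inflation is easy to quantify for independent outcomes: if each is tested at the .05 level with no adjustment, the chance of at least one false positive grows quickly with the number of outcomes. A short sketch (plain Python, illustrative only):

    # Family-wise error rate when k independent outcomes are each tested at alpha = 0.05
    # with no multiplicity adjustment.
    alpha = 0.05
    for k in (1, 2, 5, 10, 20):
        fwer = 1 - (1 - alpha) ** k
        print(f"{k:>2} outcomes: P(at least one false positive) = {fwer:.2f}")

With 10 unadjusted outcomes, the chance of at least one spurious "significant" finding is roughly 40%.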
Even with appropriate adjustment for multiplicity, P values do not represent the probability that the null hypothesis is false.
An underlying issue with P values is that they provide no information about the variability of an estimated association, and non-significant P values do not distinguish between group differences that are truly negligible and group differences that are uninformative because of large standard errors, the authors wrote.
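The distinction is easier to see with confidence intervals. In the sketch below (hypothetical numbers, Python with SciPy assumed), both results have P > .05, yet one reflects a precisely estimated near-zero difference and the other an estimate too imprecise to rule out a large effect:

    from scipy import stats

    def summarize(estimate, se, label):
        # Two-sided P value and 95% CI from a normal approximation.
        z = estimate / se
        p = 2 * stats.norm.sf(abs(z))
        lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
        print(f"{label}: estimate={estimate:+.2f}, 95% CI=({lo:+.2f}, {hi:+.2f}), P={p:.2f}")

    summarize(0.02, 0.05, "Precise, near-zero difference")   # truly negligible
    summarize(0.80, 1.00, "Imprecise, wide-ranging result")  # could be large or nothing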
P values also do not provide information on the size of an effect or an association.
The use of P values to summarize evidence requires thresholds with strong theoretical and empirical justification and proper attention to the error that can result from uncritical interpretation of multiple inferences.
That error is often compounded when investigators conduct multiple comparisons that are not formally reported in the manuscript.
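Where comparisons are prespecified, simple adjustments can hold the overall type I error at the intended level. As one standard example (not specific to the NEJM guidance), a Bonferroni correction tests each of m comparisons at alpha / m:

    # Bonferroni adjustment: test each of m prespecified comparisons at alpha / m
    # to keep the overall chance of any false positive at or below alpha.
    raw_p_values = [0.010, 0.020, 0.030, 0.040]   # hypothetical per-outcome P values
    alpha = 0.05
    m = len(raw_p_values)

    for p in raw_p_values:
        verdict = "significant" if p < alpha / m else "not significant"
        print(f"P = {p:.3f}: {verdict} at the adjusted threshold {alpha / m:.4f}")

Only the smallest P value clears the adjusted threshold of 0.0125 in this example; the correction is deliberately conservative.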
While listing some of the issues with P values, the NEJM authors wrote that P values still have a key role in medical research and should not be abandoned entirely.
“A well-designed randomized or observational study will have a primary hypothesis and a prespecified method of analysis, and the significance level from that analysis is a reliable indicator of the extent to which the observed data contradict a null hypothesis of no association between an intervention or an exposure and a response,” the editors wrote.
They noted that clinicians and regulatory agencies must decide which treatment to use or to grant marketing approval, and that P values interpreted against “reliably calculated thresholds subjected to appropriate adjustments” have a role in those decisions.
P values have been scrutinized in recent years. In a JAMA editorial, John PA Ioannidis, MD, DSc, of Stanford University, noted that 96% of articles that report P values include some values of 0.05 or less, creating challenges in biomedical science and other scientific disciplines. He said many of the reported claims are likely false.
One proposed way to curb spurious claims is to lower the routine P value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries, as suggested by a coalition of 72 methodologists.