The lesson of ivermectin: meta-analyses based on summary data alone are inherently unreliable

Lawrence, Jack M; Meyerowitz-Katz, Gideon; Heathers, James; Brown, Nicholas J. L.; Sheldrick, Kyle

doi:10.1038/s41591-021-01535-y

Cited by 67 publications

(68 citation statements)

References 5 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In this light, there is also no problem in analyzing one and the same data set in different ways to explore the sensitivity of results to violations of assumptions (Greenland 2020) -as long as these explorations are fully reported, not just the ones producing a desired outcome such as the smallest P-value. And perhaps even more important than full reporting of summary statistics is that we ensure free access to the underlying raw data for meta-analysts (Lawrence et al, 2021;Whitlock et al, 2010).…”

Section: Full Reporting Rather Than Filtering Of Resultsmentioning

confidence: 99%

Why and How We Should Join the Shift From Significance Testing to Estimation

Berner¹,

Amrhein²

2021

Preprint

View full text Add to dashboard Cite

A paradigm shift away from null hypothesis significance testing seems in progress. Based on simulations, we illustrate some of the underlying motivations. First, P-values vary strongly from study to study, hence dichotomous inference using significance thresholds is usually unjustified. Second, statistically significant results have overestimated effect sizes, a bias declining with increasing statistical power. Third, statistically non-significant results have underestimated effect sizes, and this bias gets stronger with higher statistical power. Fourth, the tested statistical hypotheses generally lack biological justification and are often uninformative. Despite these problems, a screen of 48 papers from the 2020 volume of the Journal of Evolutionary Biology exemplifies that significance testing is still used almost universally in evolutionary biology. All screened studies tested the default null hypothesis of zero effect with the default significance threshold of p = 0.05, none presented a pre-planned alternative hypothesis, and none calculated statistical power and the probability of ‘false negatives’ (beta error). The papers reported 49 significance tests on average. Of 41 papers that contained verbal descriptions of a ‘statistically non-significant’ result, 26 (63%) falsely claimed the absence of an effect. We conclude that our studies in ecology and evolutionary biology are mostly exploratory and descriptive. We should thus shift from claiming to “test” specific hypotheses statistically to describing and discussing many hypotheses (effect sizes) that are most compatible with our data, given our statistical model. We already have the means for doing so, because we routinely present compatibility (“confidence”) intervals covering these hypotheses.

show abstract

Section: Full Reporting Rather Than Filtering Of Resultsmentioning

confidence: 99%

Why and How We Should Join the Shift From Significance Testing to Estimation

Berner¹,

Amrhein²

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Worse again, poor research practices can even fuel misinformation [ 45 , 46 ] (and disinformation) around science and medicine, giving a veneer of respectability to wrong-headed positions. Pertinent examples of this range from the fraudulent research that deviously and wrongly linked the measles-mumps-rubella vaccine to autism [ 47 , 48 ], to the substandard trials that gave the false impression ivermectin was a viable COVID treatment [ 49 , 50 ]. The unsettling reality is that poor statistical practice renders swathes of biomedical research worse than useless.…”

Section: Discussionmentioning

confidence: 99%

The new normal? Redaction bias in biomedical science

Grimes

Heathers²

2021

R. Soc. open sci.

View full text Add to dashboard Cite

A concerning amount of biomedical research is not reproducible. Unreliable results impede empirical progress in medical science, ultimately putting patients at risk. Many proximal causes of this irreproducibility have been identified, a major one being inappropriate statistical methods and analytical choices by investigators. Within this, we formally quantify the impact of inappropriate redaction beyond a threshold value in biomedical science. This is effectively truncation of a dataset by removing extreme data points, and we elucidate its potential to accidentally or deliberately engineer a spurious result in significance testing. We demonstrate that the removal of a surprisingly small number of data points can be used to dramatically alter a result. It is unknown how often redaction bias occurs in the broader literature, but given the risk of distortion to the literature involved, we suggest that it must be studiously avoided, and mitigated with approaches to counteract any potential malign effects to the research quality of medical science.

show abstract

“…Recent meta-analyses have shown that ivermectin is not beneficial for the prevention or treatment of COVID-19 1 . In addition, the largest clinical trial showing a benefit of ivermectin was recently withdrawn by the preprint organization that had posted it, due to discrepancies identified in individual patient data 2,3 .…”

Section: Promotion Of Non-evidence-based Therapeutics Within Patient-led Long Covid Support Groupsmentioning

confidence: 99%

“…Unfortunately, the use of ivermectin has been widely promoted based on flawed data and without regard for the analyses of subsequent studies 3 . This is similar to the situation with hydroxychloroquine in 2020, in which the use of the drug was widely promoted based on a limited number of small-scale studies showing a potential benefit in acute COVID-19 disease.…”

Section: Promotion Of Non-evidence-based Therapeutics Within Patient-led Long Covid Support Groupsmentioning

confidence: 99%