2023 | DOI: 10.1371/journal.pcbi.1010820

Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering

Abstract: In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism…

Cited by 6 publications (9 citation statements) | References 75 publications

“…Issues around over-optimism in microbiome analysis have recently been raised. Critiques of overfitting point out potential pitfalls in the reliability and reproducibility of such analyses [47]. In the current study, the performance of the model was stable, with the AUC of the ROC curve being slightly higher in the test set than in the training set.…”
Section: Discussion
confidence: 99%
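The train-versus-test check described in that statement is easy to sketch in code. The following is a minimal, hypothetical illustration (not the cited study's code; the synthetic data, classifier, and split are assumptions chosen for brevity) of comparing ROC AUC on a held-out test set against the training set:

```python
# Minimal sketch (illustrative, not the cited study's code): compare ROC AUC
# on the training set vs. a held-out test set to check for overfitting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; a real analysis would use microbiome features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# A test AUC close to (or occasionally, by chance, slightly above) the
# training AUC suggests the model is not badly overfitted.
print(f"train AUC = {auc_train:.3f}, test AUC = {auc_test:.3f}")
```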
“…Such practice may seem natural at first glance, so many researchers who proceed this way may not be aware that this excessive fitting of the analysis strategy to the given data set is a form of cherry‐picking, also denoted as fishing for significance in the context of statistical tests. In contexts other than GSA, it has been demonstrated that cherry‐picking results in over‐optimistic research findings that cannot be validated on new, independent data (Ullmann et al., 2023) and thus contributes to the so‐called replication crisis. In this context, we want to emphasize that research findings that cannot be replicated on independent data are not valid.…”
Section: Uncertainties, Implications and Recommendations
confidence: 99%
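The over-optimism mechanism this statement describes can be demonstrated with a small simulation. The sketch below (a hypothetical illustration, not the authors' experiments; the noise data, KMeans, and silhouette score are assumptions) tries many cluster numbers on structure-free data, keeps the best-looking result, and then checks that result on independent data:

```python
# Hypothetical illustration of cherry-picking: selecting the best of many
# clustering configurations on one data set tends to over-estimate how well
# that configuration performs on independent data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def noise_data():
    # Pure noise: there is no real cluster structure to find.
    return rng.normal(size=(100, 10))

discovery, validation = noise_data(), noise_data()

# "Fishing": try many cluster numbers and keep the best-looking result.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(discovery)
    scores[k] = silhouette_score(discovery, labels)
best_k = max(scores, key=scores.get)

# Honest check: apply the selected configuration to independent data. The
# selected (maximum) discovery score typically exceeds the validation score.
val_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(validation)
val_score = silhouette_score(validation, val_labels)
print(f"selected k={best_k}: discovery {scores[best_k]:.3f}, validation {val_score:.3f}")
```

Because the maximum of many noisy scores is biased upward, the reported discovery score is over-optimistic even though the data contain no structure at all.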
“…This impression conflicts with a growing realization that there is a multiplicity of possible analysis strategies when analyzing empirical data [1–3] and that data analysts require the ability to make subjective decisions and acknowledge the multiplicity of possible perspectives [4]. In particular, so-called multi-analyst projects [5–7] show that different teams of researchers make very different choices when they are asked to answer the same research question on the same data set.…”
Section: Introduction
confidence: 99%
“…This is a practice known as “p-hacking” or “fishing for significance” in the context of hypothesis testing and, more generally, “fishing expeditions” or “cherry-picking.” These practices lead to overconfident and nonreplicable research findings in the literature and, ultimately, to situations where some may argue that “most published research findings are false,” especially in combination with a low prior probability of the hypothesis being true [10, 11]. Computational biology as a field is, unfortunately, not immune to these types of problems [3, 12].…”
Section: Introduction
confidence: 99%
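The inflation caused by reporting only the smallest of many p-values is easy to quantify. The following sketch (a hypothetical illustration; the sample sizes and test counts are arbitrary assumptions) runs repeated batches of t-tests on pure-noise data and counts how often "fishing" yields a nominally significant result:

```python
# Hypothetical sketch of "fishing for significance": when 20 tests are run on
# pure-noise data and only the smallest p-value is reported, a "significant"
# result appears far more often than the nominal 5% rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_runs = 20, 1000

false_positive_runs = 0
for _ in range(n_runs):
    # Two-sample t-tests on groups with no real difference.
    pvals = [
        stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
        for _ in range(n_tests)
    ]
    if min(pvals) < 0.05:  # report only the "best" test
        false_positive_runs += 1

# Expected rate: 1 - 0.95**20 ≈ 64% of runs yield a spurious "finding".
print(f"{false_positive_runs / n_runs:.0%} of runs produced p < 0.05 by chance")
```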