2021
DOI: 10.1038/s41386-021-01020-7

Systematic misestimation of machine learning performance in neuroimaging studies of depression

Abstract: We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from Major Depressive Disorder (MDD) and health…
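The effect described in the abstract can be illustrated in a few lines: when the true group difference is weak, cross-validated accuracies from small samples scatter widely around the true separability, so the better-looking small studies report inflated figures. The following is a minimal simulation sketch, not taken from the paper; scikit-learn, the effect size, feature count, and sample sizes are all illustrative assumptions.

```python
# Minimal simulation sketch (not from the paper): cross-validated accuracy
# of a linear SVM on synthetic two-group data with a weak, fixed effect.
# Small per-group samples yield accuracy estimates that scatter widely, so
# selectively reported small studies can look far better than large ones.
# Effect size, feature count, and sample sizes are arbitrary assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_features, effect = 100, 0.15            # weak true group difference on every feature

def simulated_study_accuracies(n_per_group, n_studies=100):
    """Cross-validated accuracies from many simulated studies of the same size."""
    accs = []
    for _ in range(n_studies):
        X = rng.normal(size=(2 * n_per_group, n_features))
        X[:n_per_group] += effect         # "patients" shifted by the true effect
        y = np.r_[np.ones(n_per_group), np.zeros(n_per_group)]
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
        accs.append(cross_val_score(SVC(kernel="linear"), X, y, cv=cv).mean())
    return np.array(accs)

for n in (20, 100, 400):                  # per-group sample size
    accs = simulated_study_accuracies(n)
    print(f"n/group={n:4d}  mean={accs.mean():.2f}  "
          f"5th-95th percentile: {np.percentile(accs, 5):.2f}-{np.percentile(accs, 95):.2f}")
```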

Cited by 80 publications (67 citation statements)
References 24 publications
“…supplement 1 ) with profiles of performance similar to the latest benchmarks on model complexity in the UKBB [ 72 ]. Moreover, simulations and empirical findings suggest that larger testing sets are more effective at mitigating optimistic performance estimates [ 53 , 73 ]. Together, this provided a pragmatic solution to the inference-prediction dilemma [ 59 , 74 ] given the 2 objectives of the present investigation to obtain reasonably good predictive models while at the same time performing parameter inference of statistical models developed on the left-out data.…”
Section: Methods
confidence: 99%
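The point about test-set size in the excerpt above can be made with a simple binomial calculation. This is only a back-of-the-envelope sketch, not taken from the cited work: for a classifier that is in truth at chance level, the probability of observing 70% accuracy or better on the held-out set shrinks quickly as that set grows; the test sizes and the 70% threshold are arbitrary.

```python
# Back-of-the-envelope sketch (not from the cited work): probability that a
# chance-level classifier (true accuracy 0.5) reports >= 70% accuracy on a
# held-out test set, as a function of test-set size. Larger test sets leave
# far less room for such optimistic estimates. Numbers are illustrative.
from scipy.stats import binom

true_acc, reported_acc = 0.5, 0.70
for n_test in (20, 50, 100, 500, 1000):
    k = int(round(reported_acc * n_test))             # hits needed to report 70%
    p_optimistic = binom.sf(k - 1, n_test, true_acc)  # P(observed accuracy >= 70%)
    print(f"n_test={n_test:5d}  P(acc >= {reported_acc:.0%} by chance) = {p_optimistic:.4f}")
```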
“…In this study, we used a large single-site dataset to build SVMs to classify patients with schizophrenia and healthy controls based on brain-wide FC, with an accuracy of 85%. In contrast to recent concerns about the biased estimations of classification performance in studies with small samples [ 23 ], the present results may provide a robust estimation of SVMs for automatic diagnosis of patients with schizophrenia based on brain-wise FCs. On the basis of our data, we recommend AAL-3 for the calculation of brain-wide FC because it yielded higher classification accuracy than AAL-2 and Shen’s 268.…”
Section: Discussion
confidence: 57%
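For readers unfamiliar with the recipe referenced in this excerpt, the sketch below shows the common pattern for classifying patients versus controls from brain-wide functional connectivity: vectorise the upper triangle of each subject's region-by-region FC matrix and run a linear SVM inside cross-validation. The data are synthetic, the 166-region count only loosely mirrors AAL-3, and nothing here reproduces the cited study's pipeline.

```python
# Minimal sketch with synthetic data (not the cited study's pipeline):
# vectorise the upper triangle of each subject's FC matrix and classify
# with a linear SVM inside stratified cross-validation. All numbers are
# placeholders chosen for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_regions = 200, 166
iu = np.triu_indices(n_regions, k=1)              # upper-triangle indices

labels = np.r_[np.ones(n_subjects // 2), np.zeros(n_subjects // 2)].astype(int)
features = []
for y in labels:
    fc = rng.normal(size=(n_regions, n_regions))
    fc = (fc + fc.T) / 2                          # symmetric, like a correlation matrix
    fc[:20, :20] += 0.3 * y                       # toy group effect in one sub-network
    features.append(fc[iu])                       # vectorised upper-triangle FC
X = np.asarray(features)

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
print("10-fold accuracy:", cross_val_score(clf, X, labels, cv=cv).mean())
```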
“…We also investigated how the usage of a different validation approach (leave-one-out cross-validation) would influence the performance estimates of the best performing network: we observed an accuracy of 69.42%, AUC of 0.77, sensitivity of 80.95%, and specificity of 57.89%. However, given that there are theoretical and empirical reasons for why leave-one-out cross-validation is not recommended – especially in the case of small sample sizes – the reported results from this validation scheme should not be the focus of this study ( Flint et al, 2021 , Poldrack et al, 2019 ). To illustrate that there is a significant advantage in utilizing multivariate instead of univariate models we performed an experiment in which we selected the best separating voxel (according to a t -test performed on the training set) of the above network and trained and tested our linear SVM models using only this one voxel.…”
Section: Results
confidence: 99%
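As a rough illustration of why the authors of this excerpt downweight the leave-one-out results, the sketch below compares leave-one-out with repeated stratified 10-fold cross-validation on a small synthetic sample; the sample size, feature count, and effect size are arbitrary assumptions, not values from the study.

```python
# Illustrative comparison (synthetic data, not the study's) of leave-one-out
# versus repeated stratified k-fold cross-validation on a small sample.
# LOOCV is discouraged for small n because its estimate is high-variance;
# repeating a stratified 10-fold split gives a more stable figure.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import (cross_val_score, LeaveOneOut,
                                     RepeatedStratifiedKFold)

rng = np.random.default_rng(0)
n_per_group, n_features = 30, 50
X = rng.normal(size=(2 * n_per_group, n_features))
X[:n_per_group] += 0.2                            # weak true group difference
y = np.r_[np.ones(n_per_group), np.zeros(n_per_group)]

clf = SVC(kernel="linear")
loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
rkf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
rkf_acc = cross_val_score(clf, X, y, cv=rkf).mean()
print(f"LOOCV accuracy:            {loo_acc:.2f}")
print(f"Repeated 10-fold accuracy: {rkf_acc:.2f}")
```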