2021
DOI: 10.1016/j.neuroimage.2021.118044

Resample aggregating improves the generalizability of connectome predictive modeling

Abstract: It is a longstanding goal of neuroimaging to produce reliable, generalizable models of brain-behavior relationships. More recently, data-driven predictive models have become popular. However, overfitting is a common problem with statistical models, which impedes model generalization. Cross-validation (CV) is often used to estimate expected model performance within sample. Yet, the best way to generate brain-behavior models, and apply them out-of-sample, on an unseen dataset, is unclear. As a solution, this stu…

Citations: cited by 15 publications (13 citation statements)
References: 73 publications
“…Cross-ethnicity/race biases were not investigated for cross-dataset prediction considering the length of this article. However, it should be acknowledged that the generalizability of behavioral prediction models across datasets is a crucial research topic and is still under intensive investigation ( 20 , 45 ). How predictive models trained in one dataset could generalize to multiple ethnic/racial groups in another dataset should be examined in the future.…”
Section: Discussion (mentioning)
confidence: 99%
“…Because the HCP 7T dataset is composed of data from individuals of varying degrees of genetic relatedness (monozygotic and dizygotic twins, non-twin siblings, and un-related individuals; 93 unique families), all individuals from the same family were randomly assigned to one of two groups of 88 (i.e., split-half cross-validation), with one group being used to train a model that would then be tested on the other (and vice versa). The following approach was then applied to 100 of these random splits of the data to assess the performance of rCPM across different training/testing sets and to build a bagged model that is more robust to overfitting ( O’Connor et al, 2021 ).…”
Section: Methods (mentioning)
confidence: 99%
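The family-aware split-half assignment described in the statement above can be sketched as follows. This is only an illustration, not the citing authors' code: the function name `family_split_half`, the greedy size balancing, and the seed handling are all assumptions, and the greedy step keeps the two groups close in size without guaranteeing an exact 88/88 split.

```python
import random
from collections import defaultdict


def family_split_half(family_ids, seed=0):
    """Assign whole families to one of two groups (hypothetical helper).

    family_ids[i] is the family label of subject i. Returns two sorted
    lists of subject indices; no family is split across groups.
    """
    rng = random.Random(seed)

    # Collect the subject indices that belong to each family.
    families = defaultdict(list)
    for subject, fam in enumerate(family_ids):
        families[fam].append(subject)

    # Shuffle the family order so each seed yields a different random split.
    fam_keys = sorted(families)
    rng.shuffle(fam_keys)

    group_a, group_b = [], []
    for fam in fam_keys:
        # Assumption: place each family in the currently smaller group
        # so the halves stay roughly balanced.
        target = group_a if len(group_a) <= len(group_b) else group_b
        target.extend(families[fam])
    return sorted(group_a), sorted(group_b)
```

Calling the function with 100 different seeds would give 100 random splits, each usable once with the first group as the training set and once with the second.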
“…It could then be the case that the CV models rely on stimulus-specific signals and may fail to predict gISC during viewing of a different movie. A bootstrap aggregating, or “bagging,” approach was used to test whether the 200 linear models trained on Day 1 movie watching and resting state data could predict Day 2 gISC (derived from a different set of stimuli) from Day 2 RSFC, as previous work has shown bagged CPM models to be more accurate and more generalizable than their non-bagged counterparts ( O’Connor et al., 2021 ). To construct the bagged model, RSFC edges that passed the P < .01 feature selection step in at least 10% (20/200, reflecting the 100 iterations of split-half cross-validation) of iterations were identified, yielding 1437 edges total.…”
Section: Methods (mentioning)
confidence: 99%
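The edge-frequency step in the statement above (keep edges that pass feature selection in at least 20 of 200 iterations) can be sketched like this. The function name `bagged_edges` and the representation of edges as integer indices are assumptions made for illustration; the per-iteration feature selection itself (the P < .01 test) is not shown.

```python
def bagged_edges(selected_per_iter, n_edges, min_count):
    """Return edges selected in at least `min_count` iterations.

    selected_per_iter: one collection of selected edge indices per
    iteration (hypothetical input format). n_edges: total edge count.
    """
    counts = [0] * n_edges
    for selected in selected_per_iter:
        # set() guards against an edge being listed twice in one iteration.
        for edge in set(selected):
            counts[edge] += 1
    return [edge for edge, c in enumerate(counts) if c >= min_count]


# Toy example: 4 iterations over 4 edges, keep edges seen at least twice.
kept = bagged_edges([[0, 1], [1, 2], [1], [2, 3]], n_edges=4, min_count=2)
```

For the setting quoted above, `min_count` would be 20 with 200 iterations.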
“…On the held-out set, unique subject-wise predictions were obtained by averaging across folds and occasional duplicate predictions due to Monte Carlo sampling, which could produce multiple predictions per participant (we ensured prior to computation that with 100 CV-splits, predictions were available for all participants). Such a strategy is known as CV-bagging [ 105 , 106 ] and can improve both performance and stability of results (the use of CV-bagging can explain why in Figs 3 and 4 and Fig. 3 - Figure supplement 1 the performance was sometimes slightly better on the held-out set compared to the cross-validation on the validation test).…”
Section: Methods (mentioning)
confidence: 99%
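The averaging step of CV-bagging described in the statement above can be sketched as follows, assuming each fitted model contributes `(subject_id, predicted_value)` pairs for the held-out set. The function name `cv_bag_predictions` and that input format are hypothetical.

```python
from collections import defaultdict


def cv_bag_predictions(predictions):
    """Average all predictions per subject (hypothetical helper).

    predictions: iterable of (subject_id, predicted_value) pairs pooled
    across CV splits; a subject may appear any number of times.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subject, value in predictions:
        sums[subject] += value
        counts[subject] += 1
    # One averaged prediction per subject that appeared at least once.
    return {subject: sums[subject] / counts[subject] for subject in sums}


pooled = [("s1", 1.0), ("s1", 3.0), ("s2", 2.0)]
bagged = cv_bag_predictions(pooled)
```

Subjects that never appear in the pooled predictions are simply absent from the result, which is why the quoted statement checks coverage of all participants before averaging.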