2021
DOI: 10.31219/osf.io/ptuwe
Preprint

Connectome-based machine learning models are vulnerable to subtle data manipulations

Abstract: Functional connectome-based predictive models continue to grow in popularity and predictive performance. As these models become more widely used, researchers have begun to question the idea of bias in the models, which is a crucial component of ethics in artificial intelligence. However, we show that model trustworthiness is a more important but vastly overlooked component of the ethics of functional connectome-based predictive models. In this work, we define “trust” as robustness to adversarial attacks, or da…

Cited by 5 publications (8 citation statements)
References 73 publications
“…We have publicly published a MATLAB toolbox for stability selection that can be used to replicate the approach that we have taken here (i.e., elastic net), as well as to implement 14 other classification and regression algorithms that leverage MATLAB’s machine learning toolbox. We additionally see this package as an opportunity to highlight to researchers the dangers of data leakage that have become problematically common in neuroimaging studies (Poulin et al, 2019; Eitel et al, 2021; Kambeitz et al, 2015; Pulini et al, 2019; Whelan & Garavan, 2014; Mateos-Perez et al, 2018; Yagis et al, 2021; Kapoor & Narayanan, 2022; Rosenblatt et al, 2023; 2023; Poldrack et al, 2019), and package our toolbox with a variety of tutorials for implementing appropriate cross-validation with feature selection. We anticipate this package will remain useful even as clinical neuroimaging datasets grow in size, as it is well established that, even in datasets with a limited number of features or a more equal ratio of features to samples, removing redundant or noisy features can improve model estimation and performance by reducing the amount of noise that is available for the model to overfit (e.g., Bzdok et al, 2018; Hawkins, 2004; Heinze et al, 2018).…”
Section: Discussion (mentioning)
confidence: 99%
“…Similar forms of data leakage that are highly common arise when the entire dataset is used to select significant features that will be modeled during cross-validation, or when some intercorrelated features are initially removed to reduce redundancy. While this leakage does not necessarily invalidate the findings that are reported, it often severely inflates model performance, injecting a large optimistic bias into model evaluations, and contributing to the ongoing reproducibility crisis (Kapoor & Narayanan, 2022; Rosenblatt et al, 2023).…”
Section: Introduction (mentioning)
confidence: 99%
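To make that leakage mechanism concrete, here is a minimal sketch (not from the cited papers; it assumes Python with scikit-learn and fully synthetic data) contrasting feature selection performed once on the whole dataset with selection nested inside each cross-validation fold. On pure-noise data, only the leaky variant produces optimistic scores.

# Minimal sketch of the leakage described above (synthetic data, scikit-learn assumed).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))  # e.g., vectorized connectome edges
y = rng.standard_normal(100)          # outcome is pure noise: true signal is zero

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: features are chosen once using ALL subjects, including future test folds.
X_leaky = SelectKBest(f_regression, k=50).fit_transform(X, y)
leaky_r2 = cross_val_score(Ridge(), X_leaky, y, cv=cv).mean()

# Correct: selection is refit inside every training fold via a pipeline.
pipe = make_pipeline(SelectKBest(f_regression, k=50), Ridge())
honest_r2 = cross_val_score(pipe, X, y, cv=cv).mean()

print(f"leaky CV R^2:  {leaky_r2:.2f}")   # optimistically biased, often well above 0
print(f"honest CV R^2: {honest_r2:.2f}")  # near or below 0, as expected for noise

Wrapping selection and estimation in a single pipeline is the design choice that keeps every data-dependent step inside the training folds.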
“…As in previous work, the four resting-state scans were averaged to ensure that phase encoding differences (AP/PA) did not result in spatial differences in connectivity (Greene et al 2018; Barron et al 2021; Rosenblatt et al 2021). In addition, combining data across scans has been shown to boost the reliability of functional connections (Noble et al 2017).…”
Section: Methods (mentioning)
confidence: 99%
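For context, one common way to implement this kind of scan averaging is sketched below (an assumed approach, not the cited pipeline; the 268-node parcellation and four-run layout are illustrative): Fisher z-transform each run's correlation matrix, average, and transform back.

# Illustrative sketch: average functional connectivity across resting-state runs.
# The Fisher z-transform stabilizes variance before correlations are averaged.
import numpy as np

def average_connectomes(run_matrices):
    # run_matrices: list of (n_nodes, n_nodes) Pearson correlation matrices
    z = [np.arctanh(np.clip(m, -0.999999, 0.999999)) for m in run_matrices]
    return np.tanh(np.mean(z, axis=0))

# e.g., four runs (two AP and two PA phase encodings) for one subject
rng = np.random.default_rng(0)
runs = [np.corrcoef(rng.standard_normal((268, 200))) for _ in range(4)]
mean_fc = average_connectomes(runs)  # (268, 268) subject-level connectome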
“…Enhancement and adversarial attacks can threaten the trustworthiness of neuroimaging-based predictive models. Enhancement attacks are those where purposeful data alterations can lead to falsely enhanced model performance, while adversarial attacks are those where specifically designed noise is added to the data to cause a model to fail 88. An artificially enhanced model may be the result of scientific malpractice or fraud, which, if not discovered, could lead to a large amount of time and resources being wasted in the wrong research direction.…”
Section: Enhancement and Adversarial Attacks (mentioning)
confidence: 99%
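As a toy illustration of the adversarial case (not the attack from the cited work; the synthetic data, linear classifier, and 0.1 step size are assumptions), a small perturbation aligned with a linear model's weights can flip its prediction while barely changing any individual feature.

# Toy adversarial perturbation against a linear classifier (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))          # e.g., connectome edges
y = (X[:, :10].sum(axis=1) > 0).astype(int)   # synthetic binary label

clf = LogisticRegression(max_iter=2000).fit(X, y)

x = X[0].copy()
pred = clf.predict(x.reshape(1, -1))[0]
# Push the decision function toward the opposite class (sign attack on a linear model).
direction = 1.0 if pred == 0 else -1.0
x_adv = x + 0.1 * direction * np.sign(clf.coef_[0])

print(pred, clf.predict(x_adv.reshape(1, -1))[0])  # the prediction can flip...
print(np.abs(x_adv - x).max())                     # ...while no feature changes by more than 0.1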
“…Such manipulations can be detected if data characteristics and exclusion criteria are reported faithfully, especially when outliers are excluded based on a threshold. A more advanced approach involves adding patterns correlated with the behavioral variable of interest to the imaging features, boosting prediction accuracy to near-perfect levels without causing the features to become significantly different from the original features 88. Furthermore, it is possible to design data enhancements that cause machine learning models to learn brain-behavior relationships that do not exist in the original data.…”
Section: Enhancement and Adversarial Attacks (mentioning)
confidence: 99%
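A rough sketch of that kind of enhancement (illustrative only; the dimensions, injection strength, and Ridge model are assumptions, not the cited authors' code) shows how a weak, score-scaled pattern added to otherwise signal-free features can make cross-validated prediction of the score look strong.

# Illustrative enhancement attack on signal-free synthetic connectome features.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_sub, n_edges = 100, 2000
X = rng.standard_normal((n_sub, n_edges))   # no true brain-behavior relationship
y = rng.standard_normal(n_sub)              # behavioral scores

# Inject a fixed unit-norm spatial pattern, scaled by each subject's score.
pattern = rng.standard_normal(n_edges)
pattern /= np.linalg.norm(pattern)
X_enhanced = X + 3.0 * np.outer(y, pattern)  # per-edge change stays small relative to edge variance

cv = KFold(n_splits=5, shuffle=True, random_state=0)
r_orig = pearsonr(cross_val_predict(Ridge(), X, y, cv=cv), y)[0]
r_enh = pearsonr(cross_val_predict(Ridge(), X_enhanced, y, cv=cv), y)[0]
print(f"original:  r = {r_orig:.2f}")   # expected near zero
print(f"enhanced:  r = {r_enh:.2f}")    # expected substantially higher

Because the injected pattern is spread thinly across thousands of edges, per-edge distributions barely change, which is what makes such enhancement hard to spot from summary statistics alone.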