2021
DOI: 10.31219/osf.io/ptuwe
Preprint

Connectome-based machine learning models are vulnerable to subtle data manipulations

Abstract: Functional connectome-based predictive models continue to grow in popularity and predictive performance. As these models become more widely used, researchers have begun to question the idea of bias in the models, which is a crucial component of ethics in artificial intelligence. However, we show that model trustworthiness is a more important but vastly overlooked component of the ethics of functional connectome-based predictive models. In this work, we define “trust” as robustness to adversarial attacks, or da…

Cited by 5 publications (8 citation statements)
References 73 publications
“…We have publicly published a MATLAB toolbox for stability selection that can be used to replicate the approach that we have taken here (i.e., elastic net), as well as to implement 14 other classification and regression algorithms that leverage MATLAB’s machine learning toolbox. We additionally see this package as an opportunity to highlight to researchers the dangers of data leakage that have become problematically common in neuroimaging studies (Poulin et al, 2019; Eitel et al, 2021; Kambeitz et al, 2015; Pulini et al, 2019; Whelan & Garavan, 2014; Mateos-Perez et al, 2018; Yagis et al, 2021; Kapoor & Narayanan, 2022; Rosenblatt et al, 2023; 2023; Poldrack et al, 2019), and package our toolbox with a variety of tutorials for implementing appropriate cross-validation with feature selection. We anticipate this package will remain useful even as clinical neuroimaging datasets grow in size, as it is well established that, even in datasets with a limited number of features or a more equal ratio of features to samples, removing redundant or noisy features can improve model estimation and performance by reducing the amount of noise that is available for the model to overfit (e.g., Bzdok et al, 2018; Hawkins, 2004; Heinze et al, 2018).…”
Section: Discussion (mentioning)
confidence: 99%
“…Similar forms of data leakage that are highly common arise when the entire dataset is used to select significant features that will be modeled during cross-validation, or when some intercorrelated features are initially removed to reduce redundancy. While this leakage does not necessarily invalidate the findings that are reported, it often severely inflates model performance, injecting a large optimistic bias into model evaluations, and contributing to the ongoing reproducibility crisis (Kapoor & Narayanan, 2022; Rosenblatt et al, 2023).…”
Section: Introduction (mentioning)
confidence: 99%
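To make that leakage mechanism concrete, here is a minimal sketch (not from the cited papers; it assumes Python with scikit-learn and fully synthetic data) contrasting feature selection performed once on the whole dataset with selection nested inside each cross-validation fold. On pure-noise data, only the leaky variant produces optimistic scores.

# Minimal sketch of the leakage described above (synthetic data, scikit-learn assumed).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))  # e.g., vectorized connectome edges
y = rng.standard_normal(100)          # outcome is pure noise: true signal is zero

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: features are chosen once using ALL subjects, including future test folds.
X_leaky = SelectKBest(f_regression, k=50).fit_transform(X, y)
leaky_r2 = cross_val_score(Ridge(), X_leaky, y, cv=cv).mean()

# Correct: selection is refit inside every training fold via a pipeline.
pipe = make_pipeline(SelectKBest(f_regression, k=50), Ridge())
honest_r2 = cross_val_score(pipe, X, y, cv=cv).mean()

print(f"leaky CV R^2:  {leaky_r2:.2f}")   # optimistically biased, often well above 0
print(f"honest CV R^2: {honest_r2:.2f}")  # near or below 0, as expected for noise

Wrapping selection and estimation in a single pipeline is the design choice that keeps every data-dependent step inside the training folds.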
“…As in previous work, the four resting-state scans were averaged to ensure that phase encoding differences (AP/PA) did not result in spatial differences in connectivity (Greene et al 2018; Barron et al 2021; Rosenblatt et al 2021). In addition, combining data across scans has been shown to boost the reliability of functional connections (Noble et al 2017).…”
Section: Methods (mentioning)
confidence: 99%
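For context, one common way to implement this kind of scan averaging is sketched below (an assumed approach, not the cited pipeline; the 268-node parcellation and four-run layout are illustrative): Fisher z-transform each run's correlation matrix, average, and transform back.

# Illustrative sketch: average functional connectivity across resting-state runs.
# The Fisher z-transform stabilizes variance before correlations are averaged.
import numpy as np

def average_connectomes(run_matrices):
    # run_matrices: list of (n_nodes, n_nodes) Pearson correlation matrices
    z = [np.arctanh(np.clip(m, -0.999999, 0.999999)) for m in run_matrices]
    return np.tanh(np.mean(z, axis=0))

# e.g., four runs (two AP and two PA phase encodings) for one subject
rng = np.random.default_rng(0)
runs = [np.corrcoef(rng.standard_normal((268, 200))) for _ in range(4)]
mean_fc = average_connectomes(runs)  # (268, 268) subject-level connectome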
“…Enhancement and adversarial attacks can threaten the trustworthiness of neuroimaging-based predictive models. Enhancement attacks are those where purposeful data alterations can lead to falsely enhanced model performance, while adversarial attacks are those where specifically designed noise is added to the data to cause a model to fail 88. An artificially enhanced model may be the result of scientific malpractice or fraud, which, if not discovered, could lead to a large amount of time and resources being wasted in the wrong research direction.…”
Section: Enhancement and Adversarial Attacks (mentioning)
confidence: 99%
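As a toy illustration of the adversarial case (not the attack from the cited work; the synthetic data, linear classifier, and 0.1 step size are assumptions), a small perturbation aligned with a linear model's weights can flip its prediction while barely changing any individual feature.

# Toy adversarial perturbation against a linear classifier (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))          # e.g., connectome edges
y = (X[:, :10].sum(axis=1) > 0).astype(int)   # synthetic binary label

clf = LogisticRegression(max_iter=2000).fit(X, y)

x = X[0].copy()
pred = clf.predict(x.reshape(1, -1))[0]
# Push the decision function toward the opposite class (sign attack on a linear model).
direction = 1.0 if pred == 0 else -1.0
x_adv = x + 0.1 * direction * np.sign(clf.coef_[0])

print(pred, clf.predict(x_adv.reshape(1, -1))[0])  # the prediction can flip...
print(np.abs(x_adv - x).max())                     # ...while no feature changes by more than 0.1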
“…Such manipulations can be detected if data characteristics and exclusion criteria are reported faithfully, especially when outliers are excluded based on a threshold. A more advanced approach involves adding patterns correlated with the behavioral variable of interest to the imaging features, boosting prediction accuracy to near-perfect levels without causing the features to become significantly different from the original features 88. Furthermore, it is possible to design data enhancements that cause machine learning models to learn brain-behavior relationships that do not exist in the original data.…”
Section: Enhancement and Adversarial Attacks (mentioning)
confidence: 99%
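A rough sketch of that kind of enhancement (illustrative only; the dimensions, injection strength, and Ridge model are assumptions, not the cited authors' code) shows how a weak, score-scaled pattern added to otherwise signal-free features can make cross-validated prediction of the score look strong.

# Illustrative enhancement attack on signal-free synthetic connectome features.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_sub, n_edges = 100, 2000
X = rng.standard_normal((n_sub, n_edges))   # no true brain-behavior relationship
y = rng.standard_normal(n_sub)              # behavioral scores

# Inject a fixed unit-norm spatial pattern, scaled by each subject's score.
pattern = rng.standard_normal(n_edges)
pattern /= np.linalg.norm(pattern)
X_enhanced = X + 3.0 * np.outer(y, pattern)  # per-edge change stays small relative to edge variance

cv = KFold(n_splits=5, shuffle=True, random_state=0)
r_orig = pearsonr(cross_val_predict(Ridge(), X, y, cv=cv), y)[0]
r_enh = pearsonr(cross_val_predict(Ridge(), X_enhanced, y, cv=cv), y)[0]
print(f"original:  r = {r_orig:.2f}")   # expected near zero
print(f"enhanced:  r = {r_enh:.2f}")    # expected substantially higher

Because the injected pattern is spread thinly across thousands of edges, per-edge distributions barely change, which is what makes such enhancement hard to spot from summary statistics alone.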