2020
DOI: 10.48550/arxiv.2002.05049
Preprint

Detect and Correct Bias in Multi-Site Neuroimaging Datasets

Christian Wachinger,
Anna Rieckmann,
Sebastian Pölsterl

Abstract: The desire to train complex machine learning algorithms and to increase the statistical power in association studies drives neuroimaging research to use ever-larger datasets. The most obvious way to increase sample size is by pooling scans from independent studies. However, simple pooling is often ill-advised as selection, measurement, and confounding biases may creep in and yield spurious correlations. In this work, we combine 35,320 magnetic resonance images of the brain from 17 studies to examine bias in ne…

Cited by 2 publications
(2 citation statements)
References 39 publications
“…To perform classification on such multi-site data, the source of the data (dataset label) becomes the metadata, which is parameterized here by onehot encodings. Medical imaging datasets acquired in multiple sites with different scanning protocols is a core challenge for machine learning algorithms in medicine, [38,56], as different scanning protocols lead to different image formations. Differing class formations across sites (as in this experiment) creates a simple undesirable cue for the model to leverage during prediction as a confounder.…”
Section: Classification Of Multi-site Medical Data
Citation type: mentioning
confidence: 99%
“…We performed our subtyping analyses on 657 autistic patients from the Autism Brain Imaging Data Exchange (ABIDE) repository [48], all of them having passed a very strict quality-assurance criterion of elimination of subjects by head movement during image acquisition, thus correcting a well-known spurious excess of functional connectivity driven by head movements, which is even more pronounced in the autistic condition. Moreover, to overcome inter-scanner variability in the functional connectivity values across different Institutions, we applied rigorous harmonization strategies to transform data that are heterogeneous -- and that come from different Institutions -- into equivalents [49][50][51][52].…”
Section: Introduction
Citation type: mentioning
confidence: 99%