2022
DOI: 10.1007/978-3-031-20837-9_16

The Need of Standardised Metadata to Encode Causal Relationships: Towards Safer Data-Driven Machine Learning Biological Solutions

Abstract: In this paper, we discuss the importance of considering causal relations in the development of machine learning solutions to prevent factors hampering the robustness and generalisation capacity of the models, such as induced biases. This issue often arises when the algorithm decision is affected by confounding factors. In this work, we argue that the integration of causal relationships can identify potential confounders. We call for standardised meta-information practices as a crucial step for proper machine l…

Cited by 6 publications (7 citation statements)
References 30 publications
“…However, "their use draws from the premise that data is a solid representation of the modelled phenomena". Hence, these practices cannot overcome data collection issues [54]. For instance, demographic information such as ethnicity, age, and gender may not be provided to the classifier, but anyway used to ensure that the performance is acceptable across different stratas (combinations of age groups, gender, ethnicity, etc.).…”
Section: Sample Representativeness (mentioning)
confidence: 99%
“…Moreover, causal information can enable metacomparison of data acquisition pipelines providing better reproducibility and replication of the solution development from the data acquisition to the algorithm training. In their recent paper, Garcia et al share a series of guidelines for safer data-driven ML solutions through actionable causal information and metadata approaches [54]. Namely, the authors argue that the inclusion of causal information in the data generation process can help prevent confounding effects while metadata information eases dataset auditing and model evaluation.…”
Section: The Importance Of Metadata (mentioning)
confidence: 99%
“…Several initiatives have already been developed to promote open data sharing in neuroimaging, such as the OpenfMRI (Poldrack et al, 2013) and NeuroVault (Gorgolewski et al, 2015) repositories. Furthermore, educating researchers about the importance of standardization in neuroimaging research (Laird et al, 2011) and providing them with the necessary tools and resources to implement standardized protocols and criteria in their research is crucial, including standardization of the metadata as a way to reflect the causal and anti-causal assumptions made during the data collection and annotation (Garcia Santa Cruz et al, 2022b). Further, standardization of the annotation pipeline is important to improve the consistency and quality of annotations.…”
Section: Limitations Associated With Clinical Brain Imaging Datasets (mentioning)
confidence: 99%
“…These biases can result in systematic and repeatable errors, leading to unfair outcomes that favor certain groups over others, ultimately lowering the accuracy of the recommendation for some patient groups, particularly when there are racial biases. These biases can originate from existing inequality (Ricci Lara et al, 2022) or can also stem from selection bias introduced during the acquisition process (Garcia Santa Cruz et al, 2022b).…”
Section: Limitations Associated With Machine Learning/deep Learning (mentioning)
confidence: 99%
“…Expanding on this argumentation, (Holzinger et al, 2022) argue that information fusion is key to achieve greater transparency and safety in medical imaging ML applications. In a slightly different position paper, (Santa Cruz et al, 2021) argue for the standardization of medical metadata in order to assist causal inference techniques in biomedical ML. Along similar lines, (Garcea et al, 2021) argues for the use of causal intuition when designing medical imaging datasets.…”
Section: Fairness Safety and Explainability (mentioning)
confidence: 99%