Machine Learning for Health: Algorithm Auditing & Quality Control

Oala, Luis; Murchison, Andrew G.; Balachandran, Pradeep; Choudhary, Shruti; Fehr, Jana; Leite, Alixandro Werneck; Goldschmidt, P; Johner, Christian; Schörverth, Elora D M; Nakasi, Rose; Meyer, Martin; Cabitza, Federico; Baird, Pat; Prabhu, Carolin; Weicken, Eva; Liu, Xiaoxuan; Wenzel, Markus; Vogler, Steffen; Akogo, Darlington Ahiale; Alsalamah, Shada; Kazim, Emre; Koshiyama, Adriano; Piechottka, Sven; MacPherson, S.; Shadforth, Ian; Geierhofer, Regina; Matek, Christian; Krois, Joachim; Sanguinetti, Bruno; Arentz, Matthew; Bielik, Pavol; Calderón-Ramírez, Saúl; Abbood, Auss; Langer, Nicolas; Haufe, Stefan; Kherif, Ferath; Pujari, Sameer; Samek, Wojciech; Wiegand, Thomas

doi:10.1007/s10916-021-01783-y

Cited by 33 publications

(24 citation statements)

References 37 publications


“…Feature space-based quality metrics can be explored in more recent deep learning architectures such as transformers [39]. Additionally further evaluation of model-oriented properties of deep learning models such as robustness and predictive uncertainty, as recommended in [45], is also a future workline to develop.…”

Section: Discussion (mentioning)

confidence: 99%

A real use case of semi-supervised learning for mammogram classification in a local clinic of Costa Rica

Calderón-Ramírez, Murillo-Hernandez, Rojas-Salazar, et al. 2022

Med Biol Eng Comput

Self Cite


The implementation of deep learning-based computer-aided diagnosis systems for the classification of mammogram images can help in improving the accuracy, reliability, and cost of diagnosing patients. However, training a deep learning model requires a considerable amount of labelled images, which can be expensive to obtain as time and effort from clinical practitioners are required. To address this, a number of publicly available datasets have been built with data from different hospitals and clinics, which can be used to pre-train the model. However, using models trained on these datasets for later transfer learning and model fine-tuning with images sampled from a different hospital or clinic might result in lower performance. This is due to the distribution mismatch of the datasets, which include different patient populations and image acquisition protocols. In this work, a real-world scenario is evaluated where a novel target dataset sampled from a private Costa Rican clinic is used, with few labels and heavily imbalanced data. The use of two popular and publicly available datasets (INbreast and CBIS-DDSM) as source data, to train and test the models on the novel target dataset, is evaluated. A common approach to further improve the model's performance under such small labelled target dataset setting is data augmentation. However, often cheaper unlabelled data is available from the target clinic. Therefore, semi-supervised deep learning, which leverages both labelled and unlabelled data, can be used in such conditions. In this work, we evaluate the semi-supervised deep learning approach known as MixMatch, to take advantage of unlabelled data from the target dataset, for whole mammogram image classification. 
We compare semi-supervised learning on its own and combined with transfer learning (from a source mammogram dataset) and data augmentation, and also against regular supervised learning with transfer learning and data augmentation from source datasets. It is shown that semi-supervised deep learning combined with transfer learning and data augmentation can provide a meaningful advantage when labelled observations are scarce. We also found a strong influence of the source dataset, which suggests that a more data-centric approach is needed to tackle the challenge of scarcely labelled data. Because mammogram datasets are often very imbalanced, we used several metrics suited to highly imbalanced test datasets (such as the G-mean and the F2-score) to assess the performance gain from semi-supervised learning.
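
The abstract names the G-mean and the F2-score as metrics for imbalanced test sets but does not define them. The following is a minimal sketch of the standard binary definitions, assuming the G-mean is the geometric mean of sensitivity and specificity and the F2-score is the F-beta score with beta = 2. The function names and the toy labels are hypothetical and are not taken from the cited paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score; beta = 2 weights recall more heavily than precision."""
    tp, _tn, fp, fn = confusion_counts(y_true, y_pred)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Toy imbalanced example (hypothetical data): 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"G-mean   = {g_mean(y_true, y_pred):.3f}")
print(f"F2-score = {f_beta(y_true, y_pred):.3f}")
```

In this toy case accuracy is high because negatives dominate, while the G-mean and F2-score are lower because half of the positives are missed. That gap is the usual reason for preferring such metrics on imbalanced data.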


“…ISO15189 could provide inspiration for this. It is of great importance that the user has the appropriate expertise to audit (24) and validate AI/ML-CDS tools or else a situation can arise where underperforming and potentially harmful use of AI/ML in clinical practice is not being identified (25). In case departments of a healthcare institution are unable to provide this expertise themselves, it could be bundled in a centralized AI laboratory.…”

Section: Discussion (mentioning)

confidence: 99%

A Perspective on a Quality Management System for AI/ML-Based Clinical Decision Support in Hospital Care

Bartels, Dudink, Haitjema, et al. 2022

Front. Digit. Health


Although many artificial intelligence (AI) and machine learning (ML) based algorithms are being developed by researchers, only a small fraction has been implemented in clinical-decision support (CDS) systems for clinical care. Healthcare organizations experience significant barriers implementing AI/ML models for diagnostic, prognostic, and monitoring purposes. In this perspective, we delve into the numerous and diverse quality control measures and responsibilities that emerge when moving from AI/ML-model development in a research environment to deployment in clinical care. The Sleep-Well Baby project, a ML-based monitoring system, currently being tested at the neonatal intensive care unit of the University Medical Center Utrecht, serves as a use-case illustrating our personal learning journey in this field. We argue that, in addition to quality assurance measures taken by the manufacturer, user responsibilities should be embedded in a quality management system (QMS) that is focused on life-cycle management of AI/ML-CDS models in a medical routine care environment. Furthermore, we highlight the strong similarities between AI/ML-CDS models and in vitro diagnostic devices and propose to use ISO15189, the quality guideline for medical laboratories, as inspiration when building a QMS for AI/ML-CDS usage in the clinic. We finally envision a future in which healthcare institutions run or have access to a medical AI-lab that provides the necessary expertise and quality assurance for AI/ML-CDS implementation and applies a QMS that mimics the ISO15189 used in medical laboratories.


“…This is for two reasons: first, these AI innovations by themselves do not re-engineer the incentives that govern existing ways of working. A complex web of ingrained political and economic factors as well as the proximal influence of medical practice norms and commercial interests determine the way healthcare is delivered (16). Regulations and guidelines currently in use are not sufficient for AI methods to be reported in such detail that they can be reproduced and safely implemented in clinical practice for classification or prediction in new patients (17).…”

Section: The Chaos Of Humans and Healthcare (mentioning)

confidence: 99%