2023
DOI: 10.1007/s00521-023-09197-2

Addressing the data bottleneck in medical deep learning models using a human-in-the-loop machine learning approach

Eduardo Mosqueira-Rey,
Elena Hernández-Pereira,
José Bobes-Bascarán
et al.

Abstract: Any machine learning (ML) model is highly dependent on the data it uses for learning, and this is even more important in the case of deep learning models. The problem is a data bottleneck, i.e. the difficulty in obtaining an adequate number of cases and quality data. Another issue is improving the learning process, which can be done by actively introducing experts into the learning loop, in what is known as human-in-the-loop (HITL) ML. We describe an ML model based on a neural network in which HITL techniques …

Cited by 6 publications (4 citation statements)
References 69 publications
“…A human-in-the-loop approach, combining a CTGAN for data augmentation and an active learning module for addressing data bottlenecks in medical deep learning models, has been proposed in [36]. The effectiveness of artificial data in active learning scenarios has also been studied in [37], by using G-SMOTE as an artificial data generator and introducing it into the traditional active learning framework in order to reduce the amount of labeled data required in active learning.…”
Section: Active Learning + Data Augmentation
confidence: 99%
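The augmentation idea in the statement above — generating synthetic minority-class cases to relieve the data bottleneck — can be illustrated with a minimal SMOTE-style interpolation sketch. This is a stand-in illustration in NumPy, not the CTGAN or G-SMOTE implementations the cited works actually use; the function name and parameters are hypothetical:

```python
import numpy as np

def smote_like_augment(X: np.ndarray, n_new: int, k: int = 3,
                       rng=None) -> np.ndarray:
    """Generate n_new synthetic rows by interpolating a randomly chosen
    point toward one of its k nearest neighbours (SMOTE-style sketch)."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)  # distances from point i
        d[i] = np.inf                         # exclude the point itself
        neighbours = np.argsort(d)[:k]        # indices of k nearest points
        j = rng.choice(neighbours)
        t = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X[i] + t * (X[j] - X[i]))
    return np.array(synthetic)

# Toy minority class: 5 points in 2-D
X_min = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
X_new = smote_like_augment(X_min, n_new=4)
print(X_new.shape)  # (4, 2)
```

Because each synthetic point is a convex combination of two real points, the generated cases stay inside the region spanned by the original data — the key property that distinguishes interpolation-based augmentation from noise injection.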
“…We chose a pool-based strategy due to the popularity of this approach. Since many different query strategies exist and the selection is not straightforward, we decided to use entropy sampling, as it is a well-known strategy used in typical active learning scenarios [36,41].…”
Section: Active Learning Setup
confidence: 99%
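The pool-based entropy-sampling strategy mentioned above can be sketched in a few lines: the model scores the unlabeled pool, and the instances with the highest predictive entropy are queried for labeling. This is a minimal NumPy illustration under assumed inputs (a class-probability matrix and a query budget k), not the cited authors' implementation:

```python
import numpy as np

def entropy_sampling(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool instances with highest predictive entropy.

    probs: (n_samples, n_classes) class-probability matrix from the model.
    """
    eps = 1e-12  # avoid log(0) for confident predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Most uncertain (highest-entropy) instances are queried for labeling
    return np.argsort(entropy)[::-1][:k]

# Toy pool of three predictions over three classes
pool_probs = np.array([
    [0.98, 0.01, 0.01],   # confident  -> low entropy
    [0.34, 0.33, 0.33],   # uncertain  -> high entropy
    [0.70, 0.20, 0.10],   # in between
])
query = entropy_sampling(pool_probs, k=1)
print(query)  # the near-uniform row (index 1) is selected
```

Entropy sampling reduces to least-confidence sampling in the binary case; its appeal in multi-class problems is that it accounts for the full shape of the predicted distribution rather than only the top class.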
“…Our goal in including human experts in the feature selection process is to improve the explanatory power of the resulting models. As described in [46], they could also be involved with the aim of obtaining models with a higher accuracy.…”
Section: Feature Selection
confidence: 99%
“…First of all, we can say that the dataset has few cases, so ML models suffer when trying to generalize patterns present in the data. This is a clear data bottleneck problem and, as discussed in [46], a possible solution is to use data augmentation strategies to improve data quality and quantity. In that work the accuracy increased by more than 10 percent with the collaboration of human experts, who helped to improve the labeling and the generation of synthetic cases.…”
Section: Performance
confidence: 99%