2023
DOI: 10.1007/s00521-023-09197-2

Addressing the data bottleneck in medical deep learning models using a human-in-the-loop machine learning approach

Eduardo Mosqueira-Rey,
Elena Hernández-Pereira,
José Bobes-Bascarán
et al.

Abstract: Any machine learning (ML) model is highly dependent on the data it uses for learning, and this is even more important in the case of deep learning models. The problem is a data bottleneck, i.e. the difficulty in obtaining an adequate number of cases and quality data. Another issue is improving the learning process, which can be done by actively introducing experts into the learning loop, in what is known as human-in-the-loop (HITL) ML. We describe an ML model based on a neural network in which HITL techniques …

Cited by 6 publications (4 citation statements)
References 69 publications
“…A human-in-the-loop approach, combining a CTGAN for data augmentation and an active learning module for addressing data bottlenecks in medical deep learning models, has been proposed in [36]. The effectiveness of artificial data in active learning scenarios has also been studied in [37], by using G-SMOTE as an artificial data generator and introducing it into the traditional active learning framework in order to reduce the amount of labeled data required in active learning.…”
Section: Active Learning + Data Augmentation
confidence: 99%
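The augmentation idea in the statement above — generating synthetic minority-class cases to relieve the data bottleneck — can be illustrated with a minimal SMOTE-style interpolation sketch. This is a stand-in illustration in NumPy, not the CTGAN or G-SMOTE implementations the cited works actually use; the function name and parameters are hypothetical:

```python
import numpy as np

def smote_like_augment(X: np.ndarray, n_new: int, k: int = 3,
                       rng=None) -> np.ndarray:
    """Generate n_new synthetic rows by interpolating a randomly chosen
    point toward one of its k nearest neighbours (SMOTE-style sketch)."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)  # distances from point i
        d[i] = np.inf                         # exclude the point itself
        neighbours = np.argsort(d)[:k]        # indices of k nearest points
        j = rng.choice(neighbours)
        t = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X[i] + t * (X[j] - X[i]))
    return np.array(synthetic)

# Toy minority class: 5 points in 2-D
X_min = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
X_new = smote_like_augment(X_min, n_new=4)
print(X_new.shape)  # (4, 2)
```

Because each synthetic point is a convex combination of two real points, the generated cases stay inside the region spanned by the original data — the key property that distinguishes interpolation-based augmentation from noise injection.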
“…We chose a pool-based strategy due to the popularity of this approach. Since many different query strategies exist and the selection is not straightforward, we decided to use entropy sampling, as it is a well-known strategy used in typical active learning scenarios [36,41].…”
Section: Active Learning Setup
confidence: 99%
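The pool-based entropy-sampling strategy mentioned above can be sketched in a few lines: the model scores the unlabeled pool, and the instances with the highest predictive entropy are queried for labeling. This is a minimal NumPy illustration under assumed inputs (a class-probability matrix and a query budget k), not the cited authors' implementation:

```python
import numpy as np

def entropy_sampling(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool instances with highest predictive entropy.

    probs: (n_samples, n_classes) class-probability matrix from the model.
    """
    eps = 1e-12  # avoid log(0) for confident predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Most uncertain (highest-entropy) instances are queried for labeling
    return np.argsort(entropy)[::-1][:k]

# Toy pool of three predictions over three classes
pool_probs = np.array([
    [0.98, 0.01, 0.01],   # confident  -> low entropy
    [0.34, 0.33, 0.33],   # uncertain  -> high entropy
    [0.70, 0.20, 0.10],   # in between
])
query = entropy_sampling(pool_probs, k=1)
print(query)  # the near-uniform row (index 1) is selected
```

Entropy sampling reduces to least-confidence sampling in the binary case; its appeal in multi-class problems is that it accounts for the full shape of the predicted distribution rather than only the top class.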
“…Our goal in including human experts in the feature selection process is to improve the explanatory power of the resulting models. As described in [46], they could also be involved with the aim of obtaining models with a higher accuracy.…”
Section: Feature Selection
confidence: 99%
“…First of all, we can say that the dataset has few cases, so ML models suffer when trying to generalize patterns present in the data. This is a clear data bottleneck problem and, as discussed in [46], a possible solution is to use data augmentation strategies to improve data quality and quantity. In that work the accuracy increased by more than 10 percent with the collaboration of human experts, who helped to improve the labeling and the generation of synthetic cases.…”
Section: Performance
confidence: 99%