High annotation costs are a substantial bottleneck in applying deep learning architectures to clinically relevant use cases, substantiating the need for algorithms that learn from unlabeled data. In this work, we propose employing self-supervised methods. To that end, we pretrained models with three self-supervised algorithms on a large corpus of unlabeled dental images comprising 38K bitewing radiographs (BWRs). We then applied the learned neural network representations to tooth-level dental caries classification, for which we utilized labels extracted from electronic health records (EHRs). Finally, a holdout test set of 343 BWRs was established, annotated by three dental professionals and approved by a senior dentist. This test set was used to evaluate the fine-tuned caries classification models. Our experimental results demonstrate the gains obtained by pretraining models with self-supervised algorithms. These include improved caries classification performance (6 p.p. increase in sensitivity) and, most importantly, improved label efficiency: the resulting models can be fine-tuned using only a few labels (annotations). Our results show that using as few as 18 annotations can produce ≥45% sensitivity, which is comparable to human-level diagnostic performance. This study shows that self-supervision can provide gains in medical image analysis, particularly when obtaining labels is costly.
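The label-efficient workflow described above can be illustrated with a minimal sketch: load a backbone pretrained with a self-supervised objective on unlabeled radiographs, attach a small classification head, and fine-tune it on a handful of labeled tooth crops. The checkpoint file, ResNet-18 backbone, and hyperparameters below are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical sketch: fine-tuning a self-supervised pretrained backbone for
# tooth-level caries classification with a small labeled set. The checkpoint
# path, architecture, and hyperparameters are placeholders, not from the study.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

# Backbone assumed to be pretrained on unlabeled bitewing radiographs with a
# self-supervised algorithm; here we simply load weights from an assumed file.
backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()  # drop the classification head, keep the encoder
state = torch.load("ssl_pretrained_bwr.pt", map_location="cpu")  # assumed checkpoint
backbone.load_state_dict(state, strict=False)

# Small head for binary caries classification on tooth-level crops.
model = nn.Sequential(backbone, nn.Linear(512, 2))

# Freeze the encoder so the few available labels only train the head
# (one plausible way to exploit label efficiency).
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model[1].parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def finetune(labeled_loader: DataLoader, epochs: int = 20) -> None:
    """Fine-tune on a small labeled set (the abstract reports results with as few as 18 annotations)."""
    model.train()
    for _ in range(epochs):
        for images, labels in labeled_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```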
Convolutional Neural Networks (CNNs) such as U-Net have been widely used for medical image segmentation. Dental restorations are prominent features of dental radiographs. Applying U-Net to the full panoramic image is challenging, as the shape, size, and frequency of different restoration types vary. We hypothesized that models trained on smaller, equally spaced rectangular image crops (tiles) of the panoramic radiograph would outperform models trained on the full image. A total of 1781 panoramic radiographs were annotated pixelwise for fillings, crowns, and root canal fillings by dental experts. We used different numbers of tiles in our experiments. Five-times-repeated three-fold cross-validation was used for model evaluation. Training with more tiles improved model performance and accelerated convergence. The F1-score for the full panoramic image was 0.7, compared to 0.83, 0.92, and 0.95 for 6, 10, and 20 tiles, respectively. For root canal fillings, which are small, cone-shaped features that appear less frequently on the radiographs, the performance improvement was even higher (+294%). Training on tiles and pooling the results thereafter improved pixelwise classification performance and reduced the time to model convergence for segmenting dental restorations. Segmentation of panoramic radiographs is biased towards more frequent and spatially extended classes. Tiling may help to overcome this bias and increase accuracy.
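The tile-and-pool strategy can be sketched as follows: cut the panoramic radiograph into an equally spaced grid of crops, run a tile-level segmentation model on each crop, and stitch the pixelwise masks back into a full-size prediction. The grid layout (e.g. 2 × 5 for 10 tiles) and the `segment_tile` callable are assumptions for illustration, not the authors' exact pipeline.

```python
# Hypothetical sketch of tiling a panoramic radiograph for segmentation and
# pooling the per-tile masks back together. Grid shape and the tile-level
# model call (`segment_tile`, e.g. a U-Net forward pass) are placeholders.
import numpy as np

def tile_image(image: np.ndarray, rows: int, cols: int):
    """Yield (row, col, tile) for a rows x cols grid of equal-sized crops."""
    h, w = image.shape[:2]
    th, tw = h // rows, w // cols
    for r in range(rows):
        for c in range(cols):
            yield r, c, image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]

def segment_panoramic(image: np.ndarray, segment_tile, rows: int = 2, cols: int = 5) -> np.ndarray:
    """Segment each tile independently and pool the masks into one prediction."""
    h, w = image.shape[:2]
    th, tw = h // rows, w // cols
    mask = np.zeros((h, w), dtype=np.uint8)
    for r, c, tile in tile_image(image, rows, cols):
        # segment_tile is assumed to return a per-pixel class mask for the crop.
        mask[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = segment_tile(tile)
    return mask
```

Working on crops keeps small, infrequent structures such as root canal fillings from being dwarfed by large, frequent classes, which is one plausible reading of why tiling reduced the class-imbalance bias reported in the abstract.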