Poor generalizability is a major barrier to clinical implementation of artificial intelligence in digital pathology. The aim of this study was to test the generalizability of a pretrained deep learning model to a new diagnostic setting and to a small change in surgical indication. A deep learning model for breast cancer metastasis detection in sentinel lymph nodes, trained on CAMELYON multicenter data, was used as a base model, and achieved an AUC of 0.969 (95% CI 0.926–0.998) and FROC of 0.838 (95% CI 0.757–0.913) on CAMELYON16 test data. On local sentinel node data, the base model performance dropped to AUC 0.929 (95% CI 0.800–0.998) and FROC 0.744 (95% CI 0.566–0.912). On data with a change in surgical indication (axillary dissections), the base model showed an even larger drop, with a FROC of 0.503 (95% CI 0.201–0.911). The model was retrained with the addition of local data, resulting in an increase of about 4% in both AUC and FROC for sentinel nodes, and an increase of 11% in AUC and 49% in FROC for axillary nodes. Qualitative evaluation of the retrained model's output by a pathologist showed no missed positive slides. False positives, false negatives, and one previously undetected micro-metastasis were observed. The study highlights the generalization challenge even when using a multicenter-trained model, and shows that a small change in indication can considerably impact the model's performance.
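The AUC values and bootstrap-style 95% confidence intervals reported above can be illustrated with a minimal sketch. The function names and toy data below are hypothetical; the AUC is computed via the Mann-Whitney statistic, and the interval via a percentile bootstrap, which is one common way such intervals are obtained (the study's exact procedure is not specified here).

```python
import random

def auc(labels, scores):
    # Empirical AUC: probability that a randomly chosen positive
    # case outscores a randomly chosen negative case (ties count 0.5).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample cases with replacement,
    # recompute the AUC, and take the alpha/2 and 1-alpha/2 quantiles.
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return (stats[int(alpha / 2 * n_boot)],
            stats[int((1 - alpha / 2) * n_boot) - 1])

# Toy slide-level example: labels are metastasis ground truth,
# scores are model outputs.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```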
The human-in-the-loop: an evaluation of pathologists' interaction with artificial intelligence in clinical practice
Aims: One of the major drivers of the adoption of digital pathology in clinical practice is the possibility of introducing digital image analysis (DIA) to assist with diagnostic tasks. This offers potential increases in accuracy, reproducibility, and efficiency. Whereas stand-alone DIA has great potential benefit for research, little is known about the effect of DIA assistance in clinical use. The aim of this study was to investigate the clinical use characteristics of a DIA application for Ki67 proliferation assessment. Specifically, the human-in-the-loop interplay between DIA and pathologists was studied. Methods and results: We retrospectively investigated breast cancer Ki67 areas assessed with human-in-the-loop DIA and compared them with visual and automatic approaches. The results, expressed as standard deviation of the error in the Ki67 index, showed that visual estimation ('eyeballing') (14.9 percentage points) performed significantly worse (P < 0.05) than DIA alone (7.2 percentage points) and DIA with human-in-the-loop corrections (6.9 percentage points). At the overall level, no improvement resulting from the addition of human-in-the-loop corrections to the automatic DIA results could be seen. For individual cases, however, human-in-the-loop corrections could address major DIA errors in terms of poor thresholding of faint staining and incorrect tumour-stroma separation.
Conclusion: The findings indicate that the primary value of human-in-the-loop corrections is to address major weaknesses of a DIA application, rather than fine-tuning the DIA quantifications.
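The agreement metric used above, the standard deviation of the error in the Ki67 index, can be sketched as follows. The function name and the toy case values are hypothetical; the calculation itself is the standard sample SD of per-case differences between a method's Ki67 index and a reference, in percentage points.

```python
import statistics

def ki67_error_sd(method_vals, reference_vals):
    # Per-case error: method Ki67 index minus reference Ki67 index
    # (both in percentage points), then the sample standard deviation.
    errors = [m - r for m, r in zip(method_vals, reference_vals)]
    return statistics.stdev(errors)

# Toy example: three cases scored by a method vs. a reference standard.
print(round(ki67_error_sd([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]), 3))
```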
Artificial intelligence (AI) holds much promise for enabling highly desired improvements in imaging diagnostics. One of the most limiting bottlenecks for the development of useful clinical-grade AI models is the lack of training data. One aspect is the large number of cases needed; another is the necessity of high-quality ground-truth annotations. The aim of the project was to establish and describe the construction of a database with substantial amounts of detail-annotated oncology imaging data from pathology and radiology. A specific objective was to be proactive, that is, to support undefined subsequent AI training across a wide range of tasks, such as detection, quantification, segmentation, and classification, which puts particular focus on the quality and generality of the annotations. The main outcome of this project was the database as such, with a collection of labeled image data from breast, ovary, skin, colon, skeleton, and liver. In addition, this effort also served as an exploration of best practices for further scalability of high-quality image collections, and a main contribution of the study was generic lessons learned regarding how to successfully organize efforts to construct medical imaging databases for AI training, summarized as eight guiding principles covering team, process, and execution aspects.