Background Artificial intelligence (AI) research depends heavily on the nature of the available data. As AI applications in the medical field steadily increase, the demand for quality medical data is growing significantly. Here we describe the development of a platform for providing and sharing digital pathology data with AI researchers, and highlight the challenges to overcome in operating a sustainable platform in conjunction with pathologists.
Methods Over 3000 pathology slides from five organs (liver, colon, prostate, pancreas and biliary tract, and kidney) were selected for the dataset from tumor cases histologically confirmed by the pathology departments of three hospitals. After the slides were digitized, pathologists annotated the tumor areas and overlaid them onto the images as the ground truth for AI training. To reduce the pathologists' workload, AI-assisted annotation was established in collaboration with university AI teams.
Results A web-based data-sharing platform was developed in 2019 to share large-scale pathology image data. The platform includes 3100 images and five pre-processing algorithms that let AI researchers easily load images into their learning models.
Discussion Because privacy-protection regulations differ among countries, the most prudent approach when releasing an internationally shared learning platform is to obtain consent from patients during data acquisition.
Conclusions Despite the limitations encountered during platform development and model training, the present medical image sharing platform can steadily meet the high demand of AI developers for quality data. This study is expected to help other researchers who intend to build similar platforms that are more effective and accessible in the future.
BACKGROUND High-quality learning materials are needed for artificial intelligence (AI) development but are rarely available in practice, and the shortage is especially acute in the medical field. In particular, annotating medical images (e.g., annotation of tumor areas by pathologists) is labor-intensive and expensive, and the data are subject to privacy protection. These are major barriers that keep AI developers from accessing and reproducing medical image data.
OBJECTIVE This study aimed to reduce the barriers AI researchers face in accessing medical image datasets by collating and sharing high-quality medical images with pathologists, and to find practical ways to apply diagnostic AI assistance to reduce the pathologists' workload.
METHODS Pathology slides of tumors of five organs (liver, colon, prostate, pancreas and biliary tract, and kidney) from histologically confirmed cases were selected for this study. After the slides were scanned to obtain whole slide digital images, the patient information was de-identified and a pathologist annotated the tumor areas. Next, an AI-assisted annotation process was used in parallel to reduce the annotation workload of pathologists and to draw complex lesion boundaries more accurately. As a result, all the data included annotations confirmed by experienced pathologists and could be used as an AI learning dataset.
RESULTS A web-based data-sharing platform for AI learning was built and unveiled in 2019. In total, 3,100 large datasets covering carcinomas of five organs were shared through this platform and were accessible to all researchers. The platform had the advantage that users could search the data visually and intuitively; all researchers could use the provided dataset free of charge for their research, except for commercial purposes. Finally, the platform also provided five image data pre-processing algorithms that could help researchers training AI models.
CONCLUSIONS We built and operated a web-based data-sharing platform that provides AI researchers with a high-quality digital pathology dataset personally annotated by pathologists. We hope that sharing the issues we encountered while collecting and sharing these valuable data will help researchers who want to build such a platform in the future.
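The abstract does not name the platform's five pre-processing algorithms, so the following is only an illustrative sketch of one step that is common when preparing annotated whole slide images for model training: cutting the annotation mask into fixed-size tiles and labeling each tile by how much of it falls inside the pathologist-annotated tumor area. The function names (`iter_tiles`, `tumor_fraction`, `label_tiles`), the binary-mask representation, and the 0.5 threshold are all assumptions made for illustration; they are not taken from the platform.

```python
from typing import Iterator, List, Tuple

# Assumed representation: 1 = pixel inside an annotated tumor area, 0 = outside.
Mask = List[List[int]]


def iter_tiles(height: int, width: int, tile: int) -> Iterator[Tuple[int, int]]:
    """Yield (top, left) corners of non-overlapping tiles that fit fully inside the image."""
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield top, left


def tumor_fraction(mask: Mask, top: int, left: int, tile: int) -> float:
    """Return the fraction of pixels in the tile that lie inside the annotated tumor area."""
    covered = sum(
        mask[row][col]
        for row in range(top, top + tile)
        for col in range(left, left + tile)
    )
    return covered / (tile * tile)


def label_tiles(mask: Mask, tile: int, threshold: float = 0.5) -> List[Tuple[int, int, int]]:
    """Label each tile 1 (tumor) or 0 (non-tumor) using a hypothetical coverage threshold."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    return [
        (top, left, int(tumor_fraction(mask, top, left, tile) >= threshold))
        for top, left in iter_tiles(height, width, tile)
    ]


if __name__ == "__main__":
    # Toy 4x4 mask standing in for a real annotation overlay.
    toy_mask = [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    for top, left, label in label_tiles(toy_mask, tile=2):
        print(f"tile at ({top}, {left}) -> label {label}")
```

In practice the mask would be rasterized from the pathologists' annotations at the same resolution as the image tiles, and the threshold would be chosen to suit the downstream model; both choices are outside what the abstract specifies.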