Deep learning models applied to healthcare applications, including digital pathology, have grown in scope and importance in recent years. Many of these models have been trained on The Cancer Genome Atlas (TCGA) digital images or use them as a validation source. This study shows that TCGA images contain tissue source site (TSS) specific patterns that can be used to identify the contributing institution without any explicit training. Furthermore, we observed that a model trained for cancer subtype classification exploits such TSS-specific patterns within digital slides to classify cancer types. Digital scanner configuration and noise, tissue stain variation and artifacts, and source-site patient demographics are among the factors that likely account for the observed bias. Researchers should therefore be cautious of such bias when using histopathology datasets to develop and train deep networks.
Fast diagnosis and treatment of pneumothorax, a collapsed lung, is crucial to avoid fatalities. Pneumothorax is typically detected on a chest X-ray image through visual inspection by experienced radiologists. However, the detection rate is quite low because small lung collapses are difficult to identify visually. There is therefore an urgent need for automated detection systems to assist radiologists. Although deep learning classifiers generally deliver high accuracy in many applications, they may not be useful in clinical practice due to the lack of high-quality, representative labeled image sets. Alternatively, searching an archive of past cases for matching images may serve as a "virtual second opinion" by accessing the metadata of matched, evidently diagnosed cases. To use image search as a triaging or diagnosis assistant, we must first tag all chest X-ray images with expressive identifiers, i.e., deep features. Then, given a query chest X-ray image, a majority vote among the top-k retrieved images can provide a more explainable output. In this study, we searched a repository of more than 550,000 chest X-ray images. We developed the Autoencoding Thorax Net (AutoThorax-Net for short) for image search in chest radiographs. Experimental results show that image search based on AutoThorax-Net features achieves high identification performance, providing a path towards real-world deployment. We achieved an AUC of 92% for a semi-automated search in 194,608 images (pneumothorax and normal) and an AUC of 82% for a fully automated search in 551,383 images (normal, pneumothorax, and many other chest diseases).
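The retrieval-then-vote step described above can be sketched as follows. This is a minimal illustration only: the 4-dimensional "deep features", the two-class labels, and the helper name `top_k_majority_vote` are assumptions for demonstration, not the actual AutoThorax-Net features or pipeline, which operate on learned embeddings of real chest X-rays.

```python
import numpy as np
from collections import Counter

def top_k_majority_vote(query_feat, archive_feats, archive_labels, k=5):
    """Retrieve the k most similar archive images (cosine similarity
    between feature vectors) and return the majority label among them,
    along with the indices of the retrieved images."""
    # Normalize so a dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    a = archive_feats / np.linalg.norm(archive_feats, axis=1, keepdims=True)
    sims = a @ q
    top_idx = np.argsort(-sims)[:k]          # indices of the k best matches
    vote = Counter(archive_labels[i] for i in top_idx).most_common(1)[0][0]
    return vote, top_idx

# Toy demo with synthetic 4-D "deep features" for two classes.
rng = np.random.default_rng(0)
normal = rng.normal(scale=0.1, size=(10, 4)) + np.array([1.0, 0.0, 0.0, 0.0])
pneumo = rng.normal(scale=0.1, size=(10, 4)) + np.array([0.0, 1.0, 0.0, 0.0])
feats = np.vstack([normal, pneumo])
labels = np.array(["normal"] * 10 + ["pneumothorax"] * 10)

query = np.array([0.05, 1.0, 0.0, 0.0])     # lies near the pneumothorax cluster
label, idx = top_k_majority_vote(query, feats, labels, k=5)
print(label)
```

Because the top-k matches come from an archive of already-diagnosed cases, the output is explainable: the retrieved images and their metadata can be shown to the radiologist alongside the vote.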