Background
One of the most challenging tasks for bladder cancer diagnosis is to histologically differentiate two early stages, non-invasive Ta and superficially invasive T1, the latter of which is associated with a significantly higher risk of disease progression. Indeed, in a considerable number of cases, Ta and T1 tumors look very similar under microscope, making the distinction very difficult even for experienced pathologists. Thus, there is an urgent need for a favoring system based on machine learning (ML) to distinguish between the two stages of bladder cancer.
Methods
A total of 1177 images of bladder tumor tissues stained by hematoxylin and eosin were collected by pathologists at University of Rochester Medical Center, which included 460 non-invasive (stage Ta) and 717 invasive (stage T1) tumors. Automatic pipelines were developed to extract features for three invasive patterns characteristic to the T1 stage bladder cancer (i.e., desmoplastic reaction, retraction artifact, and abundant pinker cytoplasm), using imaging processing software ImageJ and CellProfiler. Features extracted from the images were analyzed by a suite of machine learning approaches.
Results
We extracted nearly 700 features from the Ta and T1 tumor images. Unsupervised clustering analysis failed to distinguish hematoxylin and eosin images of Ta vs. T1 tumors. With a reduced set of features, we successfully distinguished 1177 Ta or T1 images with an accuracy of 91–96% by six supervised learning methods. By contrast, convolutional neural network (CNN) models that automatically extract features from images produced an accuracy of 84%, indicating that feature extraction driven by domain knowledge outperforms CNN-based automatic feature extraction. Further analysis revealed that desmoplastic reaction was more important than the other two patterns, and the number and size of nuclei of tumor cells were the most predictive features.
Conclusions
We provide a ML-empowered, feature-centered, and interpretable diagnostic system to facilitate the accurate staging of Ta and T1 diseases, which has a potential to apply to other types of cancer.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.