Objectives
To assess and improve the assistive role of a deep, densely connected convolutional neural network (CNN) to hematopathologists in differentiating histologic images of Burkitt lymphoma (BL) from diffuse large B-cell lymphoma (DLBCL).
Methods
A total of 10,818 images from BL (n = 34) and DLBCL (n = 36) cases were used to either train or apply different CNNs. Networks differed by the number of training images, image size in pixels, absence of color, pixel and staining augmentation, and depth of the network, among other parameters.
Results
The best-performing network classified 17 of 18 cases (94%) correctly, nine of them with 100% of images correct; receiver operating characteristic (ROC) curve analysis showed an area under the curve (AUC) of 0.92 for both DLBCL and BL. The best-performing CNN used all available training images, two random 448 × 448 pixel subcrops per image, random H&E staining image augmentation, random horizontal flipping of images, random alteration of contrast, reduction on validation error plateau of 15 epochs, a block size of six, a batch size of 32, and a depth of 22. Other networks, and networks trained on fewer images, performed more poorly.
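The geometric and contrast augmentations listed above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the authors' pipeline: the function names, the contrast range (0.8 to 1.2), and the flip probability are assumptions, and the H&E staining augmentation is omitted because the abstract does not describe how it was done.

```python
import numpy as np


def random_subcrops(image, size=448, n_crops=2, rng=None):
    """Return n_crops random size x size crops from an H x W x C image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop size")
    crops = []
    for _ in range(n_crops):
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        crops.append(image[top:top + size, left:left + size])
    return crops


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left to right with probability p (assumed value)."""
    return image[:, ::-1] if rng.random() < p else image


def random_contrast(image, rng, low=0.8, high=1.2):
    """Scale pixel deviations from the per-channel mean by a random factor.

    The [low, high] range is an assumption for illustration only.
    """
    factor = rng.uniform(low, high)
    mean = image.mean(axis=(0, 1), keepdims=True)
    out = (image.astype(np.float32) - mean) * factor + mean
    return np.clip(out, 0, 255).astype(np.uint8)


def augment(image, rng=None):
    """Two random subcrops, each randomly flipped and contrast-adjusted."""
    rng = np.random.default_rng() if rng is None else rng
    crops = random_subcrops(image, size=448, n_crops=2, rng=rng)
    return [random_contrast(random_horizontal_flip(c, rng), rng) for c in crops]
```

Applying the augmentations afresh each time an image is drawn during training means the network rarely sees exactly the same pixels twice, which is the usual motivation for this kind of augmentation.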
Conclusions
CNNs are promising augmented human intelligence tools for differentiating a subset of BL and DLBCL cases.
Introduction
Our objective is to improve the assistive role of a deep, densely connected convolutional neural network (CNN) to hematopathologists in differentiating histologic images of Burkitt lymphoma (BL) from diffuse large B-cell lymphoma (DLBCL). We hypothesized that for the majority of cases, a CNN would accurately differentiate BL from DLBCL and attempted to identify design and input variables to improve performance.
Methods and Materials
In total, 3,608 images of BL were collected from 18 cases and 2,071 images of DLBCL were collected from 20 cases using Aperio Image Scope (Leica Biosystems, Buffalo Grove, IL). Cases were randomized into training and unknown sets, and three separate CNNs were trained and applied to the unknown images. Networks differed by either the size in pixels of the images (network 1 used 32 × 32 while networks 2 and 3 used 224 × 224) or absence of color (in network 2). Network 3 was also evaluated with decreased numbers of training images.
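Because many images come from each case, the randomization is described at the case level, which keeps all images from one case on the same side of the split. A minimal sketch of that idea is shown below; the function names, the toy case identifiers, and the number of held-out cases are hypothetical and are not taken from the study.

```python
import random


def split_cases(case_ids, n_unknown, seed=0):
    """Randomly assign whole cases to a training set or an unknown set."""
    ids = sorted(set(case_ids))
    if not 0 < n_unknown < len(ids):
        raise ValueError("n_unknown must be between 1 and the number of cases - 1")
    rng = random.Random(seed)
    rng.shuffle(ids)
    unknown = set(ids[:n_unknown])
    training = set(ids[n_unknown:])
    return training, unknown


def assign_images(images, training, unknown):
    """Route (case_id, image_path) pairs according to their case's set."""
    train_images = [img for img in images if img[0] in training]
    unknown_images = [img for img in images if img[0] in unknown]
    return train_images, unknown_images


# Toy example with hypothetical case identifiers and file names.
toy_images = [
    ("BL-01", "bl01_a.png"), ("BL-01", "bl01_b.png"),
    ("BL-02", "bl02_a.png"),
    ("DLBCL-01", "dl01_a.png"), ("DLBCL-02", "dl02_a.png"),
]
toy_training, toy_unknown = split_cases([c for c, _ in toy_images], n_unknown=2)
toy_train_images, toy_unknown_images = assign_images(
    toy_images, toy_training, toy_unknown
)
```

Splitting by case rather than by image avoids the situation where near-identical fields from the same slide appear in both training and evaluation data.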
Results
Network 3 performed the best with 17 of 19 (89%) cases classified correctly, 10 with 100% of images correct. Overall, network 3 had 88% accuracy, 80% precision, 81% recall, and an F1 score of 0.81. Receiver operating characteristic (ROC) curve analysis showed an area under the curve (AUC) of 0.89 for DLBCL and 0.88 for BL. Networks 1 and 2 performed more poorly with F1 scores of 0.63 and 0.39, respectively. Decreasing training images by one-half and one-fourth resulted in decreased F1 scores of 0.63 and 0.60, respectively.
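For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 are conventionally computed from the four cells of a binary confusion matrix. The counts in the example are made up for illustration; the abstract does not report the underlying confusion matrix or say whether the figures were averaged across the two classes.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    and true negatives with respect to the class treated as positive.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, not data from the study.
example = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Because F1 combines precision and recall, it penalizes a network that favors one class, which is why it is a useful single figure when comparing the three networks.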
Conclusions
CNNs are promising augmented human intelligence tools for differentiating DLBCL from BL cases and potentially could contribute to improved workflow in the laboratory. More training images, larger image size, and color significantly improved performance of the CNN because of the added information available to the network. More research is under way to optimize the performance of CNNs in lymphoma applications.