Background: There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task. Methods and findings: A cross-sectional design with multiple model training cohorts was used to evaluate model generalizability to external sites using split-sample validation. A total of 158,323 chest radiographs were drawn from three institutions: National Institutes of Health Clinical Center (NIH; 112,120 from 30,805 patients), Mount Sinai Hospital (MSH; 42,396 from 12,904 patients), and Indiana University Network for Patient Care (IU; 3,807 from 3,683 patients). These patient populations had a mean (SD) age of 46.9 years (16.6), 63.2 years (16.5), and 49.6 years (17.0), with female percentages of 43.5%, 44.8%, and 57.3%, respectively. We assessed individual models using the area under the receiver operating characteristic curve (AUC) for radiographic findings consistent with pneumonia and compared performance on different test sets with DeLong's test. The prevalence of pneumonia was high enough at MSH (34.2%) relative to NIH and IU (1.2% and 1.0%) that merely sorting by hospital system achieved an AUC of 0.861 (95% CI 0.855–0.866) on the joint MSH–NIH dataset. Models trained on data from either NIH or MSH had equivalent performance on IU (P values 0.580 and 0.273, respectively) and inferior performance on data from each other relative to an internal test set (i.e., new data from within the hospital system used for training data; P values both <0.001). The highest internal performance was achieved by combining training and test data from MSH and NIH (AUC 0.931, 95% CI 0.927–0.936), but this model demonstrated significantly lower external performance at IU (AUC 0.815, 95% CI 0.745–0.885, P = 0.001).
To test the effect of pooling data from sites with disparate pneumonia prevalence, we used stratified subsampling to generate MSH–NIH cohorts that differed only in disease prevalence between training data sites. When both training data sites had the same pneumonia prevalence, the model performed consistently on external IU data (P = 0.88). When a 10-fold difference in pneumonia rate was introduced between sites, internal test performance improved compared to the balanced model (10× MSH risk P < 0.001; 10× NIH P = 0.002), but this outperformance failed to generalize to IU (MSH 10× P < 0.001; NIH 10× P = 0.027). CNNs were able to directly detect the hospital system of origin for 99.95% of NIH (22,050/22,062) and 99.98% of MSH (8,386/8,388) radiographs. The primary limitation of our approach and the available public data is that we cannot fully assess what other factors might be contributing to hospital system–specific biases. Conclusion: Pneumonia-screening CNNs achieved better internal than external performance in 3 out of 5 natural comparisons. When models wer...
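The prevalence-confounding result above can be illustrated with a minimal, self-contained sketch (the cohort sizes and the pure-Python AUC routine below are illustrative assumptions, not the study's code): a "classifier" whose only signal is which hospital system a radiograph came from already achieves an AUC well above chance when the two sites' pneumonia prevalences differ as much as MSH's and NIH's.

```python
import random

def auc(scores, labels):
    # AUC = probability that a random positive outranks a random negative
    # (ties counted as half), i.e., the Mann-Whitney U statistic normalized.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
# Hypothetical cohorts mirroring the reported prevalences (34.2% vs 1.2%).
labels = [1 if random.random() < 0.342 else 0 for _ in range(2000)]   # "MSH"-like site
site   = [1.0] * 2000                                                 # score = 1 for high-prevalence site
labels += [1 if random.random() < 0.012 else 0 for _ in range(2000)]  # "NIH"-like site
site   += [0.0] * 2000

# Site membership alone, with no image content at all, scores well above 0.5.
print(round(auc(site, labels), 3))
```

The exact value depends on the simulated cohort sizes, so it will not match the paper's 0.861, but the qualitative point, that pooled-site AUC rewards detecting the site, survives any reasonable choice of sizes.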
Rapid diagnosis and treatment of acute neurological illnesses such as stroke, hemorrhage, and hydrocephalus are critical to achieving positive outcomes and preserving neurologic function: "time is brain". Although these disorders are often recognizable by their symptoms, the critical means of their diagnosis is rapid imaging. Computer-aided surveillance of acute neurologic events in cranial imaging has the potential to triage radiology workflow, thus decreasing time to treatment and improving outcomes. Substantial clinical work has focused on computer-assisted diagnosis (CAD), whereas technical work in volumetric image analysis has focused primarily on segmentation. 3D convolutional neural networks (3D-CNNs) have primarily been used for supervised classification on 3D modeling and light detection and ranging (LiDAR) data. Here, we demonstrate a 3D-CNN architecture that performs weakly supervised classification to screen head CT images for acute neurologic events. Features were automatically learned from a clinical radiology dataset comprising 37,236 head CTs that were annotated with a semisupervised natural-language processing (NLP) framework. We demonstrate the effectiveness of our approach to triage radiology workflow and accelerate the time to diagnosis from minutes to seconds through a randomized, double-blinded, prospective trial in a simulated clinical environment.
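The core operation a 3D-CNN applies to volumetric head CT data, convolving a volume with a learned 3D filter, can be sketched in plain NumPy. The naive triple loop, toy volume, and averaging kernel below are illustrative stand-ins, not the paper's architecture; a real network stacks many such filters with nonlinearities and pooling.

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive valid-mode 3D cross-correlation, the building block of a 3D-CNN layer."""
    d, h, w = kernel.shape
    out_shape = (volume.shape[0] - d + 1,
                 volume.shape[1] - h + 1,
                 volume.shape[2] - w + 1)
    out = np.zeros(out_shape)
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            for k in range(out_shape[2]):
                # Each output voxel is the filter's dot product with a local patch.
                out[i, j, k] = np.sum(volume[i:i + d, j:j + h, k:k + w] * kernel)
    return out

# Toy stand-in for a head CT volume, with a 3x3x3 averaging filter as a
# placeholder for a learned feature detector.
vol = np.random.default_rng(0).normal(size=(8, 8, 8))
kern = np.ones((3, 3, 3)) / 27.0
feat = conv3d(vol, kern)
print(feat.shape)  # → (6, 6, 6)
```

In the weakly supervised setting described above, only a volume-level label (from the NLP-derived annotations) supervises training; no voxel-level masks are required.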
Purpose To compare different methods for generating features from radiology reports and to develop a method to automatically identify findings in these reports. Materials and Methods In this study, 96,303 head computed tomography (CT) reports were obtained. The linguistic complexity of these reports was compared with that of alternative corpora. Head CT reports were preprocessed, and machine-analyzable features were constructed by using bag-of-words (BOW), word embedding, and latent Dirichlet allocation–based approaches. Ultimately, 1004 head CT reports were manually labeled for findings of interest by physicians, and a subset of these findings were deemed critical. Lasso logistic regression was used to train models for physician-assigned labels on 602 of the 1004 head CT reports (60%) using the constructed features, and the performance of these models was validated on the held-out 402 reports (40%). Models were scored by area under the receiver operating characteristic curve (AUC), and aggregate AUC statistics were reported for (a) all labels, (b) critical labels, and (c) the presence of any critical finding in a report. Sensitivity, specificity, accuracy, and F1 score were reported for the best-performing model's (a) predictions of all labels and (b) identification of reports containing critical findings. Results The best-performing model (BOW with unigrams, bigrams, and trigrams plus average word embedding vectors) had a held-out AUC of 0.966 for identifying the presence of any critical head CT finding and an average AUC of 0.957 across all head CT findings. Sensitivity and specificity for identifying the presence of any critical finding were 92.59% (175 of 189) and 89.67% (191 of 213), respectively. Average sensitivity and specificity across all findings were 90.25% (1898 of 2103) and 91.72% (18,351 of 20,007), respectively.
Simpler BOW methods achieved results competitive with those of more sophisticated approaches, with an average AUC for presence of any critical finding of 0.951 for unigram BOW versus 0.966 for the best-performing model. The Yule's I of the head CT corpus was 34, markedly lower than that of the Reuters corpus (103) or the I2B2 discharge summaries (271), indicating lower linguistic complexity. Conclusion Automated methods can be used to identify findings in radiology reports. The success of this approach benefits from the standardized language of these reports. With this method, a large labeled corpus can be generated for applications such as deep learning. © RSNA, 2018. Online supplemental material is available for this article.
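A minimal sketch of the best-performing pipeline's main ingredients, BOW n-gram features (unigrams through trigrams) feeding an L1-penalized (lasso) logistic regression, assuming scikit-learn and a tiny hypothetical report corpus. The six example reports and the label convention are invented for illustration; the real study used 1004 physician-labeled reports and additionally appended averaged word-embedding vectors to the features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical mini-corpus of head CT report impressions (illustrative only).
reports = [
    "large acute hemorrhage in the right frontal lobe",
    "no acute intracranial abnormality",
    "subdural hematoma with acute hemorrhage and midline shift",
    "unremarkable noncontrast head ct",
    "hemorrhage with surrounding edema and mass effect",
    "no evidence of acute infarct",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = critical finding present

# BOW with unigrams, bigrams, and trigrams, as in the best-performing model.
vec = CountVectorizer(ngram_range=(1, 3))
X = vec.fit_transform(reports)

# Lasso = L1-penalized logistic regression; the L1 penalty drives most
# n-gram weights to exactly zero, selecting a sparse set of informative terms.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, labels)

print(clf.predict(vec.transform(["acute hemorrhage with mass effect"])))
```

On a corpus this small the model simply latches onto the most discriminative token; the study's point is that even this simple representation remains competitive at realistic scale.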
Background: Differentiating glioblastoma, brain metastasis, and central nervous system lymphoma (CNSL) on conventional magnetic resonance imaging (MRI) can present a diagnostic dilemma due to the potential for overlapping imaging features. We investigate whether machine learning evaluation of multimodal MRI can reliably differentiate these entities. Methods: Preoperative brain MRI including diffusion weighted imaging (DWI), dynamic contrast enhanced (DCE), and dynamic susceptibility contrast (DSC) perfusion in patients with glioblastoma, lymphoma, or metastasis were retrospectively reviewed. Perfusion maps (rCBV, rCBF), permeability maps (K-trans, Kep, Vp, Ve), ADC, T1C+ and T2/FLAIR images were coregistered, and two separate volumes of interest (VOIs) were obtained from the enhancing tumor and non-enhancing T2 hyperintense (NET2) regions. The tumor volumes obtained from these VOIs were utilized for supervised training of support vector classifier (SVC) and multilayer perceptron (MLP) models. Validation of the trained models was performed on unlabeled cases using the leave-one-subject-out method. Head-to-head and multiclass models were created. Accuracies of the multiclass models were compared against two human interpreters reviewing conventional and diffusion-weighted MR images. Results: Twenty-six patients with histopathologically proven glioblastoma (n=9), metastasis (n=9), and CNS lymphoma (n=8) were included. The trained multiclass ML models discriminated the three pathologic classes with a maximum accuracy of 69.2% (18 of 26; kappa 0.540, P=0.01) using an MLP trained with the VpNET2 tumor volumes. Human readers achieved 65.4% (17 of 26) and 80.8% (21 of 26) accuracies, respectively. Using the MLP VpNET2 model as computer-aided diagnosis (CADx) for cases in which the human reviewers disagreed with each other on the diagnosis resulted in correct diagnoses in 5 (19.2%) additional cases.
Conclusions: Our trained multiclass MLP using VpNET2 can differentiate glioblastoma, brain metastasis, and CNS lymphoma with modest diagnostic accuracy and provides an approximately 19% increase in diagnostic yield when added to routine human interpretation.
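The leave-one-subject-out validation described in the Methods can be sketched as follows: for each of the 26 subjects in turn, train on the other 25 and test on the one held out. The synthetic four-dimensional feature clusters, class means, and linear-kernel SVC below are illustrative assumptions standing in for the study's actual perfusion/permeability tumor-volume features and model configurations.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-subject feature vectors for 26 subjects in three classes,
# mirroring the study's cohort sizes (9 glioblastoma, 9 metastasis, 8 lymphoma).
X = np.vstack([
    rng.normal(0.0, 1.0, size=(9, 4)),    # "glioblastoma"-like cluster
    rng.normal(3.0, 1.0, size=(9, 4)),    # "metastasis"-like cluster
    rng.normal(-3.0, 1.0, size=(8, 4)),   # "lymphoma"-like cluster
])
y = np.array([0] * 9 + [1] * 9 + [2] * 8)

# Leave-one-subject-out: each fold trains on 25 subjects, tests on 1.
correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

print(f"LOSO accuracy: {correct}/{len(y)}")
```

With only 26 subjects, LOSO uses the data maximally, but each fold's test set is a single case, which is why the resulting accuracy estimates carry wide uncertainty.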
Purpose To compare the outcomes of radiation segmentectomy (RS) and transarterial chemoembolization (TACE) combined with microwave ablation (MWA) in the treatment of unresectable solitary hepatocellular carcinoma (HCC) up to 3 cm. Materials and Methods This retrospective study was approved by the institutional review board, and the requirement to obtain informed consent was waived. From January 2010 to June 2015, a total of 417 and 235 consecutive patients with HCC underwent RS and TACE MWA, respectively. A cohort of 121 patients who had not previously undergone local-regional therapy (RS, 41; TACE MWA, 80; mean age, 65.4 years; 84 men [69.4%]) and who had solitary HCC up to 3 cm without vascular invasion or metastasis was retrospectively identified. Outcomes analyzed included procedure-related complications, laboratory toxicity levels, imaging response, time to progression (TTP), 90-day mortality, and survival. Propensity score matching was conducted by using a nearest-neighbor algorithm (1:1) to account for pretreatment clinical, laboratory, and imaging covariates. Postmatching statistical analysis was performed with conditional logistic regression for binary outcomes and the stratified log-rank test for time-dependent outcomes. Results Before matching, the complication rate was 8.9% and 4.9% in the TACE MWA and RS groups, respectively (P = .46). The overall complete response (CR) rate was 82.9% for RS and 82.5% for TACE MWA (odds ratio, 1.0; 95% confidence interval [CI]: 0.4, 2.8; P = .95). There were 41 (RS, 11; TACE MWA, 30) instances of progression occurring after an initial CR, of which 10 (24%) were classified as target progression (RS, one; TACE MWA, nine). Median overall TTP was 11.1 months (95% CI: 8.8 months, 25.6 months) in the RS group and 12.1 months (95% CI: 7.7 months, 19.1 months) in the TACE MWA group (P > .99). 
After matching, the overall CR rate (P = .94), TTP (P = .83), and overall survival (P > .99) were not significantly different between the two groups. The 90-day postoperative mortality rate was 0% in both groups. Conclusion Imaging response and progression outcomes of patients with solitary HCC up to 3 cm treated with RS were not significantly different when compared with those of patients treated with TACE MWA. © RSNA, 2016. Online supplemental material is available for this article.
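The 1:1 nearest-neighbor propensity score matching described in the Methods can be sketched with scikit-learn: fit a logistic regression of treatment assignment on pretreatment covariates, then pair each treated patient with the control whose propensity score is closest. The simulated covariates, arm sizes, and matching with replacement below are illustrative simplifications; the study matched on actual clinical, laboratory, and imaging covariates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)

# Hypothetical pretreatment covariates (e.g., age, AFP, tumor size) for the
# two arms, with sizes mirroring the study cohort (RS, 41; TACE MWA, 80).
n_rs, n_tace = 41, 80
X = np.vstack([
    rng.normal(0.3, 1.0, size=(n_rs, 3)),    # RS arm, slightly shifted covariates
    rng.normal(0.0, 1.0, size=(n_tace, 3)),  # TACE MWA arm
])
treat = np.array([1] * n_rs + [0] * n_tace)

# Step 1: estimate propensity scores P(treatment | covariates).
ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# Step 2: 1:1 nearest-neighbor matching on the propensity score
# (with replacement here for simplicity; caliper and without-replacement
# variants are common in practice).
nn = NearestNeighbors(n_neighbors=1).fit(ps[treat == 0].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treat == 1].reshape(-1, 1))
matched_controls = np.where(treat == 0)[0][idx.ravel()]

print(len(matched_controls))  # one matched control per treated patient → 41
```

Matching on the scalar propensity score, rather than on all covariates jointly, is what makes 1:1 nearest-neighbor pairing tractable; the post-matching analyses (conditional logistic regression, stratified log-rank test) then respect the matched-pair structure.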