Learning from nonstationary data streams poses two main difficulties. First, a dynamically structured learning framework is required to keep up with the evolution of unstable class concepts, i.e., concept drift. Second, an imbalanced class distribution over the data stream demands a mechanism that reinforces the underrepresented class concepts to improve overall performance. To address these challenges, we propose the recursive ensemble approach (REA) in this paper. To counter the imbalanced learning problem in the training data chunk received at any timestamp t, i.e., S_t, REA adaptively pushes into S_t part of the minority class examples received within [0, t-1] to balance its skewed class distribution. Hypotheses are then progressively developed over time for all balanced training data chunks and combined into an ensemble classifier in a dynamically weighted manner, which addresses the concept drift issue in time. Theoretical analysis proves that REA can provide less erroneous prediction results than a comparative algorithm. In addition, an empirical study on both synthetic benchmarks and a real-world data set validates the effectiveness of REA compared with other algorithms in terms of overall prediction accuracy and ROC curve.
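The two mechanisms described in the REA abstract above (topping up each chunk with earlier minority examples, then combining per-chunk hypotheses by a weighted vote) can be illustrated with a minimal sketch. The abstract does not say how minority examples are selected or how the weights are computed, so everything specific here is an assumption: the function names, the "most recent first" selection rule, the `target_ratio` parameter, and the use of accuracy on the newest chunk as the weight are all hypothetical and are not the authors' method.

```python
from collections import defaultdict


def balance_chunk(chunk, minority_pool, minority_label=1, target_ratio=0.5):
    """Return a copy of `chunk` topped up with stored minority examples.

    `chunk` and `minority_pool` are lists of (features, label) pairs;
    `minority_pool` is assumed to hold only minority-class examples
    received at earlier timestamps.
    """
    balanced = list(chunk)
    n_minority = sum(1 for _, label in balanced if label == minority_label)
    # Assumption: reuse the most recently stored minority examples first.
    for example in reversed(minority_pool):
        if balanced and n_minority / len(balanced) >= target_ratio:
            break
        balanced.append(example)
        n_minority += 1
    return balanced


def accuracy_weight(hypothesis, chunk):
    """Hypothetical weight: accuracy of one hypothesis on the newest chunk."""
    if not chunk:
        return 0.0
    correct = sum(1 for x, y in chunk if hypothesis(x) == y)
    return correct / len(chunk)


def weighted_vote(hypotheses, weights, x):
    """Predict the label with the largest total weight across hypotheses."""
    scores = defaultdict(float)
    for hypothesis, weight in zip(hypotheses, weights):
        scores[hypothesis(x)] += weight
    return max(scores, key=scores.get)
```

In this sketch, recomputing the weights whenever a new chunk arrives is what lets hypotheses trained on outdated concepts lose influence over time.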
Protein–DNA interactions play crucial roles in biological systems, and identifying protein–DNA binding sites is the first step toward a mechanistic understanding of various biological activities (such as transcription and repair) and toward designing novel drugs. Accurately identifying DNA-binding residues from protein sequence alone remains a challenging task. Currently, most existing sequence-based methods consider only contextual features of the sequential neighbors, which limits their ability to capture spatial information. Building on the recent breakthrough in protein structure prediction by AlphaFold2, we propose an accurate predictor, GraphSite, for identifying DNA-binding residues based on the structural models predicted by AlphaFold2. Here, we convert the binding site prediction problem into a graph node classification task and employ a transformer-based variant model to take the protein structural information into account. By leveraging predicted protein structures and a graph transformer, GraphSite substantially improves over the latest sequence-based and structure-based methods. The algorithm is further confirmed on the independent test set of 181 proteins, where GraphSite surpasses the state-of-the-art structure-based method by 16.4% in area under the precision-recall curve and 11.2% in Matthews correlation coefficient. We provide the datasets, the predicted structures and the source code along with the pre-trained models of GraphSite at https://github.com/biomed-AI/GraphSite. The GraphSite web server is freely available at https://biomed.nscc-gz.cn/apps/GraphSite.
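To make the "graph node classification" framing above concrete, the sketch below turns per-residue 3D coordinates into a graph in which each residue is a node and two residues are connected when they lie within a distance cutoff. This is only an illustration of the general idea: the abstract does not state how GraphSite builds its graph, so the function name `residue_graph`, the use of one coordinate per residue, and the cutoff value are all assumptions.

```python
import math


def residue_graph(coords, cutoff=10.0):
    """Build an adjacency list linking residues closer than `cutoff`.

    `coords` holds one (x, y, z) tuple per residue, in sequence order.
    The cutoff (in the same units as the coordinates) is a hypothetical
    choice, not a value taken from the paper.
    """
    adjacency = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


# Toy hairpin: residues 0 and 3 are far apart in sequence but close in space.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
toy_graph = residue_graph(toy_coords, cutoff=4.0)
```

In the toy example, residue 0 gains residue 3 as a neighbor even though they are three positions apart in sequence, which is the kind of spatial information a purely sequential window would miss.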
Purpose: To develop a computer-aided detection (CADe) scheme for nodules in chest radiographs (CXRs) with a high sensitivity and a low false-positive (FP) rate. Methods: The authors developed a CADe scheme consisting of five major steps, which were developed for improving the overall performance of CADe schemes. First, to segment the lung fields accurately, the authors developed a multisegment active shape model. Then, a two-stage nodule-enhancement technique was developed for improving the conspicuity of nodules. Initial nodule candidates were detected and segmented by using the clustering watershed algorithm. Thirty-one shape-, gray-level-, surface-, and gradient-based features were extracted from each segmented candidate for determining the feature space, including one of the new features based on the Canny edge detector to eliminate a major FP source caused by rib crossings. Finally, a nonlinear support vector machine (SVM) with a Gaussian kernel was employed for classification of the nodule candidates. Results: To evaluate and compare the scheme to other published CADe schemes, the authors used a publicly available database containing 140 nodules in 140 CXRs and 93 normal CXRs. The CADe scheme based on the SVM classifier achieved sensitivities of 78.6% (110/140) and 71.4% (100/140) with averages of 5.0 (1165/233) FPs/image and 2.0 (466/233) FPs/image, respectively, in a leave-one-out cross-validation test, whereas the CADe scheme based on a linear discriminant analysis classifier had a sensitivity of 60.7% (85/140) at an FP rate of 5.0 FPs/image. For nodules classified as "very subtle" and "extremely subtle," a sensitivity of 57.1% (24/42) was achieved at an FP rate of 5.0 FPs/image. When the authors used a database developed at the University of Chicago, the sensitivities were 83.3% (40/48) and 77.1% (37/48) at FP rates of 5.0 (240/48) FPs/image and 2.0 (96/48) FPs/image, respectively.
Conclusions: These results compare favorably to those described for other commercial and noncommercial CADe nodule detection systems.
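For readers unfamiliar with the Gaussian kernel mentioned in the Methods above, the following minimal sketch shows the standard kernel function that a nonlinear SVM uses to compare two feature vectors. The abstract gives no kernel width, so the `sigma` parameter and its default value are assumptions, and the function name is illustrative rather than taken from the authors' implementation.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2)).

    `x` and `y` are equal-length sequences of feature values;
    `sigma` is a hypothetical kernel width.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

The kernel equals 1 for identical feature vectors and decays toward 0 as they move apart, which is what lets the SVM draw a nonlinear boundary between nodule and non-nodule candidates in the original feature space.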
Major challenges in current computer-aided detection (CADe) schemes for nodule detection in chest radiographs (CXRs) are to detect nodules that overlap with ribs and/or clavicles and to reduce the frequent false positives (FPs) caused by ribs. Detection of such nodules by a CADe scheme is very important, because radiologists are likely to miss such subtle nodules. Our purpose in this study was to develop a CADe scheme with improved sensitivity and specificity by use of “virtual dual-energy” (VDE) CXRs where ribs and clavicles are suppressed with massive-training artificial neural networks (MTANNs). To reduce rib-induced FPs and detect nodules overlapping with ribs, we incorporated the VDE technology in our CADe scheme. The VDE technology suppressed rib and clavicle opacities in CXRs while maintaining soft-tissue opacity by use of the MTANN technique that had been trained with real dual-energy imaging. Our scheme detected nodule candidates on VDE images by use of a morphologic filtering technique. Sixty morphologic and gray-level-based features were extracted from each candidate from both original and VDE CXRs. A nonlinear support vector classifier was employed for classification of the nodule candidates. A publicly available database containing 140 nodules in 140 CXRs and 93 normal CXRs was used for testing our CADe scheme. All nodules were confirmed by computed tomography examinations, and the average size of the nodules was 17.8 mm. Thirty percent (42/140) of the nodules were rated “extremely subtle” or “very subtle” by a radiologist. The original scheme without VDE technology achieved a sensitivity of 78.6% (110/140) with 5 (1165/233) FPs per image. 
By use of the VDE technology, more nodules overlapping with ribs or clavicles were detected and the sensitivity was improved substantially to 85.0% (119/140) at the same FP rate in a leave-one-out cross-validation test, whereas the FP rate was reduced to 2.5 (583/233) per image at the same sensitivity level as that of the original CADe scheme (the difference between the specificities of the original and the VDE-based CADe schemes was statistically significant). In particular, the sensitivity of our VDE-based CADe scheme for subtle nodules (66.7% = 28/42) was statistically significantly higher than that of the original CADe scheme (57.1% = 24/42). Therefore, by use of VDE technology, the sensitivity and specificity of our CADe scheme for detection of nodules, especially subtle nodules, in CXRs were improved substantially.
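The percentages and per-image rates quoted in this abstract follow directly from the reported counts, as the short check below shows. All counts are copied from the text; the helper and variable names are illustrative, and the relative FP reduction on the last line is a derived figure that the abstract itself does not state.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100.0 * part / whole, 1)


n_nodules = 140          # nodules in the public database
n_images = 140 + 93      # abnormal plus normal CXRs
n_subtle = 42            # "extremely subtle" or "very subtle" nodules

original_sensitivity = percent(110, n_nodules)      # original scheme
vde_sensitivity = percent(119, n_nodules)           # VDE-based scheme, same FP rate
original_subtle = percent(24, n_subtle)
vde_subtle = percent(28, n_subtle)

original_fp_rate = round(1165 / n_images, 1)        # FPs per image
vde_fp_rate = round(583 / n_images, 1)              # at the original sensitivity

extra_nodules_detected = 119 - 110
relative_fp_reduction = percent(1165 - 583, 1165)   # derived, not in the abstract
```

Read this way, the VDE-based scheme finds 9 additional nodules at the same FP rate, 4 of them among the subtle cases, or alternatively roughly halves the FPs per image while keeping the original sensitivity.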
The lesion regions of a medical image account for only a small part of the image, and a severe imbalance exists between positive and negative samples, which degrades the segmentation performance on lesion regions. Dice loss is beneficial for image segmentation with an extreme imbalance of positive and negative samples, but it ignores the background regions, which also contain a large amount of information. In this work, we propose an improved Dice loss that can mine the information in background areas, and we modify the network architecture to improve performance. The improved Dice loss is called weighted soft Dice loss (WSDice loss). Our loss function gives a small weight to the background area of the label, so that the background area is included when the Dice loss is calculated. It also softens the hard label in the lesion area to increase the robustness of the model to noisy labels. Furthermore, we propose to cascade Focal loss and WSDice loss. Focal loss is a distribution-based loss function and WSDice loss is a region-based loss function, so their optimization directions differ. The cascaded loss function can make full use of the advantages of both and greatly improve model performance. In addition, we add a simple but effective channel attention module to the decoder module of U-Net. We experimented on the ChestX-ray8 dataset. Compared with Dice loss, WSDice loss improves the Dice coefficient by 1.59%, and the cascaded loss function improves it by 7.81%. The improved model architecture increases the Dice coefficient by 1.36%.
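The idea behind the loss described above can be sketched as follows. The abstract does not give the WSDice formula, so this is only one plausible reading: the standard soft Dice loss is applied to labels in which background pixels receive a small positive value and lesion pixels a value slightly below 1. The function names and the parameters `background_weight` and `lesion_value` are hypothetical and should not be taken as the authors' definition.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """Standard soft Dice loss over flattened per-pixel values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def weighted_soft_labels(target, background_weight=0.05, lesion_value=0.95):
    """Assumed relabeling: background 0 -> small weight, lesion 1 -> softened."""
    return [lesion_value if t == 1 else background_weight for t in target]


def wsdice_loss_sketch(pred, target, background_weight=0.05, lesion_value=0.95):
    """Soft Dice loss on the relabeled target (illustrative, not the paper's)."""
    soft_target = weighted_soft_labels(target, background_weight, lesion_value)
    return soft_dice_loss(pred, soft_target)


# One lesion pixel and three background pixels.
hard_target = [1, 0, 0, 0]
prediction = [0.9, 0.1, 0.1, 0.1]
plain_loss = soft_dice_loss(prediction, hard_target)
weighted_loss = wsdice_loss_sketch(prediction, hard_target)
```

With hard labels, background pixels contribute nothing to the intersection term; once they carry a small weight they do, which is the sense in which the background is "added to the calculation" in this sketch.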