The specificity of toxicant-target biomolecule interactions contributes to the highly imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such imbalance. However, removing inactive chemical compound instances from the majority class by undersampling can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlap and a higher false prediction rate. In this study, to improve the prediction accuracy of imbalanced learning, we employed SMOTEENN, a combination of the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, to oversample the minority class by creating synthetic samples and then clean the mislabeled instances. We chose the highly imbalanced Tox21 dataset, which consists of 12 in vitro bioassays for more than 10,000 chemicals that are distributed unevenly between binary classes. With Random Forest (RF) as the base classifier and bagging as the ensemble strategy, we applied four hybrid learning methods, i.e., RF without imbalance handling (RF), RF with Random Undersampling (RUS), RF with SMOTE (SMO), and RF with SMOTEENN (SMN). The performance of the four learning methods was compared using nine evaluation metrics, among which F1 score, Matthews correlation coefficient, and Brier score provided a more consistent assessment of the overall performance across the 12 datasets. Friedman's aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that SMN significantly outperformed the other three methods.
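The interpolation step that SMOTE relies on can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline or a library implementation: the function name `smote_sketch`, the parameter `k`, the fixed seed, and the toy points are all assumptions made for illustration, and the ENN cleaning step is omitted.

```python
import random

def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority points by interpolating toward a near neighbour.

    minority: list of equal-length tuples (at least two points).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x), by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between x and the chosen neighbour
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy "active" compounds described by two made-up descriptors
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
toy_synthetic = smote_sketch(toy_minority, n_synthetic=5)
```

Because every synthetic point lies on a segment between two real minority points, it can land in a region dominated by the majority class, which is the overlap problem that the ENN cleaning step is meant to reduce.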
We also found a strong negative correlation between the prediction accuracy and the imbalance ratio (IR), defined as the number of inactive compounds divided by the number of active compounds. SMN became less effective when IR exceeded a certain threshold (e.g., > 28). The ability to separate the few active compounds from the vast number of inactive ones is of great importance in computational toxicology. This work demonstrates that the performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through the use of data rebalancing.
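As a rough illustration of the quantities named above, the following sketch computes the imbalance ratio and three of the reported metrics (F1 score, Matthews correlation coefficient, Brier score) from binary labels using their standard textbook definitions. The function names and the toy labels are assumptions, not taken from the study.

```python
import math

def imbalance_ratio(labels):
    """IR = number of inactive (0) compounds / number of active (1) compounds."""
    n_active = sum(1 for y in labels if y == 1)
    return (len(labels) - n_active) / n_active

def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 = active as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)

# Toy example: 2 actives, 6 inactives (made-up data)
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
probs = [0.9, 0.4, 0.1, 0.2, 0.1, 0.0, 0.3, 0.6]
```

F1 and MCC are computed from the confusion matrix with the active class as positive, so they are less dominated by the abundant inactive class than plain accuracy is; the Brier score instead assesses the predicted probabilities.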
Computer-aided image analysis (CAI) can help objectively quantify morphologic features of hematoxylin-eosin (HE) histopathology images and provide potentially useful prognostic information on breast cancer. We performed a CAI workflow on 1,150 HE images from 230 patients with invasive ductal carcinoma (IDC) of the breast. We used a pixel-wise support vector machine classifier for tumor nests (TNs)-stroma segmentation, and a marker-controlled watershed algorithm for nuclei segmentation. A total of 730 morphologic parameters were extracted after segmentation, and 12 parameters identified by Kaplan-Meier analysis were significantly associated with 8-year disease-free survival (P < 0.05 for all). Moreover, four image features, namely the TNs feature (HR 1.327, 95% CI [1.001 - 1.759], P = 0.049), TNs cell nuclei feature (HR 0.729, 95% CI [0.537 - 0.989], P = 0.042), TNs cell density (HR 1.625, 95% CI [1.177 - 2.244], P = 0.003), and stromal cell structure feature (HR 1.596, 95% CI [1.142 - 2.229], P = 0.006), were identified by a multivariate Cox proportional hazards model as new independent prognostic factors. The results indicate that CAI can assist the pathologist in extracting prognostic information from HE histopathology images for IDC. The TNs feature, TNs cell nuclei feature, TNs cell density, and stromal cell structure feature could be new prognostic factors.
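For readers unfamiliar with the Kaplan-Meier analysis mentioned above, here is a minimal sketch of the standard product-limit estimator in pure Python. The function name `kaplan_meier` and the toy follow-up data are assumptions for illustration; the study's actual statistical software and data are not described in this abstract.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    n_at_risk = len(times)
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # events and censorings both leave the risk set
    surv = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= exit_counts[t]
    return curve

# Made-up follow-up times (years) and event indicators
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

In an analysis like the one described, patients would typically be split into groups by an image feature (for example at a cutoff value) and the resulting curves compared between groups; that comparison step is not shown here.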
Scale-free networks have been observed widely in natural and man-made systems, and consensus protocols have been studied extensively over the last decade. Motivated by the fractional-order dynamics of bacteria colonies, a fractional-order protocol is employed to achieve consensus over scale-free networks. The most remarkable property of scale-free networks lies in their inverse power-law degree distributions. The present work concerns the convergence speed for different fractional orders corresponding to different power-law parameters. The analytic solutions of the consensus protocols are given and their properties are discussed, explaining the quick convergence in the early stage of the consensus process and the slower convergence later. Inspired by this behavior, a switching-order consensus protocol is proposed, which efficiently increases the convergence speed and ensures exponential convergence as time tends to infinity. The disagreement of the system during the consensus procedure is investigated. Theoretical analysis and simulations demonstrate that, for certain scale-free networks, an optimal order exists such that the fractional-order consensus algorithm minimizes the disagreement or its integral.
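The fast-then-slow convergence described above can be illustrated on a single disagreement mode. The sketch below assumes a Caputo derivative of order 1/2 and a unit eigenvalue, neither of which is stated in the abstract. Under those assumptions the mode decays as the Mittag-Leffler function E_{1/2}(-λ√t), which has the closed form exp(λ²t)·erfc(λ√t). It is compared with the ordinary first-order decay exp(-λt).

```python
import math

def half_order_mode(t, lam=1.0):
    """x(t)/x(0) for the Caputo equation D^{1/2} x = -lam * x.

    Uses E_{1/2}(-z) = exp(z^2) * erfc(z) with z = lam * sqrt(t).
    Note: exp(z^2) overflows for very large lam^2 * t (roughly above 700).
    """
    z = lam * math.sqrt(t)
    return math.exp(z * z) * math.erfc(z)

def first_order_mode(t, lam=1.0):
    """x(t)/x(0) for the classical equation dx/dt = -lam * x."""
    return math.exp(-lam * t)

if __name__ == "__main__":
    for t in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(t, half_order_mode(t), first_order_mode(t))
```

At small t the half-order mode is already below the exponential, while at large t it decays only algebraically and is overtaken. This is consistent with the motivation given for a protocol that switches order partway through, although the switching rule itself is not specified in the abstract and is not modeled here.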
Computed tomography angiography (CTA) postprocessing performed manually by technologists is extremely labor-intensive and error-prone. We propose an artificial intelligence reconstruction system supported by an optimized physiological anatomical-based 3D convolutional neural network that can automatically achieve CTA reconstruction in healthcare services. This system is trained and tested with 18,766 head and neck CTA scans from 5 tertiary hospitals in China collected between June 2017 and November 2018. The overall reconstruction accuracy on the independent testing dataset is 0.931. It is clinically applicable due to its consistency with manually processed images, achieving a qualification rate of 92.1%. After five months of application, this system reduces the time consumed from 14.22 ± 3.64 min to 4.94 ± 0.36 min, the number of clicks from 115.87 ± 25.9 to 4, and the labor force from 3 technologists to 1. Thus, the system facilitates clinical workflows and provides an opportunity for clinical technologists to improve humanistic patient care.