Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand.
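The two metrics named above, and the undersampling step used to balance the training data, can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the function names, the label convention (1 = deleterious, 0 = neutral), and the fixed random seed are all assumptions made for the example.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn); assumes labels are 1 (deleterious) or 0 (neutral)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Average of true positive rate and true negative rate.

    Assumes both classes occur at least once in the true labels.
    """
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def undersample_majority(samples, labels, seed=0):
    """Randomly drop majority-class items until both classes have equal size."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

Unlike plain accuracy, both metrics weight the two classes symmetrically, which is why they remain informative when the class proportions of the test set shift.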
Dissolved oxygen is an important index for evaluating water quality, and its concentration is of great significance in industrial production, environmental monitoring, aquaculture, food production, and other fields. Because its change is a continuous dynamic process, the dissolved oxygen concentration needs to be measured accurately in real time. In this paper, the principles, main applications, advantages, and disadvantages of three commonly used dissolved oxygen detection methods (iodometric titration, electrochemical detection, and optical detection) are systematically analyzed and summarized. The detection mechanisms and materials of the electrochemical and optical detection methods are reviewed. Because external environmental factors readily interfere with dissolved oxygen detection, the traditional detection methods cannot adequately meet the requirements for accuracy, real-time response, stability, and other aspects of measurement; intelligent methods are therefore urgently needed to make up for these deficiencies. This paper studies the application of intelligent technology in intelligent signal transfer processing, digital signal processing, and the real-time dynamic adaptive compensation and correction of dissolved oxygen sensors. The combined application of optical detection technology, new fluorescence-sensitive materials, and intelligent technology is the focus of future research on dissolved oxygen sensors.
Predicting the phenotypes of missense mutations uncovered by large-scale sequencing projects is an important goal in computational biology. High-confidence predictions can help focus experimental and association studies on those mutations most likely to be associated with causative relationships between mutation and disease. As an aid in developing these methods further, we have derived a set of random mutations of the enzymatic domains of human cystathionine beta synthase (CBS). This enzyme is a dimeric protein that catalyzes the condensation of serine and homocysteine to produce cystathionine. Yeast missing this enzyme cannot grow on medium lacking a source of cysteine, while transfection of functional human CBS into yeast strains missing the endogenous enzyme can successfully complement the missing gene. We used PCR mutagenesis with error-prone Taq polymerase to produce 948 colonies, and compared cell growth in the presence or absence of a cysteine source as a measure of CBS function. We were able to infer the phenotypes of 204 single-site mutants, 79 of them deleterious and 125 neutral. This set was used to test the accuracy of six publicly available methods for predicting the phenotypes of missense mutations: SIFT, PolyPhen, PMut, SNPs3D, PhD-SNP, and nsSNPAnalyzer. The top methods are PolyPhen, SIFT, and nsSNPAnalyzer, which have similar performance. Using kernel discriminant functions, we found that the difference in position-specific scoring matrix (PSSM) values is more predictive than the wild-type PSSM score alone, and that the relative surface area in the biologically relevant complex is more predictive than that of the monomeric proteins.
New single-phase red-emitting phosphors A₂HfF₆:Mn⁴⁺ (A = Rb, Cs) have been successfully synthesized via a simple ion exchange method, and their structures and luminescence properties were investigated. It was found that Mn⁴⁺ ions in Rb₂HfF₆ and Cs₂HfF₆, which share wide band gaps, possess broad excitation bands in the blue region and intense red emission, with internal quantum yields of 0.556 and 0.652, respectively. These red phosphors also exhibit high chemical and thermal stability. A series of LED devices with a tunable color rendering index and color temperature were fabricated with these samples, which can remarkably improve the optical performance of white light-emitting diodes (w-LEDs). These results indicate that A₂HfF₆:Mn⁴⁺ phosphors are promising red phosphors for w-LEDs.