Schizophrenia occurs in about one in four individuals with 22q11.2 deletion syndrome (22q11.2DS). The aim of this International Brain and Behavior 22q11.2DS Consortium (IBBC) study was to identify genetic factors that contribute to schizophrenia, in addition to the ~20-fold increased risk conveyed by the 22q11.2 deletion. Using whole-genome sequencing data from 519 unrelated individuals with 22q11.2DS, we conducted genome-wide comparisons of common and rare variants between those with schizophrenia and those with no psychotic disorder at age ≥25 years. Available microarray data enabled direct comparison of polygenic risk for schizophrenia between 22q11.2DS and independent population samples with no 22q11.2 deletion, with and without schizophrenia (total n = 35,182). Polygenic risk for schizophrenia within 22q11.2DS was significantly greater for those with schizophrenia (adjusted p = 6.73 × 10⁻⁶). Novel reciprocal case-control comparisons between the 22q11.2DS and population-based cohorts showed that the polygenic risk score was significantly greater in individuals with psychotic illness, regardless of the presence of the 22q11.2 deletion. Within the 22q11.2DS cohort, results of gene-set analyses showed some support for rare variants affecting synaptic genes. No common or rare variants within the 22q11.2 deletion region were significantly associated with schizophrenia. These findings suggest that the 22q11.2 deletion confers a greatly increased risk for schizophrenia and that the risk is higher still when the deletion and the common polygenic risk factors that contribute to schizophrenia in the general population are both present.
Background: Computational prediction of protein subcellular localization can greatly help to elucidate protein function. Despite the existence of dozens of protein localization prediction algorithms, prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve prediction performance, and these usually include 10 or more individual localization algorithms. However, their performance is still limited by running complexity and by redundancy among the individual prediction algorithms. Results: This paper proposes a novel method for the rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm combines a feature-selection-based filter with a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets, Yeast and Human, and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only one-third to one-half of the individual predictors used by current ensemble algorithms, which greatly reduces computational complexity and running time. High-performance ensemble algorithms were usually composed of predictors that together cover most of the available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from an AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors. Conclusions: We proposed a method for the rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only one-half or one-third of the individual predictors compared to other ensemble algorithms. The results also suggest that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
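The filter-plus-classifier idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the feature selection filter, so a simple absolute-correlation ranking stands in for it here, and the logistic regression is a plain gradient-descent fit. All function names (`select_predictors`, `fit_minimal_ensemble`, `make_demo_data`) and the synthetic data are hypothetical, and the reported figure is training accuracy rather than the AUC used in the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_predictors(scores, labels, m):
    """Filter step (assumed): keep the m predictors most correlated with the label."""
    k = len(scores[0])
    relevance = [abs(pearson([row[j] for row in scores], labels)) for j in range(k)]
    ranked = sorted(range(k), key=lambda j: relevance[j], reverse=True)
    return sorted(ranked[:m])


def standardize(rows):
    """Center each column and scale it to unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
            for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(rows, labels, lr=0.5, epochs=300):
    """Full-batch gradient descent on the logistic loss."""
    n, k = len(rows), len(rows[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(rows, labels, w, b):
    """Fraction of samples classified correctly at a 0.5 threshold."""
    hits = sum((sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5) == bool(y)
               for x, y in zip(rows, labels))
    return hits / len(rows)


def make_demo_data(n=200, seed=0):
    """Synthetic scores: predictors 0 and 1 are informative, 2 and 3 are noise."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    scores = [[y + rng.gauss(0, 0.3), y + rng.gauss(0, 0.3),
               rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]
    return scores, labels


def fit_minimal_ensemble(scores, labels, m):
    """Select m predictors, then fit logistic regression on their scores only."""
    kept = select_predictors(scores, labels, m)
    rows = standardize([[row[j] for j in kept] for row in scores])
    w, b = train_logistic(rows, labels)
    return kept, accuracy(rows, labels, w, b)


if __name__ == "__main__":
    scores, labels = make_demo_data()
    print(fit_minimal_ensemble(scores, labels, m=2))
```

In this toy setting the filter discards the two noise predictors, so the classifier is trained on half of the available predictors, which mirrors the reduction in ensemble size described in the abstract. A correlation filter does not address redundancy between predictors; the contribution scores mentioned in the abstract are not reproduced here.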
SUMMARY The North American beaver is an exceptionally long-lived and cancer-resistant rodent species. Here, we report the evolutionary changes in its gene coding sequences, copy numbers, and expression. We identify changes that likely increase its ability to detoxify aldehydes, enhance tumor suppression and DNA repair, and alter lipid metabolism, potentially contributing to its longevity and cancer resistance. Hpgd, a tumor suppressor gene, is uniquely duplicated in beavers among rodents, and several genes associated with tumor suppression and longevity are under positive selection in beavers. Lipid metabolism genes show positive selection signals, changes in copy numbers, or altered gene expression in beavers. Aldh1a1, encoding an enzyme for aldehyde detoxification, is particularly notable due to its massive expansion in beavers, which enhances their cellular resistance to ethanol and capacity to metabolize diverse aldehyde substrates from lipid oxidation and their woody diet. We hypothesize that the amplification of Aldh1a1 may contribute to the longevity of beavers.
Many proteins are sorted to multiple subcellular localizations within the cell. However, computational prediction of multi-location proteins remains a challenging task. Here we applied NetLoc, an algorithm based on logistic regression and diffusion kernels, to predict multiplex proteins and explored its capabilities and limitations. Experiments show that the overall and true success rates are 65% and 41%, respectively, for the physical protein-protein interaction (PPI) network, and 88% and 75%, respectively, for the mixed PPI network. Our study also showed that the performance of NetLoc in predicting protein localization is limited by network characteristics such as the ratio of the number of co-localized PPIs (coPPI) to the number of non-co-localized PPIs (ncPPI) and the density of annotated coPPI in the network. For a given network with a specific number of proteins, NetLoc performance increases with a higher coPPI/ncPPI ratio and a higher density of annotated coPPI.
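The two network characteristics named in the abstract above can be computed with a short Python sketch. The abstract does not define them precisely, so the following are assumptions: an interaction counts as co-localized when the two proteins share at least one annotated location, interactions involving an unannotated protein are skipped, and density is taken as the number of coPPIs divided by the number of possible protein pairs. The function name `coppi_stats` and the example network are hypothetical.

```python
def coppi_stats(edges, locations):
    """Return (coPPI count, ncPPI count, coPPI/ncPPI ratio, coPPI density).

    edges: iterable of (protein_a, protein_b) interactions.
    locations: dict mapping protein -> set of annotated locations.
    """
    proteins = {p for edge in edges for p in edge}
    co = nc = 0
    for a, b in edges:
        loc_a, loc_b = locations.get(a), locations.get(b)
        if not loc_a or not loc_b:
            continue  # assumed: skip interactions with an unannotated protein
        if loc_a & loc_b:
            co += 1  # at least one shared location: co-localized
        else:
            nc += 1  # no shared location: non-co-localized
    n = len(proteins)
    possible_pairs = n * (n - 1) // 2
    ratio = co / nc if nc else float("inf")
    density = co / possible_pairs if possible_pairs else 0.0
    return co, nc, ratio, density


# Hypothetical four-protein network; protein B has two locations.
example_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
example_locations = {
    "A": {"nucleus"},
    "B": {"nucleus", "cytoplasm"},
    "C": {"cytoplasm"},
    "D": {"membrane"},
}
```

Representing locations as sets lets a multi-location protein such as B be co-localized with partners in different compartments, which is the case the abstract is concerned with.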