Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple biochemical functions that are not due to gene fusions, multiple RNA splice variants or pleiotropic effects. The known moonlighting proteins perform a variety of diverse functions in many different cell types and species, and information about their structures and functions is scattered across many publications. We have constructed the manually curated, searchable, internet-based MoonProt Database (http://www.moonlightingproteins.org) with information about more than 200 proteins that have been experimentally verified to be moonlighting proteins. The availability of this organized information provides a more complete picture of what is currently known about moonlighting proteins. The database will also aid researchers in other fields, including determining the functions of genes identified in genome sequencing projects, interpreting data from proteomics projects and annotating protein sequence and structural databases. In addition, information about the structures and functions of moonlighting proteins can be helpful in understanding how novel protein functional sites evolved on an ancient protein scaffold, which can also help in the design of proteins with novel functions.
MoonProt 2.0 (http://moonlightingproteins.org) is an updated, comprehensive and open-access database storing expert-curated annotations for moonlighting proteins. Moonlighting proteins contain two or more physiologically relevant distinct functions performed by a single polypeptide chain. Here, we describe developments in the MoonProt website and database since our previous report in the Database Issue of Nucleic Acids Research. For this V 2.0 release, we expanded the number of proteins annotated to 370 and modified several dozen protein annotations with additional or updated information, including more links to protein structures in the Protein Data Bank, compared with the previous release. The new entries include more examples from humans and several model organisms, more proteins involved in disease, and proteins with different combinations of functions. The updated web interface includes a search function using BLAST to enable users to search the database for proteins that share amino acid sequence similarity with a protein of interest. The updated website also includes additional background information about moonlighting proteins and an expanded list of links to published articles about moonlighting proteins.
MoonProt 3.0 (http://moonlightingproteins.org) is an updated open-access database storing expert-curated annotations for moonlighting proteins. Moonlighting proteins have two or more physiologically relevant distinct biochemical or biophysical functions performed by a single polypeptide chain. Here, we describe an expansion in the database since our previous report in the Database Issue of Nucleic Acids Research in 2018. For this release, the number of proteins annotated has been expanded to over 500 proteins and dozens of protein annotations have been updated with additional information, including more structures in the Protein Data Bank, compared with version 2.0. The new entries include more examples from humans, plants and archaea, more proteins involved in disease and proteins with different combinations of functions. More kinds of information about the proteins and the species in which they have multiple functions have been added, including CATH and SCOP classification of structure, known and predicted disorder, predicted transmembrane helices, type of organism, relationship of the protein to disease, and relationship of organism to cause of disease.
Population genetic models only provide coarse representations of real-world ancestry. We use a pedigree compiled from four million parish records and genotype data from 2,276 French and 20,451 French Canadian (FC) individuals to finely model and trace FC ancestry through space and time. The loss of ancestral French population structure and the appearance of spatial and regional structure highlight a wide range of population expansion models. Geographic features shaped migrations throughout, and we find enrichments for migration, genetic and genealogical relatedness patterns within river networks across Quebec regions. Finally, we provide a freely accessible simulated whole-genome sequence dataset with spatiotemporal metadata for 1,426,749 individuals reflecting intricate FC population structure. Such realistic population-scale simulations provide new opportunities to investigate population genetics at an unprecedented resolution.
Lay Summary
We all share common ancestors ranging from a couple of generations ago to hundreds of thousands of years ago. The genetic differences between individuals today mostly depend on how closely related they are. The only problem is that the actual genealogies that relate all of us are often forgotten over time. Some geneticists have tried to come up with simple models of our shared ancestry but they don’t really explain the full, rich history of humanity. Our study uses a multi-institutional project in Quebec that has digitized parish records into a single unified genealogical database that dates back to the arrival of the first French settlers four hundred years ago. This genealogy traces the ancestry of millions of French Canadians, and we have used it to build a very high-resolution genetic map. We used this genetic map to study in detail how certain historical events and landscapes have influenced the genomes of French Canadians today.
One-Sentence Summary
We present an accurate and high-resolution spatiotemporal model of genetic variation in a founder population.
The recent proliferation of large-scale genome-wide association studies (GWASs) has motivated the development of statistical methods for phenotype prediction using single nucleotide polymorphism (SNP) array data. These polygenic risk score (PRS) methods formulate the task of polygenic prediction in terms of a multiple linear regression framework, where the goal is to infer the joint effect sizes of all genetic variants on the trait. Among the subset of PRS methods that operate on GWAS summary statistics, sparse Bayesian methods have shown competitive predictive ability. However, existing Bayesian approaches employ Markov Chain Monte Carlo (MCMC) algorithms for posterior inference, which are computationally inefficient and do not scale favorably with the number of SNPs included in the analysis. Here, we introduce Variational Inference of Polygenic Risk Scores (VIPRS), a Bayesian summary statistics-based PRS method that utilizes Variational Inference (VI) techniques to efficiently approximate the posterior distribution for the effect sizes. Our experiments with genome-wide simulations and real phenotypes from the UK Biobank (UKB) dataset demonstrated that variational approximations to the posterior are competitively accurate and highly efficient. When compared to state-of-the-art PRS methods, VIPRS consistently achieves the best or second-best predictive accuracy in our analyses of 18 simulation configurations as well as 12 real phenotypes measured among the UKB participants of "White British" background. This performance advantage was higher among individuals from other ethnic groups, with an increase in R-squared of up to 1.7-fold among participants of Nigerian ancestry for Low-Density Lipoprotein (LDL) cholesterol. Furthermore, given its computational efficiency, we applied VIPRS to a dataset of up to 10 million genetic markers, an order of magnitude greater than the standard HapMap3 subset used to train existing PRS methods. Modeling this expanded set of variants conferred modest improvements in prediction accuracy for a number of highly polygenic traits, such as standing height.
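The linear framing described in this abstract can be illustrated with a minimal sketch: once per-SNP effect sizes are available, a polygenic score is the dosage-weighted sum of those effects, and predictive accuracy is summarized as R-squared between score and phenotype. The code below is not the VIPRS method and does not perform variational inference; the function names, the toy data, and the use of the simulated true effects in place of inferred ones are all assumptions made for illustration.

```python
import random


def polygenic_score(genotypes, effect_sizes):
    """Linear score: for each individual, sum allele dosage times effect size."""
    return [sum(x * b for x, b in zip(row, effect_sizes)) for row in genotypes]


def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed values."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


# Toy simulation (hypothetical sizes, not taken from the paper).
rng = random.Random(0)
n_individuals, n_snps = 200, 50

# Sparse architecture: every fifth SNP has a nonzero effect.
true_effects = [rng.gauss(0, 0.1) if j % 5 == 0 else 0.0 for j in range(n_snps)]

# Allele dosages coded as 0, 1 or 2 copies of the effect allele.
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snps)]
             for _ in range(n_individuals)]

# Phenotype = genetic component + environmental noise.
genetic = polygenic_score(genotypes, true_effects)
phenotypes = [g + rng.gauss(0, 0.5) for g in genetic]

# Stand-in for inferred effects: here we simply reuse the simulated truth.
scores = polygenic_score(genotypes, true_effects)
accuracy = r_squared(scores, phenotypes)
print(f"R-squared of toy score: {accuracy:.3f}")
```

In a real analysis the effect sizes would come from a summary-statistics-based inference procedure rather than from the simulated truth, and accuracy would be evaluated on held-out individuals.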