BackgroundMassively parallel sequencing of cDNA is now an efficient route for generating enormous sequence collections that represent expressed genes. This approach provides a valuable starting point for characterizing functional genetic variation in non-model organisms, especially where whole genome sequencing efforts are currently cost and time prohibitive. The large and complex genomes of pines (Pinus spp.) have hindered the development of genomic resources, despite the ecological and economical importance of the group. While most genomic studies have focused on a single species (P. taeda), genomic level resources for other pines are insufficiently developed to facilitate ecological genomic research. Lodgepole pine (P. contorta) is an ecologically important foundation species of montane forest ecosystems and exhibits substantial adaptive variation across its range in western North America. Here we describe a sequencing study of expressed genes from P. contorta, including their assembly and annotation, and their potential for molecular marker development to support population and association genetic studies.ResultsWe obtained 586,732 sequencing reads from a 454 GS XLR70 Titanium pyrosequencer (mean length: 306 base pairs). A combination of reference-based and de novo assemblies yielded 63,657 contigs, with 239,793 reads remaining as singletons. Based on sequence similarity with known proteins, these sequences represent approximately 17,000 unique genes, many of which are well covered by contig sequences. This sequence collection also included a surprisingly large number of retrotransposon sequences, suggesting that they are highly transcriptionally active in the tissues we sampled. We located and characterized thousands of simple sequence repeats and single nucleotide polymorphisms as potential molecular markers in our assembled and annotated sequences. High quality PCR primers were designed for a substantial number of the SSR loci, and a large number of these were amplified successfully in initial screening.ConclusionsThis sequence collection represents a major genomic resource for P. contorta, and the large number of genetic markers characterized should contribute to future research in this and other pines. Our results illustrate the utility of next generation sequencing as a basis for marker development and population genomics in non-model species.
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state-of-the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.
Gene duplication is an important process in the functional divergence of genes and genomes. Several processes have been described that lead to duplicate gene retention over different timescales after both smaller-scale events and whole-genome duplication, including neofunctionalization, subfunctionalization, and dosage balance. Two common modes of duplicate gene loss include nonfunctionalization and loss due to population dynamics (failed fixation). Previous work has characterized expectations of duplicate gene retention under the neofunctionalization and subfunctionalization models. Here, that work is extended to dosage balance using simulations. A general model for duplicate gene loss/retention is then presented that is capable of fitting expectations under the different models, is defined at t = 0, and decays to an orthologous asymptotic rate rather than zero, based upon a modified Weibull hazard function. The model in a maximum likelihood framework shows the property of identifiability, recovering the evolutionary mechanism and parameters of simulation. This model is also capable of recovering the evolutionary mechanism of simulation from data generated using an unrelated network population genetic model. Lastly, the general model is applied as part of a mixture model to recent gene duplicates from the Oikopleura dioica genome, suggesting that neofunctionalization may be an important process leading to duplicate gene retention in that organism.
BackgroundProtein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field.ResultsHere we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites.ConclusionsSimple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model.
Infrared (IR) amide I' spectra are widely used for investigations of the structural properties of proteins in aqueous solution. For analysis of the experimental data, it is necessary to separate the spectral features due to the backbone conformation from those arising from other factors, in particular the interaction with solvent. We investigate the effects of solvation on amide I' spectra for a small 40-residue helix-turn-helix protein by theoretical simulations based on density functional theory (DFT). The vibrational force fields and intensity parameters for the protein amide backbone are constructed by transfer from smaller heptaamide fragments; the side chains are neglected in the DFT calculations. Solvent is modeled at two different levels: first as explicit water hydrogen bonded to the surface amide groups, treated at the same DFT level, and, second, using the electrostatic map approach combined with molecular dynamics (MD) simulation. Motional narrowing of the spectral band shapes due to averaging over the fast solvent fluctuation is introduced by use of the time-averaging approximation (TAA). The simulations are compared with the experimental amide I', including two (13)C isotopically edited spectra, corrected for the side-chain signals. Both solvent models are consistent with the asymmetric experimental band shape, which arises from the differential solvation of the amide backbone. However, the effects of (13)C isotopic labeling are best captured by the gas-phase calculations. The limitations of the solvent models and implications for the theoretical simulations of protein amide vibrational spectra are discussed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.