Background
The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.
Results
We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole-genome shotgun approach relying primarily on next-generation sequence generated from a single haploid seed megagametophyte from loblolly pine tree 20-1010, which has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp, with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on de novo analysis of the repetitive content, estimated to encompass 82% of the genome.
Conclusions
In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
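The N50 scaffold size quoted above is a contiguity statistic: it is the length L such that scaffolds of length L or longer together hold at least half of the assembled bases. A minimal sketch of the calculation follows. The scaffold lengths are hypothetical, and the sketch assumes the assembly's own total length as the denominator (variants such as NG50 use an estimated genome size instead).

```python
def n50(lengths):
    """Return the N50 of a collection of positive sequence lengths.

    Scaffolds are taken longest first; the answer is the length of the
    scaffold at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # at least half of all bases covered
            return length


# Hypothetical scaffold lengths in kbp (not values from the assembly)
scaffolds_kbp = [100, 80, 60, 40, 20]
print(n50(scaffolds_kbp))  # 80
```

Here the total is 300 kbp; the two longest scaffolds (100 + 80 = 180 kbp) are the first to pass the 150 kbp halfway mark, so the N50 is 80 kbp.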
Natural populations of forest trees exhibit striking phenotypic adaptations to diverse environmental gradients, making them appealing subjects for the study of genes underlying ecologically relevant phenotypes. Here, we use a genome-wide data set of single nucleotide polymorphisms (SNPs) genotyped across 3059 functional genes to study patterns of population structure and identify loci associated with aridity across the natural range of loblolly pine (Pinus taeda L.). Overall patterns of population structure, as inferred using principal components and Bayesian cluster analyses, were consistent with three genetic clusters likely resulting from expansions out of Pleistocene refugia located in Mexico and Florida. A novel application of association analysis, which removes the confounding effects of shared ancestry on correlations between genetic and environmental variation, identified five loci correlated with aridity. These loci were primarily involved in abiotic stress responses to temperature and drought. A distinct set of 24 loci was identified as FST outliers on the basis of the genetic clusters identified previously and after accounting for expansions out of Pleistocene refugia. These loci were involved in a diversity of physiological processes. The identification of nonoverlapping sets of loci highlights the fundamental differences implicit in the two methods and suggests a pluralistic, yet complementary, approach to identifying genes underlying ecologically relevant phenotypes.
Environmental heterogeneity at multiple spatial scales influences the distribution of genetic variation across plant populations. Correlations between genetic variation and environmental gradients have been identified in a variety of plant species
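An FST outlier screen compares allele-frequency variation among populations with variation within them. As an illustration only, the sketch below computes Nei's G_ST, one common FST-type estimator, for biallelic SNPs from per-cluster allele frequencies, weighting clusters equally, and ranks loci by it. The SNP names and frequencies are hypothetical, and the naive top-fraction ranking does not model expansions out of refugia, so it is not the outlier test used in the study.

```python
from statistics import mean


def gst(freqs):
    """Nei's G_ST for one biallelic SNP, weighting populations equally.

    freqs holds the frequency of one allele in each population.
    """
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-population expected heterozygosity
    p_bar = mean(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def top_loci(loci, fraction=0.05):
    """Naively flag the highest-G_ST loci; no demographic null model is applied."""
    ranked = sorted(loci, key=lambda name: gst(loci[name]), reverse=True)
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n]


# Hypothetical allele frequencies in three genetic clusters (not data from the study)
loci = {
    "snp_a": [0.10, 0.50, 0.90],
    "snp_b": [0.45, 0.50, 0.55],
    "snp_c": [0.30, 0.30, 0.30],
}
for name, freqs in loci.items():
    print(name, round(gst(freqs), 3))
print(top_loci(loci, fraction=0.34))
```

A locus with identical frequencies in every cluster (snp_c) scores zero, while one with strongly diverging frequencies (snp_a) scores highest. A real outlier test would compare such scores against a null distribution that reflects the inferred demographic history.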
Summary Points
1. Proteases are essential for the proteolytic processing of proneuropeptide precursors into active peptide neurotransmitters and hormones.
2. Secretory vesicles are the primary subcellular site of neuropeptide biosynthesis; there, neuropeptides are produced, stored, and secreted to mediate cell-cell communication.
3. Protease pathways for proneuropeptide processing have been elucidated, consisting of (a) the newly identified cysteine protease cathepsin L with aminopeptidase B in secretory vesicles, and (b) the well-established proprotein convertase family, including the neuroendocrine-specific prohormone convertases 1 and 2 (PC1/3 and PC2), with carboxypeptidase E.
4. Protease gene knockout experiments have validated the roles of PC1/3, PC2, and cathepsin L in the production of neuropeptides in nervous and endocrine tissues.
5. Endogenous regulators, consisting of inhibitors and activators, participate in the in vivo control of processing enzyme functions.
6. The structural biology of proteases and proneuropeptides will be important for understanding the interaction mechanisms of proneuropeptide processing.
7. Neuropeptidomics has recently been applied to neuropeptide systems for primary sequence and structural identification, as well as quantitation, by LC-MS/MS (liquid chromatography with tandem mass spectrometry).
8. Proteomic studies have revealed functional protein families that participate in secretory vesicle functions for the production, storage, and secretion of neuropeptides.
9. Pharmacological evaluation of the unique specificities of neuropeptide processing systems will be valuable for designing future strategies to develop selective small-molecule modulators of processing enzymes for therapeutic applications in health and disease.
Future Issues: Areas of Neuropeptide Research for Exploration
1. How are the cathepsin L and prohormone convertase protease pathways coordinately regulated?
2. What is the proteolytic basis for tissue-specific processing of proneuropeptides, such as that of POMC?
3. Selective and potent inhibitors of the protease components that process prohormones should be developed to facilitate basic and pharmacological research.
4. What are the structural features of prohormone and protease interactions for functional processing?
Peptide neurotransmitters and peptide hormones, collectively known as neuropeptides, are required for cell-cell communication in neurotransmission and for the regulation of endocrine functions. Neuropeptides are synthesized from protein precursors (termed proneuropeptides or prohormones) that require proteolytic processing, primarily within secretory vesicles that store and secrete the mature neuropeptides to control target cells and organ systems. This review describes interdisciplinary strategies that have elucidated two primary protease pathways for prohormone processing: the cysteine protease pathway mediated by secretory vesicle cathepsin L and the well-known subtilisin-like proprotein convertase pathway, which together support neuropeptide biosynthesis. Importantly...
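As general background not stated in this text, both pathways act at paired basic residues (Lys/Arg) in the precursor. Prohormone convertases cleave on the C-terminal side of the pair, leaving basic residues that carboxypeptidase E trims from the C-terminus. Cathepsin L cleaves on the N-terminal side of (or between) the pair, leaving N-terminal basic residues that aminopeptidase B removes. The toy model below, on a hypothetical precursor written in one-letter amino-acid code, shows how the two routes can release the same enkephalin pentapeptides (YGGFM, YGGFL). It is a simplified sketch: it considers only dibasic pairs, models cathepsin L as cutting only on the N-terminal side, and ignores enzyme specificity and regulation.

```python
import re

# Toy assumption: any pair of basic residues (Lys = K, Arg = R) is a cleavage site.
DIBASIC = re.compile(r"[KR][KR]")


def pc_pathway(precursor):
    """Toy PC1/3-PC2 route: cut after each dibasic pair, then CPE-like C-terminal K/R trimming."""
    pieces, start = [], 0
    for site in DIBASIC.finditer(precursor):
        pieces.append(precursor[start:site.end()])  # basic pair stays on the upstream fragment
        start = site.end()
    pieces.append(precursor[start:])
    trimmed = (p.rstrip("KR") for p in pieces)  # carboxypeptidase E-like step
    return [p for p in trimmed if p]


def cathepsin_l_pathway(precursor):
    """Toy cathepsin L route: cut before each dibasic pair, then aminopeptidase B-like N-terminal K/R trimming."""
    pieces, start = [], 0
    for site in DIBASIC.finditer(precursor):
        pieces.append(precursor[start:site.start()])  # basic pair moves to the downstream fragment
        start = site.start()
    pieces.append(precursor[start:])
    trimmed = (p.lstrip("KR") for p in pieces)  # aminopeptidase B-like step
    return [p for p in trimmed if p]


toy_precursor = "AEKRYGGFMKRYGGFLRRS"  # hypothetical, not a real prohormone sequence
print(pc_pathway(toy_precursor))           # ['AE', 'YGGFM', 'YGGFL', 'S']
print(cathepsin_l_pathway(toy_precursor))  # ['AE', 'YGGFM', 'YGGFL', 'S']
```

The two routes differ only in which fragment keeps the basic pair after the endoproteolytic cut, and therefore in which exopeptidase is needed to finish the mature peptide.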
Conifers are the predominant gymnosperms. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate-pair libraries made from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods using independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
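The super-read idea can be illustrated with a toy sketch: each read is extended base by base in both directions for as long as the k-mer set built from all reads allows exactly one continuation, and reads that extend to the same sequence collapse into a single super-read. This is a simplified model under stated assumptions (error-free reads on one strand only, a plain k-mer set with no counts, k of at least 2, reads at least k bases long), not the production pipeline used for the pine assembly.

```python
def kmers(seq, k):
    """All length-k substrings of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def extend_right(seq, kmer_set, k):
    """Append bases while exactly one k-mer in kmer_set continues seq."""
    seen = set(kmers(seq, k))
    while True:
        suffix = seq[-(k - 1):]
        nexts = [suffix + b for b in "ACGT" if suffix + b in kmer_set]
        if len(nexts) != 1 or nexts[0] in seen:
            return seq  # dead end, branch, or cycle
        seq += nexts[0][-1]
        seen.add(nexts[0])


def extend_left(seq, kmer_set, k):
    """Prepend bases while exactly one k-mer in kmer_set precedes seq."""
    seen = set(kmers(seq, k))
    while True:
        prefix = seq[:k - 1]
        prevs = [b + prefix for b in "ACGT" if b + prefix in kmer_set]
        if len(prevs) != 1 or prevs[0] in seen:
            return seq  # dead end, branch, or cycle
        seq = prevs[0][0] + seq
        seen.add(prevs[0])


def super_reads(reads, k):
    """Collapse reads into the distinct sequences their unique extensions reach."""
    kmer_set = {km for read in reads for km in kmers(read, k)}
    collapsed = {extend_left(extend_right(r, kmer_set, k), kmer_set, k) for r in reads}
    return sorted(collapsed)


# Hypothetical error-free reads tiled across a short repeat-free sequence
genome = "ATGGCGTACCTTAGC"
reads = [genome[0:8], genome[4:12], genome[7:15]]
print(super_reads(reads, k=5))  # ['ATGGCGTACCTTAGC']
```

Three overlapping reads collapse into one super-read because every (k-1)-base overlap in this toy sequence has a single continuation. Where the k-mer set branches, extension stops at the branch point, so repeats split reads into several super-reads rather than being resolved at this stage.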