We describe here the diversity of chloroplast proteins required for embryo development in Arabidopsis (Arabidopsis thaliana). Interfering with certain chloroplast functions has long been known to result in embryo lethality. What has not been reported before is a comprehensive screen for embryo-defective (emb) mutants altered in chloroplast proteins. From a collection of transposon and T-DNA insertion lines at the RIKEN chloroplast function database (http://rarge.psc.riken.jp/chloroplast/) that initially appeared to lack homozygotes and segregate for defective seeds, we identified 23 additional examples of EMB genes that likely encode chloroplast-localized proteins. Fourteen gene identities were confirmed with allelism tests involving duplicate mutant alleles. We then queried journal publications and the SeedGenes database (www.seedgenes.org) to establish a comprehensive dataset of 381 nuclear genes encoding chloroplast proteins of Arabidopsis associated with embryo-defective (119 genes), plant pigment (121 genes), gametophyte (three genes), and alternate (138 genes) phenotypes. Loci were ranked based on the level of certainty that the gene responsible for the phenotype had been identified and the protein product localized to chloroplasts. Embryo development is frequently arrested when amino acid, vitamin, or nucleotide biosynthesis is disrupted but proceeds when photosynthesis is compromised and when levels of chlorophyll, carotenoids, or terpenoids are reduced. Chloroplast translation is also required for embryo development, with genes encoding chloroplast ribosomal and pentatricopeptide repeat proteins well represented among EMB datasets. The chloroplast accD locus, which is necessary for fatty acid biosynthesis, is essential in Arabidopsis but not in Brassica napus or maize (Zea mays), where duplicated nuclear genes compensate for its absence or loss of function.
Despite the widespread use of Arabidopsis (Arabidopsis thaliana) as a model plant, a curated dataset of Arabidopsis genes with mutant phenotypes remains to be established. A preliminary list published nine years ago in Plant Physiology is outdated, and genome-wide phenotype information remains difficult to obtain. We describe here a comprehensive dataset of 2,400 genes with a loss-of-function mutant phenotype in Arabidopsis. Phenotype descriptions were gathered primarily from manual curation of the scientific literature. Genes were placed into prioritized groups (essential, morphological, cellular-biochemical, and conditional) based on the documented phenotypes of putative knockout alleles. Phenotype classes (e.g. vegetative, reproductive, and timing, for the morphological group) and subsets (e.g. flowering time, senescence, circadian rhythms, and miscellaneous, for the timing class) were also established. Gene identities were classified as confirmed (through molecular complementation or multiple alleles) or not confirmed. Relationships between mutant phenotype and protein function, genetic redundancy, protein connectivity, and subcellular protein localization were explored. A complementary dataset of 401 genes that exhibit a mutant phenotype only when disrupted in combination with a putative paralog was also compiled. The importance of these genes in confirming functional redundancy and enhancing the value of single gene datasets is discussed. With further input and curation from the Arabidopsis community, these datasets should help to address a variety of important biological questions, provide a foundation for exploring the relationship between genotype and phenotype in angiosperms, enhance the utility of Arabidopsis as a reference plant, and facilitate comparative studies with model genetic organisms.
Essential genes represent critical cellular components whose disruption results in lethality. Characteristics shared among essential genes have been uncovered in fungal and metazoan model systems. However, features associated with plant essential genes are largely unknown and the full set of essential genes remains to be discovered in any plant species. Here, we show that essential genes in Arabidopsis thaliana have distinct features useful for constructing within- and cross-species prediction models. Essential genes in A. thaliana are often single copy or derived from older duplications, highly and broadly expressed, slow evolving, and highly connected within molecular networks compared with genes with nonlethal mutant phenotypes. These gene features allowed the application of machine learning methods that predicted known lethal genes as well as an additional 1970 likely essential genes without documented phenotypes. Prediction models from A. thaliana could also be applied to predict Oryza sativa and Saccharomyces cerevisiae essential genes. Importantly, successful predictions drew upon many features; no single feature was sufficient. Our findings show that essential genes can be distinguished from genes with nonlethal phenotypes using features that are similar across kingdoms and indicate the possibility for translational application of our approach to species without extensive functional genomic and phenomic resources.
The SeedGenes database (www.seedgenes.org) contains information on more than 400 genes required for embryo development in Arabidopsis. Many of these EMBRYO-DEFECTIVE (EMB) genes encode proteins with an essential function required throughout the life cycle. This raises a fundamental question. Why does elimination of an essential gene in Arabidopsis often result in embryo lethality rather than gametophyte lethality? In other words, how do mutant (emb) gametophytes survive and participate in fertilization when an essential cellular function is disrupted? Furthermore, why do some mutant embryos proceed further in development than others? To address these questions, we first established a curated dataset of genes required for gametophyte development in Arabidopsis based on information extracted from the literature. This provided a basis for comparison with EMB genes obtained from the SeedGenes dataset. We also identified genes that exhibited both embryo and gametophyte defects when disrupted by a loss-of-function mutation. We then evaluated the relationship between mutant phenotype, gene redundancy, mutant allele strength, gene expression pattern, protein function, and intracellular protein localization to determine what factors influence the phenotypes of lethal mutants in Arabidopsis. After removing cases where continued development potentially resulted from gene redundancy or residual function of a weak mutant allele, we identified numerous examples of viable mutant (emb) gametophytes that required further explanation. We propose that the presence of gene products derived from transcription in diploid (heterozygous) sporocytes often enables mutant gametophytes to survive the loss of an essential gene in Arabidopsis. Whether gene disruption results in embryo or gametophyte lethality therefore depends in part on the ability of residual, parental gene products to support gametophyte development. 
We also highlight here 70 preglobular embryo mutants with a zygotic pattern of inheritance, which provide valuable insights into the maternal-to-zygotic transition in Arabidopsis and the timing of paternal gene activation during embryo development.
Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology.
We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health. Electronic supplementary material: The online version of this article (doi:10.1186/s13007-015-0053-y) contains supplementary material, which is available to authorized users.