Many phylogenomic studies based on transcriptomes have been limited to “single-copy” genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our results demonstrate a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.
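The gene occupancy figure quoted in the abstract above is commonly understood as the fraction of taxon-by-gene cells in a data matrix that actually contain a sequence. The sketch below illustrates that calculation under this assumed definition; the function name `gene_occupancy` and the toy presence/absence matrix are hypothetical and are not taken from the study.

```python
def gene_occupancy(matrix):
    """Return the fraction of taxon-by-gene cells that are filled.

    `matrix` maps a taxon name to a list of booleans, one per gene,
    where True means the taxon has a sequence for that gene.
    (Assumed definition of occupancy; not taken from the paper.)
    """
    total_cells = sum(len(row) for row in matrix.values())
    filled_cells = sum(
        sum(1 for present in row if present) for row in matrix.values()
    )
    return filled_cells / total_cells


# Hypothetical toy matrix: 3 taxa x 4 genes, 10 of 12 cells filled.
toy_matrix = {
    "taxon_A": [True, True, True, False],
    "taxon_B": [True, True, True, True],
    "taxon_C": [True, False, True, True],
}

occupancy = gene_occupancy(toy_matrix)
print(f"Gene occupancy: {occupancy:.1%}")
```

In a real supermatrix the rows would be the sampled taxa and the columns the selected ortholog groups.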
Betalain pigments are unique to the Caryophyllales and structurally and biosynthetically distinct from anthocyanins. Two key enzymes within the betalain synthesis pathway have been identified: 4,5-dioxygenase (DODA) that catalyzes the formation of betalamic acid and CYP76AD1, a cytochrome P450 gene that catalyzes the formation of cyclo-DOPA. We performed phylogenetic analyses to reveal the evolutionary history of the DODA and CYP76AD1 lineages and, in the context of an ancestral reconstruction of pigment states, we explored the evolution of these genes in relation to the complex evolution of pigments in Caryophyllales. Duplications within the CYP76AD1 and DODA lineages arose just before the origin of betalain pigmentation in the core Caryophyllales. The duplications gave rise to DODA-α and CYP76AD1-α isoforms that appear specific to betalain synthesis. Both betalain-specific isoforms were then lost or downregulated in the anthocyanic Molluginaceae and Caryophyllaceae. Our findings suggest a single origin of the betalain synthesis pathway, with neofunctionalization following gene duplications in the CYP76AD1 and DODA lineages. Loss of DODA-α and CYP76AD1-α in anthocyanic taxa suggests that betalain pigmentation has been lost twice in Caryophyllales, and exclusion of betalain pigments from anthocyanic taxa is mediated through gene loss or downregulation. [Correction added after online publication 13 May 2015: in the last two paragraphs of the Summary the gene name CYP761A was changed to CYP76AD1.]
C(4) photosynthesis is normally associated with the compartmentation of photosynthesis between mesophyll (M) and bundle sheath (BS) cells. The mechanisms regulating the differential accumulation of photosynthesis proteins in these specialized cells are fundamental to our understanding of how C(4) photosynthesis operates. Cell-specific accumulation of proteins in M or BS can be mediated by posttranscriptional processes and translational efficiency as well as by differences in transcription. Individual genes are likely regulated at multiple levels. Although cis-elements have been associated with cell-specific expression in C(4) leaves, there has been little progress in identifying trans-factors. When C(4) photosynthesis genes from C(4) species are placed in closely related C(3) species, they are often expressed in a manner faithful to the C(4) cycle. Next-generation sequencing and comprehensive analysis of the extent to which genes from C(4) species are expressed in M or BS cells of C(3) plants should provide insight into how the C(4) pathway is regulated and evolved.
Leaves of almost all C4 lineages separate the reactions of photosynthesis into the mesophyll (M) and bundle sheath (BS). The extent to which messenger RNA profiles of M and BS cells from independent C4 lineages resemble each other is not known. To address this, we conducted deep sequencing of RNA isolated from the M and BS of Setaria viridis and compared these data with publicly available information from maize (Zea mays). This revealed a high correlation (r = 0.89) between the relative abundance of transcripts encoding proteins of the core C4 pathway in M and BS cells in these species, indicating significant convergence in transcript accumulation in these evolutionarily independent C4 lineages. We also found that the vast majority of genes encoding proteins of the C4 cycle in S. viridis are syntenic to homologs used by maize. In both lineages, 122 and 212 homologous transcription factors were preferentially expressed in the M and BS, respectively. Sixteen shared regulators of chloroplast biogenesis were identified, 14 of which were syntenic homologs in maize and S. viridis. In sorghum (Sorghum bicolor), a third C4 grass, we found that 82% of these trans-factors were also differentially expressed in either M or BS cells. Taken together, these data provide, to our knowledge, the first quantification of convergence in transcript abundance in the M and BS cells from independent lineages of C4 grasses. Furthermore, the repeated recruitment of syntenic homologs from large gene families strongly implies that parallel evolution of both structural genes and trans-factors underpins the polyphyletic evolution of this highly complex trait in the monocotyledons.
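The correlation reported in the abstract above compares how strongly each core pathway transcript is enriched in M versus BS cells in two species. A minimal sketch of one way such a comparison could be set up is shown below, assuming that relative abundance is summarized as a log2(M/BS) ratio per gene and compared with Pearson's r. The abstract does not state the exact statistic or normalization, and the helper names and abundance values here are hypothetical illustrations, not data from the study.

```python
import math


def log2_ratio(m_abundance, bs_abundance):
    """Log2 ratio of mesophyll to bundle sheath transcript abundance."""
    return math.log2(m_abundance / bs_abundance)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (M, BS) abundances for the same four genes in two species.
species_1 = [(800.0, 50.0), (40.0, 900.0), (300.0, 280.0), (20.0, 400.0)]
species_2 = [(600.0, 70.0), (60.0, 700.0), (150.0, 200.0), (35.0, 500.0)]

ratios_1 = [log2_ratio(m, bs) for m, bs in species_1]
ratios_2 = [log2_ratio(m, bs) for m, bs in species_2]

r = pearson_r(ratios_1, ratios_2)
print(f"Pearson r between species: {r:.2f}")
```

Using log ratios rather than raw abundances keeps highly expressed genes from dominating the comparison, which is one reason this summary is a common choice.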
Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers.