Recent studies have shown that one of the parental subgenomes in ancient polyploids is generally more dominant, having retained more genes and being more highly expressed, a phenomenon termed subgenome dominance. The genomic features that determine how quickly and which subgenome dominates within a newly formed polyploid remain poorly understood. To investigate the rate of emergence of subgenome dominance, we examined gene expression, gene methylation, and transposable element (TE) methylation in a natural, <140-year-old allopolyploid (Mimulus peregrinus), a resynthesized interspecies triploid hybrid (M. robertsii), a resynthesized allopolyploid (M. peregrinus), and progenitor species (M. guttatus and M. luteus). We show that subgenome expression dominance occurs instantly following the hybridization of divergent genomes and significantly increases over generations. Additionally, CHH methylation levels are reduced in regions near genes and within TEs in the first-generation hybrid, intermediate in the resynthesized allopolyploid, and are repatterned differently between the dominant and recessive subgenomes in the natural allopolyploid. Subgenome differences in levels of TE methylation mirror the increase in expression bias observed over the generations following hybridization. These findings provide important insights into genomic and epigenomic shock that occurs following hybridization and polyploid events and may also contribute to uncovering the mechanistic basis of heterosis and subgenome dominance.
Deep learning methodologies have revolutionized prediction in many fields and show potential to do the same in molecular biology and genetics. However, applying these methods in their current forms ignores evolutionary dependencies within biological systems and can result in false positives and spurious conclusions. We developed two approaches that account for evolutionary relatedness in machine learning models: (i) gene-family-guided splitting and (ii) ortholog contrasts. The first approach accounts for evolution by constraining model training and testing sets to include different gene families. The second approach uses evolutionarily informed comparisons between orthologous genes to both control for and leverage evolutionary divergence during the training process. The two approaches were explored and validated within the context of mRNA expression level prediction, yielding area under the ROC curve (auROC) values ranging from 0.75 to 0.94. Model weight inspections showed biologically interpretable patterns, resulting in the hypothesis that the 3′ UTR is more important for fine-tuning mRNA abundance levels while the 5′ UTR is more important for large-scale changes.
machine learning | convolutional neural networks | regulation | RNA
Machine and deep learning approaches such as Convolutional Neural Networks (CNNs) are largely responsible for a recent paradigm shift in image and natural language processing. These approaches are among the fundamental enablers of modern artificial intelligence advances such as facial recognition, speech recognition, and self-driving vehicles.
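The gene-family-guided splitting described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `family_guided_split`, the `test_fraction` parameter, and the toy gene and family identifiers are all hypothetical, and the only property it is meant to show is that every gene family ends up entirely in the training set or entirely in the testing set.

```python
import random
from collections import defaultdict


def family_guided_split(gene_to_family, test_fraction=0.2, seed=0):
    """Split genes so that no gene family contributes to both sets.

    gene_to_family: dict mapping gene ID -> gene family ID (hypothetical input).
    test_fraction:  approximate share of genes to place in the test set.
    """
    # Group genes by family so families can be assigned as whole units.
    families = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)

    # Shuffle family IDs (not genes) with a fixed seed for reproducibility.
    family_ids = sorted(families)
    random.Random(seed).shuffle(family_ids)

    target_test_size = test_fraction * len(gene_to_family)
    train, test = [], []
    for family in family_ids:
        # Fill the test set one whole family at a time until the target
        # size is reached; all remaining families go to training.
        if len(test) < target_test_size:
            test.extend(families[family])
        else:
            train.extend(families[family])
    return train, test


# Toy data: gene and family names are made up for illustration.
gene_to_family = {
    "gene01": "famA", "gene02": "famA", "gene03": "famA",
    "gene04": "famB", "gene05": "famB",
    "gene06": "famC", "gene07": "famC",
    "gene08": "famD",
    "gene09": "famE", "gene10": "famE",
}
train_genes, test_genes = family_guided_split(gene_to_family)
```

Because whole families are assigned at once, the realized test share only approximates `test_fraction`; a random gene-level split would hit the target share exactly but could place close paralogs on both sides.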
The same deep learning approaches are beginning to be applied to molecular biology, genetics, agriculture, and medicine (1-7), but evolutionary relationships make properly training and testing models in biology much more challenging than the image or text classification problems mentioned above.
For example, if one wants to predict mRNA levels from DNA promoter regions (as we do here), the standard approach from image recognition problems would be to randomly split genes into training and testing sets (8). However, such a split will likely lead to dependencies between the sets because of shared evolutionary histories between genes (i.e., gene family relatedness, gene duplications, etc.) and may cause model overfitting, false positives, and spurious conclusions. Models trained without properly accounting for the constraints imposed by evolutionary history (and perhaps other biological and technical factors specific to the modeling scenario) will likely memorize both the neutral and the functional evolutionary history, rather than learning only the functional elements, leading researchers to incorrect conclusions.
With these challenges in mind, we developed two CNN architectures for predicting mRNA expression levels from DNA promoter and/or terminator regions. These include models that predict the following: (i) whether a given gene is highly or lowly expressed and (ii) which of two compared gene orthologs has higher mRNA abundance. The architectures are ...
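To make prediction task (ii) concrete, the sketch below shows one way training labels for ortholog contrasts could be derived from expression values. It is an illustration under stated assumptions, not the procedure from the paper: the function name `ortholog_contrast_labels`, the pseudocount of 1.0, the `min_log2_fold` threshold for discarding near-ties, and the toy gene identifiers and expression values are all hypothetical.

```python
import math


def ortholog_contrast_labels(pairs, expression, min_log2_fold=1.0):
    """Label each ortholog pair by which member has higher mRNA abundance.

    pairs:         list of (gene_a, gene_b) ortholog pairs (hypothetical IDs).
    expression:    dict mapping gene ID -> expression value (e.g. TPM).
    min_log2_fold: assumed threshold; pairs whose absolute log2 ratio is
                   below it are dropped as uninformative near-ties.
    Returns a list of (gene_a, gene_b, label), label 1 if gene_a is higher.
    """
    examples = []
    for gene_a, gene_b in pairs:
        # A pseudocount of 1.0 (an assumption) avoids division by zero
        # and dampens ratios between very lowly expressed genes.
        log2_ratio = math.log2(
            (expression[gene_a] + 1.0) / (expression[gene_b] + 1.0)
        )
        if abs(log2_ratio) < min_log2_fold:
            continue
        examples.append((gene_a, gene_b, 1 if log2_ratio > 0 else 0))
    return examples


# Toy data: identifiers and values are made up for illustration.
expression = {
    "spA_g1": 30.0, "spB_g1": 2.0,   # clear difference, gene_a higher
    "spA_g2": 5.0,  "spB_g2": 5.5,   # near-tie, dropped
    "spA_g3": 1.0,  "spB_g3": 15.0,  # clear difference, gene_b higher
}
pairs = [("spA_g1", "spB_g1"), ("spA_g2", "spB_g2"), ("spA_g3", "spB_g3")]
contrasts = ortholog_contrast_labels(pairs, expression)
```

Framing the task as a within-pair comparison means each example contrasts two sequences that share ancestry, so a model is pushed toward the differences between them rather than toward features the whole family has in common.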
Although the sequence of evolutionary events that produced multiple C4 subtypes within the Paniceae remains undetermined, the results presented here are consistent with only a subset of currently proposed models. The species used in this study constitute a panel of C3 and C4 grasses that are suitable for further studies on C4 photosynthesis, bioenergy, food and forage crops, and various developmental features of the Paniceae.
Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications for agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Convolutional Neural Networks (CNNs) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The CNN predictions were more accurate than standard methods when tested on held-out G, E, and M data (r=0.5 vs r=0.4), and slightly less accurate than standard methods when only G was held out (r=0.74 vs r=0.78). Pre-training on historical data increased accuracy by 1-36% compared to trial data alone. Saliency map analysis indicated the CNN has "learned" to prioritize many factors of known agricultural importance.