Highlights
- SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence
- 75% of predicted cryptic splice variants validate on RNA-seq
- Cryptic splicing may yield 10% of pathogenic variants in neurodevelopmental disorders
- Cryptic splice variants frequently give rise to alternative splicing
A deep neural network precisely models mRNA splicing from a genomic sequence and accurately predicts noncoding cryptic splice mutations in patients with rare genetic diseases.
SUMMARY The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.
Figure legend (panels F-H only):
(F) Relationship between exon-intron length and the strength of the adjoining splice sites, as predicted by SpliceAI-80 nt (local motif score) and SpliceAI-10k. The genome-wide distributions of exon length (yellow) and intron length (pink) are shown in the background. The x axis is in log-scale.
(G) A pair of splice acceptor and donor motifs, placed 150 nt apart, are walked along the HMGCR gene. Shown are, at each position, K562 nucleosome signal and the likelihood of the pair forming an exon at that position, as predicted by SpliceAI-10k. The genome-wide Spearman correlation between the two tracks is shown.
(H) Average K562 and GM12878 nucleosome signal near private mutations that are predicted by the SpliceAI-10k model to create novel exons in the GTEx cohort.
Key message The integration of new technologies into public plant breeding programs can make a powerful step change in agricultural productivity when aligned with principles of quantitative and Mendelian genetics.
Abstract The breeder’s equation is the foundational application of quantitative genetics to crop improvement. Guided by the variables that describe response to selection, emerging breeding technologies can make a powerful step change in the effectiveness of public breeding programs. The most promising innovations for increasing the rate of genetic gain without greatly increasing program size appear to be related to reducing breeding cycle time, which is likely to require the implementation of parent selection on non-inbred progeny, rapid generation advance, and genomic selection. These are complex processes and will require breeding organizations to adopt a culture of continuous optimization and improvement. To enable this, research managers will need to consider and proactively manage the accountability, strategy, and resource allocations of breeding teams. This must be combined with thoughtful management of elite genetic variation and a clear separation between the parental selection process and the product development and advancement process. With an abundance of new technologies available, breeding teams need to evaluate carefully the impact of any new technology on selection intensity, selection accuracy, and breeding cycle length relative to its cost of deployment. Finally, breeding data management systems need to be well designed to support selection decisions, and novel approaches to accelerate breeding cycles need to be routinely evaluated and deployed.
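The trade-off among selection intensity, selection accuracy, and breeding cycle length can be made concrete with the standard per-year form of the breeder's equation, gain = (i × r × σ_A) / L. The following is a minimal Python sketch, not taken from the article: the function name and every numeric value are hypothetical and chosen only to illustrate how a shorter cycle can outweigh a loss in accuracy.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Expected response to selection per year: (i * r * sigma_A) / L.

    intensity   -- selection intensity i (standardized selection differential)
    accuracy    -- selection accuracy r (correlation of predicted and true breeding value)
    sigma_a     -- additive genetic standard deviation, in trait units
    cycle_years -- breeding cycle length L, in years
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years


# Hypothetical scenario: a conventional 6-year cycle with accurate phenotypic
# selection versus a 2-year cycle with less accurate genomic predictions.
baseline = genetic_gain_per_year(intensity=1.75, accuracy=0.60, sigma_a=1.0, cycle_years=6.0)
faster = genetic_gain_per_year(intensity=1.75, accuracy=0.45, sigma_a=1.0, cycle_years=2.0)

print(f"baseline gain per year: {baseline:.3f}")
print(f"short-cycle gain per year: {faster:.3f}")
print(f"ratio: {faster / baseline:.2f}")
```

Under these assumed values, the shorter cycle more than doubles the expected gain per year despite the lower accuracy. Weighing that gain against deployment cost would need cost figures that the abstract does not give.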
SUMMARY Genome-wide association studies have struggled to identify functional genes and variants underlying complex phenotypes. We recruited a multi-ethnic cohort of healthy volunteers (n = 91) and used their tissue to generate induced pluripotent stem cells (iPSCs) and hepatocyte-like cells (HLCs) for genome-wide mapping of expression quantitative trait loci (eQTLs) and allele-specific expression (ASE). We identified many eQTL genes (eGenes) not observed in the comparably sized Genotype-Tissue Expression project’s human liver cohort (n = 96). Focusing on blood lipid-associated loci, we performed massively parallel reporter assays to screen candidate functional variants and used genome-edited stem cells, CRISPR interference, and mouse modeling to establish rs2277862-CPNE1, rs10889356-DOCK7, rs10889356-ANGPTL3, and rs10872142-FRK as functional SNP-gene sets. We demonstrated HLC eGenes CPNE1, VKORC1, UBE2L3, and ANGPTL3 and HLC ASE gene ACAA2 to be lipid-functional genes in mouse models. These findings endorse an iPSC-based experimental framework to discover functional variants and genes contributing to complex human traits.
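As a rough illustration of what an eQTL test measures, the sketch below regresses a gene's expression on allele dosage (0, 1, or 2 copies of the alternate allele) by ordinary least squares. It is an assumption-laden toy, not the study's pipeline: the function name and the data are hypothetical, and a real analysis would add covariates and multiple-testing control.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage."""
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need two equal-length samples with at least 2 donors")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all donors share one genotype; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: six donors, expression in arbitrary normalized units.
dosage = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept = eqtl_slope(dosage, expr)
print(f"effect per alternate allele: {slope:.2f}, intercept: {intercept:.2f}")
```

A nonzero slope suggests that each additional copy of the allele shifts expression. The significance of that shift would be assessed separately.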
Two populations of interspecific introgression lines (ILs) in a common recurrent parent were developed for use in pre-breeding and QTL mapping. The ILs were derived from crosses between cv Curinga, a tropical japonica upland cultivar, and two different wild donors, Oryza meridionalis Ng. accession (W2112) and Oryza rufipogon Griff. accession (IRGC 105491). The lines were genotyped using genotyping-by-sequencing (GBS) and SSRs. The 32 Curinga/O. meridionalis ILs contain 76.73 % of the donor genome in individual introgressed segments, and each line has an average of 94.9 % recurrent parent genome. The 48 Curinga/O. rufipogon ILs collectively contain 97.6 % of the donor genome with an average of 89.9 % recurrent parent genome per line. To confirm that these populations were segregating for traits of interest, they were phenotyped for pericarp color in the greenhouse and for four agronomic traits (days to flowering, plant height, number of tillers, and number of panicles) in an upland field environment. Seeds from these IL libraries and the accompanying GBS datasets are publicly available and represent valuable genetic resources for exploring the genetics and breeding potential of rice wild relatives.
Electronic supplementary material: The online version of this article (doi:10.1007/s11032-015-0276-7) contains supplementary material, which is available to authorized users.
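The two statistics reported for each IL population, the recurrent parent genome percentage per line and the donor genome coverage across the whole library, can be sketched as below. This is a simplified illustration under stated assumptions: the single-letter genotype encoding ("R" recurrent, "D" donor, "H" heterozygous, "N" missing) and both function names are hypothetical, and every marker is weighted equally rather than by physical or genetic distance.

```python
def recurrent_parent_fraction(calls):
    """Fraction of scored markers in one line carrying the recurrent parent allele.

    Heterozygous calls count as half; missing calls ("N") are ignored.
    """
    weight = {"R": 1.0, "H": 0.5, "D": 0.0}
    scored = [c for c in calls if c in weight]
    if not scored:
        raise ValueError("no scored markers in this line")
    return sum(weight[c] for c in scored) / len(scored)


def donor_coverage(lines):
    """Fraction of marker positions with a donor allele in at least one line."""
    n_markers = len(lines[0])
    if any(len(line) != n_markers for line in lines):
        raise ValueError("all lines must be scored at the same markers")
    covered = sum(
        any(line[i] in ("D", "H") for line in lines) for i in range(n_markers)
    )
    return covered / n_markers


# Hypothetical library of three lines scored at ten markers.
library = ["RRRRRRRRDD", "DDRRRRRRRR", "RRRRHRRRRR"]
per_line = [recurrent_parent_fraction(line) for line in library]
print("recurrent parent fraction per line:", per_line)
print("donor genome coverage across library:", donor_coverage(library))
```

The toy library shows why the two numbers move independently: each line is mostly recurrent parent, yet together the lines tile half of the marker positions with donor segments.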