The Collaborative Cross Consortium reports here on the development of a unique genetic resource population. The Collaborative Cross (CC) is a multiparental recombinant inbred panel derived from eight laboratory mouse inbred strains. Breeding of the CC lines was initiated at multiple international sites using mice from The Jackson Laboratory. Currently, this innovative project is breeding independent CC lines at the University of North Carolina (UNC), at Tel Aviv University (TAU), and at Geniad in Western Australia (GND). These institutions aim to make publicly available the completed CC lines and their genotypes and sequence information. We genotyped, and report here, results from 458 extant lines from UNC, TAU, and GND using a custom genotyping array with 7500 SNPs designed to be maximally informative in the CC and used a novel algorithm to infer inherited haplotypes directly from hybridization intensity patterns. We identified lines with breeding errors and cousin lines generated by splitting incipient lines into two or more cousin lines at early generations of inbreeding. We then characterized the genome architecture of 350 genetically independent CC lines. Results showed that founder haplotypes are inherited at the expected frequency, although we also consistently observed highly significant transmission ratio distortion at specific loci across all three populations. On chromosome 2, there is significant overrepresentation of WSB/EiJ alleles, and on chromosome X, there is a large deficit of CC lines with CAST/EiJ alleles. Linkage disequilibrium decays as expected and we saw no evidence of gametic disequilibrium in the CC population as a whole or in random subsets of the population. Gametic equilibrium in the CC population is in marked contrast to the gametic disequilibrium present in a large panel of classical inbred strains. Finally, we discuss access to the CC population and to the associated raw data describing the genetic structure of individual lines. 
Integration of rich phenotypic and genomic data over time and across a wide variety of fields will be vital to delivering on one of the key attributes of the CC, a common genetic reference platform for identifying causative variants and genetic networks determining traits in mammals.
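The transmission ratio distortion described above can be illustrated with a goodness-of-fit check at a single locus. This is a minimal sketch, not the consortium's analysis: it assumes an autosomal locus where each of the eight founders is expected to contribute 1/8 of the haplotypes, the counts are hypothetical, and the function name `chi_square_uniform` is invented for illustration.

```python
# Chi-square goodness-of-fit statistic for founder haplotype counts at one locus.
# Assumption: equal expected contribution (1/8) from each of the eight founders.

def chi_square_uniform(counts):
    """Return the chi-square statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical counts of lines carrying each founder haplotype at one locus.
balanced_locus = [44, 43, 45, 42, 44, 43, 45, 44]
distorted_locus = [30, 32, 31, 29, 30, 33, 28, 137]  # one founder overrepresented

# 5% critical value of the chi-square distribution with 7 degrees of freedom.
CRITICAL_5PCT_DF7 = 14.067

for name, counts in [("balanced", balanced_locus), ("distorted", distorted_locus)]:
    stat = chi_square_uniform(counts)
    print(name, round(stat, 2), stat > CRITICAL_5PCT_DF7)
```

The critical value applies to a single test; a genome-wide scan would need a correction for the number of loci examined.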
Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Since regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in this process. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. Greater than 80% of mouse genes have cis regulatory variation. These effects influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a novel, global allelic imbalance in favor of the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice and provide a new resource toward understanding the genetic control of transcription in mammals.
We systematically studied the association between somatic copy number aberration (SCNA), DNA methylation and gene expression using -omic data from The Cancer Genome Atlas (TCGA) on six cancer types: breast cancer, colon cancer, glioblastoma, leukemia, lower-grade glioma and prostate cancer. A major challenge for such an integrated study is that the association between DNA methylation and gene expression is severely confounded by tumor purity and cell type composition, which are often unobserved and difficult to estimate. To overcome this challenge, we developed a method to remove confounding effects by calculating the principal components that span the space of the latent factors. Another intriguing finding of our study is that there can be both positive and negative associations between SCNA and DNA methylation: CpGs with negative associations with SCNA are often located around CpG islands, whereas those with positive associations are often located in CpG oceans. A joint study of SCNA, DNA methylation, and gene expression suggests that SCNAs often affect DNA methylation and gene expression independently.
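The idea of removing latent confounders through principal components can be sketched as follows. This is a generic illustration and not the authors' method: it simply projects the leading principal components out of a samples-by-features matrix. The function name `remove_top_pcs`, the simulated data, and the choice of one component are all assumptions.

```python
import numpy as np

def remove_top_pcs(data, n_components):
    """Center each column, then project out the top principal components.

    data: array of shape (n_samples, n_features).
    Returns the centered data with the leading components removed.
    """
    centered = data - data.mean(axis=0, keepdims=True)
    # Columns of u are orthonormal sample-level scores, ordered by singular value.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components]
    return centered - scores @ (scores.T @ centered)

# Simulated example: one strong latent factor (standing in for tumor purity)
# shared by all features, plus independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=50)
loadings = rng.normal(size=20)
methylation = 5.0 * np.outer(latent, loadings) + rng.normal(size=(50, 20))

cleaned = remove_top_pcs(methylation, n_components=1)
```

In practice the number of components to remove has to be chosen with care, since removing too many can also strip out signal of interest.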
RNA sequencing (RNA-seq) not only measures total gene expression but may also measure allele-specific gene expression in diploid individuals. RNA-seq data collected from F1 reciprocal crosses in mice can powerfully dissect strain and parent-of-origin effects on allelic imbalance of gene expression. In this article, we develop a novel statistical approach to analyze RNA-seq data from F1 and inbred strains. Method development was motivated by a study of F1 reciprocal crosses derived from highly divergent mouse strains, to which we apply the proposed method. Our method jointly models the total number of reads and the number of allele-specific reads of each gene, which significantly boosts power for detecting strain and particularly parent-of-origin effects. The method deals with the overdispersion problem commonly observed in read counts and can flexibly adjust for the effects of covariates such as sex and read depth. The X chromosome in mouse presents particular challenges. As in other mammals, X chromosome inactivation silences one of the two X chromosomes in each female cell, although the choice of which chromosome is silenced can be highly skewed by alleles at the X-linked X-controlling element (Xce) and by stochastic effects. Our model accounts for these chromosome-wide effects on an individual level, allowing proper analysis of chromosome X expression. Furthermore, we propose a genomic control procedure to properly control type I error for RNA-seq studies. A number of these methodological improvements can also be applied to RNA-seq data from other species as well as other types of next-generation sequencing data sets. Finally, we show through simulations that increasing the number of samples is more beneficial than increasing the library size for mapping both the strain and parent-of-origin effects. Unless recruiting samples is prohibitively expensive, we recommend sequencing more samples at lower coverage.
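To make the overdispersion point concrete, the sketch below evaluates a beta-binomial distribution for allele-specific read counts. The abstract does not name the distribution used, so the beta-binomial is an assumption here (it is a common choice for overdispersed allelic counts), and the parameter names `pi` (expected allelic proportion) and `rho` (overdispersion) are illustrative.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_logpmf(k, n, pi, rho):
    """Log-probability of k reads from one allele out of n allele-specific reads.

    pi:  expected proportion of reads from that allele (0 < pi < 1).
    rho: overdispersion (0 < rho < 1); as rho approaches 0 this tends to a binomial.
    """
    a = pi * (1.0 - rho) / rho
    b = (1.0 - pi) * (1.0 - rho) / rho
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

# Hypothetical gene with 20 allele-specific reads and no allelic imbalance.
n_reads, pi, rho = 20, 0.5, 0.1
pmf = [exp(beta_binomial_logpmf(k, n_reads, pi, rho)) for k in range(n_reads + 1)]
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
binomial_variance = n_reads * pi * (1.0 - pi)
```

Under this parameterization the variance is inflated relative to the binomial by a factor of 1 + (n - 1) * rho, which is why ignoring overdispersion tends to overstate the evidence for allelic imbalance.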
Using information from allele-specific gene expression (ASE) can improve the power to map gene expression quantitative trait loci (eQTLs). However, this practice has been limited, partly because of computational challenges and partly because neither the size of the power gain nor the new findings that ASE offers beyond improved power have been clearly established. We have developed geoP, a computationally efficient method to estimate permutation p-values, which makes it feasible to perform eQTL mapping with ASE counts for large cohorts. We have applied geoP to map eQTLs in 28 human tissues using data from the Genotype-Tissue Expression (GTEx) project. We demonstrate that using ASE data not only substantially improves the power to detect eQTLs, but also allows us to quantify individual-specific genetic effects, which can be used to study the variation of eQTL effect sizes with respect to other covariates. We also compared two popular methods for eQTL mapping with ASE: TReCASE and RASQUAL. TReCASE is at least ten times faster than RASQUAL and provides more robust type I error control.
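For context, the quantity that a method like geoP estimates efficiently is a permutation p-value. The sketch below is the plain brute-force version for one gene and one SNP; it is not geoP itself, whose algorithm is not described in this abstract. The test statistic (absolute Pearson correlation), the function names, and the data are assumptions made for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(genotype, expression, n_perm=999, seed=0):
    """Brute-force permutation p-value for a genotype-expression association."""
    rng = random.Random(seed)
    observed = abs(pearson(genotype, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(pearson(genotype, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: genotype coded as minor-allele count, expression rising with it.
genotype = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
expression = [1.0, 1.2, 0.9, 1.1, 2.1, 1.8, 2.0, 2.2, 3.1, 2.9, 3.0, 3.2]
p_value = permutation_pvalue(genotype, expression)
```

The cost of this approach grows with the number of permutations times the number of gene-SNP pairs, which is the computational burden the abstract refers to.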