X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of ‘escape’ from inactivation varying between genes and individuals1,2. The extent to which XCI is shared between cells and tissues remains poorly characterized3,4, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression5 and phenotypic traits6. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity6,7. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.
Existing benchmark datasets for evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible to these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a more accurate and less biased estimate of small-variant-calling error rates in a realistic context.
Standardized benchmarking methods and tools are essential to robust accuracy assessment of NGS variant calling. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. To address these needs, the Global Alliance for Genomics and Health (GA4GH) Benchmarking Team convened representatives from sequencing technology developers, government agencies, academic bioinformatics researchers, clinical laboratories, and commercial technology and bioinformatics developers for whom benchmarking variant calls is essential to their work. This team addressed challenges in (1) matching variant calls with different representations, (2) defining standard performance metrics, (3) enabling stratification of performance by variant type and genome context, and (4) developing and describing limitations of high-confidence calls and regions that can be used as “truth”. Our methods are publicly available on GitHub (https://github.com/ga4gh/benchmarking-tools) and in a web-based app on precisionFDA, which allow users to compare their variant calls against truth sets and to obtain a standardized report on their variant calling performance. Our methods have been piloted in the precisionFDA variant calling challenges to identify the best-in-class variant calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and critically evaluating the results.
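The first challenge above, matching calls with different representations, arises because the same variant can be written at several positions and with differently padded alleles in a VCF. A minimal left-align-and-trim normalization sketch illustrates the problem (this helper is illustrative, in the style of vt-normalize; it is not the GA4GH comparison engine, which performs haplotype-aware matching):

```python
def normalize(pos: int, ref: str, alt: str, seq: str) -> tuple[int, str, str]:
    """Left-align and trim one variant against the reference sequence.

    pos is 1-based; seq is the reference chromosome sequence. Two VCF
    records describing the same change converge to the same normalized
    (pos, ref, alt) triple. Sketch only: does not guard the chromosome
    start or handle multi-allelic records.
    """
    changed = True
    while changed:
        changed = False
        # Truncate a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles to the left.
        if not ref or not alt:
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
        # Truncate a shared leftmost base while both alleles keep >= 1 base.
        if len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
            ref, alt = ref[1:], alt[1:]
            pos += 1
            changed = True
    return pos, ref, alt

# Two different encodings of the same 2-bp deletion in the repeat
# GCACACAT converge to one canonical form:
assert normalize(4, "CAC", "C", "GCACACAT") == normalize(2, "CAC", "C", "GCACACAT")
```

Without such canonicalization (or haplotype comparison, as in tools like hap.py or vcfeval), a naive positional diff would count one of these records as both a false positive and a false negative.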
Assessing the accuracy of NGS variant calling is greatly facilitated by a robust benchmarking strategy and tools to carry it out in a standard way. The Global Alliance for Genomics and Health (GA4GH) Benchmarking Team has developed standardized performance metrics and tools for benchmarking germline small-variant calls. The Team includes representatives from sequencing technology developers, government agencies, academic bioinformatics researchers, clinical laboratories, and commercial technology and bioinformatics developers for whom benchmarking variant calls is essential to their work. Benchmarking variant calls is a challenging problem for many reasons:
• Evaluating variant calls requires complex matching algorithms and standardized counting, because the same variant may be represented differently in truth and query callsets.
• Defining and interpreting the resulting metrics, such as precision (a.k.a. positive predictive value, TP/(TP+FP)) and recall (a.k.a. sensitivity, TP/(TP+FN)), requires standardization to draw robust conclusions about the comparative performance of different variant-calling methods.
• The performance of NGS methods can vary depending on variant type and genome context; as a result, understanding performance requires meaningful stratification.
• High-confidence variant calls and regions that can be used as "truth" to accurately identify false positives and false negatives are difficult to define, and reliable calls for the most challenging regions and variants remain out of reach.
We have made significant progress on standardizing comparison methods, metric definitions and reporting, as well as developing and using truth sets.
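The metric definitions above follow directly from the benchmarking counts, where TP are matched calls, FP are query-only calls, and FN are truth-only calls. A minimal sketch (the function name is illustrative, not part of the GA4GH tooling):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision (positive predictive value) and recall (sensitivity)
    from benchmarking counts: TP = calls matched between truth and query,
    FP = query-only calls, FN = truth-only calls."""
    precision = tp / (tp + fp)  # TP/(TP+FP)
    recall = tp / (tp + fn)     # TP/(TP+FN)
    return precision, recall

# e.g. 9,000 matched calls, 100 query-only, 500 truth-only:
p, r = precision_recall(9000, 100, 500)  # p ≈ 0.9890, r ≈ 0.9474
```

Standardizing which calls count as matched (after representation normalization) is what makes these two numbers comparable across variant-calling methods.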