2019
DOI: 10.1093/gigascience/giz074

Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data

Abstract: Background: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population…

Cited by 76 publications (95 citation statements)
References: 52 publications
“…However, in the context of rare-variant data, family-based studies face one major hurdle: they are sensitive to genotyping/sequencing errors. In the context of sex-specific analyses, this issue is further aggravated as many genetic regions show sequence homology with the X-chromosome 29. This can lead to differential genotyping error rates for females and males due to different X chromosome dosage.…”
mentioning
confidence: 99%
“…This can lead to differential genotyping error rates for females and males due to different X chromosome dosage. Ignoring the impact of such sex-specific genotyping/sequencing errors can lead to substantially inflated type-1 errors 29.…”
mentioning
confidence: 99%
“…Specifically, we mapped the exome samples to a sex chromosome complement informed reference genome in which the Y chromosome is hard-masked (to avoid mismapping of X-linked reads to homologous regions on the Y chromosome in the XX samples). To generate the sex chromosome complement reference genome we employed XYalign (Webster et al 2019). XYalign created a Y-masked gencode GRCh38.p12 human reference genome for aligning XX individuals (Harrow et al 2012).…”
Section: Exome Sequence Data Processing
mentioning
confidence: 99%
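The hard-masking step described in the statement above can be illustrated with a minimal sketch. This is not XYalign's implementation or interface; the function name `hard_mask_fasta`, the record name `chrY`, and the toy sequences are all hypothetical and chosen only for illustration. The sketch assumes a plain-text FASTA held in memory and replaces every base of the named record with `N`, so that an aligner has nothing to map reads to on that chromosome.

```python
def hard_mask_fasta(fasta_text: str, target: str = "chrY") -> str:
    """Return FASTA text with every base of the record named `target` set to 'N'.

    Illustrative only: `target` and this function are assumptions, not part of
    any tool named in the text.
    """
    out = []
    masking = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            masking = bool(fields) and fields[0] == target
            out.append(line)
        elif masking:
            # Keep the line length so coordinates on the record are unchanged.
            out.append("N" * len(line))
        else:
            out.append(line)
    return "\n".join(out)


# Toy two-record FASTA (made-up sequences).
example = ">chrX\nACGT\nAC\n>chrY\nGGCA\nTT\n"
masked = hard_mask_fasta(example)
print(masked)
```

Masking with `N` rather than deleting the record keeps the reference's sequence names and lengths intact, which is one reason a hard-masked reference can stay compatible with downstream coordinate-based tools.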
“…Post trimming quality was checked using fastqc version 0.11.8 (Andrews 2010) and multiqc version 0.9 (Ewels et al 2016) (Figure S2, D, E, and F). Trimmed RNAseq reads were then aligned to a sex chromosome complement informed reference genome with the Y-masked (Webster et al 2019; Olney et al 2019) gencode GRCh38.p12 reference genome (Harrow et al 2012). Total reads mapped and duplicate reads were visually checked using BAMtools stats (Barnett et al 2011) (Table S3).…”
Section: RNA-seq Data Processing
mentioning
confidence: 99%
“…the sexual chromosomes. Alignment of short reads in such genomic regions is typically challenging and presence of these artefacts is likely due to mapping issues (Figure S1) [20]. The fact that most of the variations observed in healthy individuals are shared among samples is another indication that they could represent artefacts (Figure S1).…”
mentioning
confidence: 99%