2010
DOI: 10.1093/nar/gkq1197

dbDNV: a resource of duplicated gene nucleotide variants in human genome

Abstract: Gene duplications are scattered widely throughout the human genome. A single-base difference located in nearly identical duplicated segments may be misjudged as a single nucleotide polymorphism (SNP) from individuals. This imperfection is undistinguishable in current genotyping methods. As the next-generation sequencing technologies become more popular for sequence-based association studies, numerous ambiguous SNPs are rapidly accumulated. Thus, analyzing duplication variations in the reference genome to assis…

Cited by 12 publications (11 citation statements)
References 24 publications
“…Finally, we included scaffolds not mapped to particular chromosomes, and removed potentially duplicated regions that were missing from the reference genome. These procedures have been demonstrated to reduce false genetic variation observed in RNA sequencing data ( Peng et al 2012 ), as paralogous variants could be mistaken for STR variation when an incomplete reference genome is used ( Ho et al 2011 ; Bass et al 2012 ; Peng et al 2012 ).…”
Section: Discussion
mentioning
confidence: 99%
“…Directly related to this, the presence of CNVs in a sample can cause other problems for short-read mapping and downstream genotype inference. For example, heterozygous gene deletions (hemizygotes) can masquerade as homozygotes for a given SNP or coding allele, whereas paralogous sequence variants between close gene duplicates can result in artifactual heterozygote calls (24,25).…”
mentioning
confidence: 99%
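
The artifact described in the statement above can be illustrated with a toy simulation. This is a minimal sketch, not the method used by dbDNV or by the citing papers: the sequences, the read counts, the `call_genotype` helper, and its `min_fraction` threshold are all hypothetical and chosen only for illustration. It assumes an individual who is homozygous at each of two nearly identical gene copies, and compares a naive genotype call when reads map to their own copy against the call when reads from both copies pile up on a single reference copy.

```python
from collections import Counter


def call_genotype(pileup, min_fraction=0.2):
    """Naive caller (hypothetical): report each base seen in at least
    min_fraction of the reads covering one site."""
    counts = Counter(pileup)
    depth = len(pileup)
    alleles = sorted(b for b, n in counts.items() if n / depth >= min_fraction)
    # A single observed allele is reported as a homozygous genotype.
    return "/".join(alleles * 2 if len(alleles) == 1 else alleles)


# Two hypothetical paralogous copies that differ only at index 5 (G vs A).
copy_a = "ACGTTGCAAT"
copy_b = "ACGTTACAAT"
site = 5

# The individual is homozygous at each copy, so neither copy carries a true SNP.
reads_from_a = [copy_a] * 10
reads_from_b = [copy_b] * 10

# Case 1: the reference contains both copies and reads map to their own copy.
genotype_separate = call_genotype([r[site] for r in reads_from_a])

# Case 2: the reference holds only copy A, so reads from copy B map onto it too.
genotype_collapsed = call_genotype([r[site] for r in reads_from_a + reads_from_b])

print("reads mapped to their own copy:", genotype_separate)
print("reads collapsed onto one copy: ", genotype_collapsed)
```

In this sketch the collapsed pileup mixes the two paralogous bases, so the naive caller reports a heterozygote even though no polymorphism exists at either copy. Real genotypers use mapping quality and likelihood models rather than a fixed fraction, so this only shows the direction of the effect.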
“…eDiVA‐Score is built by training a random forest (RF) model using the R “randomForest” package with 1000 binary classification trees (Breiman, ; Hastie, Tibshirani, & Friedman, ) and five‐fold cross validation. Eleven features were selected to train the RF model: (a) the maximum minor allele frequency (MAF) of 1000Genomes and GnomAD databases; (b) four conservation measures (conservation in primates and mammals using the PhastCons (Hubisz et al, ) and PhyloP (Pollard, Hubisz, Rosenbloom, & Siepel, ); (c) four functional impact predictors: Condel (González‐Pérez & López‐Bigas, ), Phred‐scaled CADD score (Kircher et al, ), Eigen (Ionita‐Laza et al, ), and Mutation Assessor (Reva et al, ); (d) the likelihood to be in a segmental duplication, which correlates with false‐positive variant calls (Ho, Tsai, Chen, & Lin, ); and (e) an in‐house estimator of systematic sequencing errors called ABB‐score (Muyas et al, ). Note that Condel, Eigen and CADD are combination scores integrating several features also included in eDiVA‐score, namely evolutionary conservation (PhastCons and PhyloP in mammals and primates) and Mutation Assessor scores.…”
Section: Methods
mentioning
confidence: 99%