2013
DOI: 10.1101/gr.146084.112

An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data

Abstract: Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base…

Citations: cited by 93 publications (89 citation statements)
References: 38 publications
“…For example, the Genome Analysis Toolkit (GATK) (DePristo et al 2011), SAMtools (Li 2011), and SNPTools (Wang et al 2013) are used for variant discovery and genotyping from small to moderate numbers of sequenced samples. However, as the number of sequenced genomes grows, analysis becomes increasingly challenging, requiring complex data processing steps, division of sequence data into many small regions, management and scheduling of analysis jobs, and often, prohibitive demands on computing resources.…”
Section: [Supplemental Materials Is Available For This Article] (mentioning)
confidence: 99%
“…All animals were captive born in research colonies, with the exception of three wild-born Chinese rhesus macaques (Supplemental Table S1). SNVs were identified using both GATK (DePristo et al 2011) and SNPTools (Wang et al 2013). The intersection of the two variant call sets identified 43.7 million SNVs, 31.9 million among the 124 Indian-origin rhesus macaques (IRh) and 30.1 million variants in the nine Chinese-origin animals (CRh).…”
Section: Genome-wide Single-nucleotide Variation (mentioning)
confidence: 99%
“…Single-nucleotide variants (SNVs) were identified by mapping the quality-filtered sequence reads to the rheMac2 rhesus macaque whole-genome assembly (Rhesus Macaque Genome Sequencing and Analysis Consortium et al 2007) using BWA (Li and Durbin 2009). SNVs were then called using SNPTools (Wang et al 2013). This process identified slightly more than 53.7 million SNVs.…”
Section: Initial Sample Collection and Sequencing (mentioning)
confidence: 99%
“…Our initial SNP calls were based on infinite odds ratios (i.e., present in all 4 dams of one group, and absent in all of the other) as estimated by ssahaSNP, which only identifies homozygous alternative allele SNPs. In order to further assure high quality and statistically robust SNP identification in an initial small discovery cohort of 8 animals, three tools were used to call SNPs (Atlas-SNP2 [35], SNPTools [36], and ssahaSNP [37]) and the only SNPs called by all three tools were selected as candidates for further analysis. Variant Effect Predictor (VEP) [38] based on rhesus gene models was used to identify functional consequences of the SNPs and SNPs were lifted over to the human genome in order to perform SIFT [39] and PROVEAN [40] predictions of the effect of amino acid differences.…”
Section: Methods (mentioning)
confidence: 99%
“…To assure high quality SNP identification, we used three tools to call SNPs (Atlas-SNP2 [35], SNPTools [36], and ssahaSNP [37]) and only selected the 1534 SNPs called by all three for further analysis (Supplemental Fig. 2 and Supplemental Tables S1 and S2).…”
Section: Novel SNP Identification and Genotyping With An Exon-hybrid (mentioning)
confidence: 99%
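
Several of the statements above describe keeping only the variants reported by every caller (for example, the intersection of GATK and SNPTools call sets, or SNPs called by all of Atlas-SNP2, SNPTools, and ssahaSNP). A minimal sketch of that consensus step follows. It assumes each call set has already been reduced to (chromosome, position, reference allele, alternate allele) records; the `Variant` type, the `consensus_calls` helper, and the toy data are hypothetical illustrations, not part of any of the cited tools or the citing papers' actual pipelines.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: site plus alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_calls(*call_sets: Iterable[Variant]) -> set[Variant]:
    """Return only the variants that appear in every supplied call set."""
    if not call_sets:
        return set()
    sets = [set(calls) for calls in call_sets]
    return set.intersection(*sets)


# Toy call sets standing in for the output of three callers (made-up data).
caller_a = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 40, "G", "A"),
]
caller_b = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 40, "G", "A"),
]
caller_c = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
]

consensus = consensus_calls(caller_a, caller_b, caller_c)
print(sorted(consensus))
```

Matching on the full (site, alleles) key rather than position alone means two callers that report different alternate alleles at the same site are not counted as agreeing; whether the cited studies matched this strictly is not stated in the excerpts.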