2011
DOI: 10.1093/bioinformatics/btr330

The variant call format and VCFtools

Abstract: Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite …

Cited by 12,609 publications (11,396 citation statements)
References 4 publications
“…By excluding the ones located on unanchored scaffolds, we reduced the number of variants to 13,286,870. We used VCFtools v0.1.14 (Danecek et al., 2011) for the filtering process with the following parameters: minimum read depth (min_DP): 3; maximum read depth (max_DP): 5000; and minimum Phred‐scaled quality score (min_QUAL): 20. We used the R package VcfR (Knaus and Grünwald, 2017) to visualize the distribution of the quality parameters.…”
Section: Methods (mentioning)
confidence: 99%
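
The thresholds quoted above (read depth between 3 and 5000, quality score of at least 20) can be illustrated with a minimal Python sketch. This is not VCFtools itself and does not reproduce the citing authors' command line. It assumes that depth is taken from the site-level `DP` key of the INFO column, that the bounds are inclusive, and that records with a missing QUAL or DP are dropped; the function names and the example records are hypothetical.

```python
def parse_info(info_field):
    """Turn an INFO string such as 'DP=14;AF=0.5;DB' into a dict."""
    info = {}
    if info_field == ".":
        return info
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def passes_filters(line, min_dp=3, max_dp=5000, min_qual=20.0):
    """Return True if one tab-separated VCF data line meets all thresholds."""
    fields = line.rstrip("\n").split("\t")
    qual = fields[5]              # column 6 of a VCF record is QUAL
    info = parse_info(fields[7])  # column 8 of a VCF record is INFO
    if qual == "." or "DP" not in info:
        return False  # assumption: missing QUAL or DP fails the filter
    depth = int(info["DP"])
    return float(qual) >= min_qual and min_dp <= depth <= max_dp


def filter_vcf_lines(lines, **thresholds):
    """Yield header lines unchanged and data lines that pass the filters."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, **thresholds):
            yield line


# Hypothetical records, for illustration only.
EXAMPLE = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=12\n",  # passes all thresholds
    "chr1\t200\t.\tC\tT\t10\tPASS\tDP=12\n",  # QUAL below 20
    "chr1\t300\t.\tG\tA\t60\tPASS\tDP=2\n",   # depth below 3
]

kept = list(filter_vcf_lines(EXAMPLE))
```

With the example records, `kept` holds the two header lines and the single record at position 100. Real VCF files are usually compressed and may carry depth per genotype rather than per site, so a production filter would need to handle both.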
“…SNP variants were then called using the samtools mpileup command with parameters ‐uD, ‐E, and ‐f (Li et al., 2009). The program vcftools (Danecek et al., 2011) was used to filter for SNPs that exhibited minor allele frequencies of <1% and minimum mean sequencing depths of 50 over all individuals. For downstream analysis, a data matrix was created containing the genotype information for each SNP and individual along with the classifier information on sex, population, sampling year, sea stage (1SW/MSW), and fork length.…”
Section: Methods (mentioning)
confidence: 99%
“…After retaining only SNPs with the selectvariants module of gatk to avoid later uncertainties in alignments, variants have been further filtered out if any of the following criteria were fulfilled: the quality normalized by the coverage (QD) was <2.0, the Phred‐scaled p‐value for Fisher's exact test to detect strand bias (FS) was >60.0, or the root mean square of mapping quality across all samples (MQ) was <40. Diversity measures were calculated with vcftools v.0.1.14 (Danecek et al., 2011) with the options --het (for the per‐individual heterozygosity and inbreeding coefficient F), --site-pi (for per site nucleotide divergence π) and --singletons (for private alleles).…”
Section: Methods (mentioning)
confidence: 99%
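
The per-individual inbreeding coefficient F mentioned in the statement above can be sketched as a method-of-moments estimate, F = (O - E) / (N - E), where O is the observed number of homozygous sites, E the number expected under Hardy-Weinberg proportions, and N the number of genotyped sites. The Python below is a simplified illustration under those assumptions; it omits any small-sample correction, so its values may differ from what VCFtools reports. The function name and input encoding (alternate-allele counts of 0, 1 or 2, with `None` for a missing genotype) are hypothetical.

```python
def inbreeding_coefficient(genotypes, alt_freqs):
    """Estimate F for one individual at biallelic sites.

    genotypes: alternate-allele count per site (0, 1, 2) or None if missing.
    alt_freqs: population alternate-allele frequency at each site.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_sites = 0
    for gt, p in zip(genotypes, alt_freqs):
        if gt is None:
            continue  # skip sites where this individual has no call
        n_sites += 1
        if gt != 1:
            observed_hom += 1  # counts 0 and 2 are homozygous
        # Hardy-Weinberg expected homozygosity: 1 - 2pq
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_sites == expected_hom:
        # Every site is monomorphic (or nothing was genotyped): F is undefined.
        return float("nan")
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


# Hypothetical individuals genotyped at four sites with allele frequency 0.5.
freqs = [0.5, 0.5, 0.5, 0.5]
fully_homozygous = inbreeding_coefficient([0, 2, 0, 2], freqs)
as_expected = inbreeding_coefficient([0, 1, 2, 1], freqs)
fully_heterozygous = inbreeding_coefficient([1, 1, 1, 1], freqs)
```

In this toy case the three individuals give F values of 1.0, 0.0 and -1.0, matching the intuition that excess homozygosity pushes F toward 1 and excess heterozygosity pushes it below 0.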