2018
DOI: 10.1186/s13059-018-1595-x

FORGe: prioritizing variants for graph genomes

Abstract: There is growing interest in using genetic variants to augment the reference genome into a graph genome, with alternative sequences, to improve read alignment accuracy and reduce allelic bias. While adding a variant has the positive effect of removing an undesirable alignment score penalty, it also increases both the ambiguity of the reference genome and the cost of storing and querying the genome index. We introduce methods and a software tool called FORGe for modeling these effects and prioritizing variants …
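
The "alignment score penalty" that an added variant removes can be illustrated with a toy, ungapped scoring sketch. It assumes a single hypothetical SNV in a made-up 13-bp reference and borrows BWA-MEM's default match score and mismatch penalty (-A 1, -B 4); the helper names are invented for illustration and are not part of FORGe.

```python
# Toy illustration: one hypothetical SNV (A>G at offset 4) in a made-up reference.
MATCH, MISMATCH = 1, -4  # modeled on BWA-MEM defaults: -A 1 (match), -B 4 (mismatch)


def ungapped_score(read: str, ref: str, pos: int) -> int:
    """Score an ungapped alignment of `read` against `ref` starting at `pos`."""
    window = ref[pos:pos + len(read)]
    return sum(MATCH if a == b else MISMATCH for a, b in zip(read, window))


def apply_snv(ref: str, offset: int, alt: str) -> str:
    """Spell out the alternate path through a one-SNV graph as a plain string."""
    return ref[:offset] + alt + ref[offset + 1:]


ref = "ACGTACGTTTGCA"
alt_path = apply_snv(ref, 4, "G")  # "ACGTGCGTTTGCA"
read = alt_path[2:10]              # "GTGCGTTT", a read carrying the G allele

print(ungapped_score(read, ref, 2))       # 3: seven matches, one mismatch
print(ungapped_score(read, alt_path, 2))  # 8: all eight bases match
```

A read carrying the alternate allele loses 5 points against the linear reference but none against the alternate path; that gap is the source of allelic bias. The sketch deliberately ignores the other side of the trade-off, namely the extra ambiguity and index cost of adding the path.
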

Cited by 75 publications (59 citation statements). References: 51 publications.

“…Full; graph encoding all 1000G variation in chromosome 6 (excluding NA12878), Min2; graph encoding only variations that were observed in at least two individuals; PopCov10+; graph encoding the top 10% scoring variations as scored by FORGe [30], which weighs variants by allele frequency in the population and minimizes graph complexity. Figure 4a shows the fractions of reads that are correctly and incorrectly aligned onto the different reference genomes.…”
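
The PopCov10+ graph keeps only the top 10% of variants by score. A minimal sketch of that kind of budgeted selection follows, assuming a simplified, hypothetical score (the non-reference allele frequency) and a crude proximity penalty standing in for complexity control. It is not FORGe's actual scoring model; `Variant`, `select_top_fraction`, `window` and `penalty` are names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int          # 0-based position on the linear reference
    alt_freq: float   # population frequency of the non-reference allele


def select_top_fraction(variants, fraction=0.10, window=10, penalty=0.5):
    """Greedily keep the top `fraction` of variants by a hypothetical score.

    The score starts as the alternate-allele frequency. After each pick, the
    scores of still-unpicked variants within `window` bp are multiplied by
    `penalty`, a crude stand-in for discouraging dense clusters of variants.
    """
    budget = max(1, round(len(variants) * fraction))
    scores = [v.alt_freq for v in variants]
    remaining = set(range(len(variants)))
    chosen = []
    while remaining and len(chosen) < budget:
        # Highest score wins; ties go to the leftmost position.
        best = max(remaining, key=lambda j: (scores[j], -variants[j].pos))
        remaining.remove(best)
        chosen.append(variants[best])
        for j in remaining:
            if abs(variants[j].pos - variants[best].pos) <= window:
                scores[j] *= penalty
    return sorted(chosen, key=lambda v: v.pos)


variants = [Variant(100, 0.40), Variant(105, 0.35), Variant(300, 0.30)]
variants += [Variant(1000 + 50 * i, 0.01) for i in range(17)]
picked = select_top_fraction(variants)  # 20 variants -> budget of 2
print([v.pos for v in picked])          # [100, 300]
```

With 20 variants the 10% budget is 2. Ranking by frequency alone would keep the clustered pair at 100 and 105; the proximity penalty halves the score of 105 after 100 is chosen, so the more isolated variant at 300 is kept instead.
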
“…Though the default scoring functions of tools like BWA-MEM and Bowtie 2 are widely used, they are not very well studied, and this is in large part because it is difficult to separate the effect of the scoring function from the closely related effects of the heuristics. Vargas alignments could also be used to evaluate the effects of different reference genomes on alignment accuracy, such as comparing graph genomes containing different variant sets to each other and to linear references, as investigated using simulation in the FORGe study (Pritt et al, 2018).…”
Section: Discussion

“…While most current heuristic and heuristic-free read alignment algorithms assume that the reference genome is linear, with greater understanding of genetic diversity has come increasing focus on alternatives to the linear reference genome. Various solutions have been proposed that incorporate information about genetic variation in the population, including graph-shaped reference genomes (Paten et al, 2017), pan-genomes (Yang et al, 2019), and a genome that contains the most common (major) allele at each variable site (Pritt et al, 2018; Ballouz et al, 2019). The most recent human reference genome assembly, GRCh38, includes alternate assemblies for hypervariable loci (Church et al, 2015).…”
Section: Introduction
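
The excerpt above mentions a reference that carries the most common (major) allele at each variable site. A minimal sketch of building one, assuming SNVs only (so coordinates never shift), 0-based positions, and a hypothetical `sites` layout that gives alternate-allele frequencies per position, with the reference allele holding the remaining frequency. The function name and data layout are illustrative, not taken from any of the cited tools.

```python
def major_allele_reference(ref: str, sites: dict[int, dict[str, float]]) -> str:
    """Substitute the most common allele at each SNV site.

    `sites` maps a 0-based position to {alt_base: frequency}; the reference
    base's frequency is taken to be 1 minus the sum of the alternate ones.
    Only single-base substitutions are handled, so coordinates never shift.
    """
    seq = list(ref)
    for pos, alt_freqs in sites.items():
        if not alt_freqs:
            continue  # no alternate alleles recorded at this site
        ref_freq = 1.0 - sum(alt_freqs.values())
        alt, freq = max(alt_freqs.items(), key=lambda kv: kv[1])
        if freq > ref_freq:  # an alternate allele is the major allele here
            seq[pos] = alt
    return "".join(seq)


sites = {
    1: {"T": 0.70},             # alternate allele is the majority
    5: {"G": 0.20},             # reference allele stays
    6: {"A": 0.45, "T": 0.30},  # reference has 0.25, so A is the most common
}
print(major_allele_reference("ACGTACGT", sites))  # ATGTACAT
```

Site 6 shows why "major" means most common rather than above 50%: no allele there reaches a majority, yet A still replaces the reference base.
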
“…This idea could naturally be combined with our method by replacing the path selection step accordingly, which we plan to explore in future research. Beyond that, Pritt et al (2018) have argued that it might be beneficial to restrict the set of variants used for graph construction to a well-selected subset for two reasons: to avoid introducing unnecessary ambiguity and to simplify indexing. By providing a full-sensitivity index, we have removed the necessity for the latter, creating the opportunity for comprehensive evaluations on the trade-off between added ambiguity and reduced read mapping bias.…”
Section: Discussion