2018
DOI: 10.1101/267211
Preprint

A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

Abstract: An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the…

Cited by 83 publications (139 citation statements)
References 37 publications (49 reference statements)
“…Secondly, we imagine that improvements in using the full information in the spatial arrangement of polymorphism surrounding a sweep could come from abandoning summary statistic representations of loci, and instead using image recognition (i.e. CNN) approaches on images created directly from sequence alignments of a genomic region (Chan et al [2018]; L. Flagel, pers. comm.).…”
Section: Discussion | Type: mentioning
confidence: 99%
“…Supervised machine learning approaches are rapidly gaining traction among population geneticists, with deep learning in particular beginning to experience increased attention and methodological development due to its exciting potential to unlock classic population genetics problems. Examples of successful SML implementation in population genomics include demographic model choice [39], demographic parameter inference [40], comparative analysis of independent single-population size changes [41], identification of introgressed regions [42], recombination rate estimation [43–45], and genomic scans of selective sweeps [24]; deep learning specifically has been employed for joint inference of demography and selection [25], discovery of recombination hotspots [46], estimation of demographic and recombination parameters [47], discovery of functional variants [48], and differentiating between hard and soft sweeps from neutral regions [26]. These applications especially benefit from the ability to handle high dimensional input data and bypassing the need of a likelihood function, which is due to SML uncovering data patterns from leveraging a priori information through a training algorithm [25,28].…”
Section: Discussion | Type: mentioning
confidence: 99%
“…Moreover, our SURFDAWave approach is not restricted to application on adaptive introgression and selective sweep scenarios, and can be implemented for probing genomic variation of other evolutionary processes that leave a spatial or temporal signature in genomic data. Such examples include the identification of genomic targets of balancing selection (e.g., DeGiorgio et al, 2014; Siewert and Voight, 2017; Bitarello et al, 2018; Cheng and DeGiorgio, 2018; Siewert and Voight, 2018;, complex forms of adaptation such as staggered selective sweeps (Assaf et al, 2015) that have yet to be interrogated for in genomic data, and non-adaptive processes such as recombination rate estimation (e.g., Chan et al, 2018; Flagel et al, 2019).…”
Section: Discussion | Type: mentioning
confidence: 99%
“…However, these methods do not explicitly model the overall patterns formed by selection events. Other methods forgo explicitly measuring diversity and transform SNP data directly to images to learn population-genetic parameters such as recombination rates (Chan et al, 2018; Flagel et al, 2019) and to identify selected regions (Flagel et al, 2019). The complementary approach of Mughal and DeGiorgio (2019) explicitly models the spatial autocorrelation of summary statistics to capture the underlying wave patterns produced by selective sweeps.…”
Section: Introduction | Type: mentioning
confidence: 99%