We propose a coalescent model for three species that allows gene flow between both pairs of sister populations. The model is designed to analyze multilocus genomic sequence alignments, with one sequence sampled from each of the three species. The model is formulated as a Markov chain, which allows the use of matrix exponentiation to compute analytical expressions for the probability density of gene tree genealogies. The gene tree history distribution and the gene tree topology distribution under this coalescent model with gene flow are then calculated via numerical integration. We analyze the model to compare the distributions of gene tree topologies and gene tree histories for species trees with differing effective population sizes and gene flow rates. Our results suggest conditions under which the species tree and associated parameters are not identifiable from the gene tree topology distribution when gene flow is present, but indicate that the gene tree history distribution may identify the species tree and associated parameters. Thus, the gene tree history distribution can be used to infer parameters such as the ancestral effective population sizes and the rates of gene flow in a maximum likelihood (ML) framework. We conduct computer simulations to evaluate the performance of our method in estimating these parameters, and we apply our method to an Afrotropical mosquito data set (Fontaine et al., 2015) to demonstrate its usefulness for the analysis of empirical data.

maximum likelihood, speciation.

gruence between a gene tree and the species tree for the same set of species (Maddison, 1997). Incomplete lineage sorting (also called deep coalescence) has long been recognized as one of the major causes of variation in gene trees across a genome (Pamilo and Nei, 1988; Takahata, 1989).
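The computational pipeline described in the abstract has three parts: a Markov chain over lineage states, matrix exponentiation to obtain transition probabilities, and numerical integration over time. These can be illustrated on a much smaller problem. The sketch below is not the authors' three-species model. It assumes two lineages in two populations, constant per-lineage migration rates `m12` and `m21`, and within-population pair coalescence rates `c1` and `c2`. All of these names, the state labels, and the hand-rolled `expm` helper are hypothetical choices made for illustration. The sketch builds the generator Q, computes P(t) = exp(Qt), and reads off the coalescence-time density as the probability flux into the absorbing state. It then checks that numerically integrating this density up to a time τ recovers the absorption probability P(τ) read directly from the matrix.

```python
import numpy as np

# Hypothetical state space (illustrative labels): two lineages, two populations.
BOTH_IN_1, ONE_EACH, BOTH_IN_2, COALESCED = 0, 1, 2, 3


def rate_matrix(c1, c2, m12, m21):
    """Generator Q of a two-lineage structured coalescent (assumed parameterization).

    c1, c2   -- coalescence rates of a lineage pair within populations 1 and 2
    m12, m21 -- per-lineage backward-in-time migration rates 1 -> 2 and 2 -> 1
    """
    q = np.zeros((4, 4))
    q[BOTH_IN_1, COALESCED] = c1
    q[BOTH_IN_1, ONE_EACH] = 2 * m12   # either of the two lineages may leave population 1
    q[ONE_EACH, BOTH_IN_1] = m21       # the lineage in population 2 moves to population 1
    q[ONE_EACH, BOTH_IN_2] = m12       # the lineage in population 1 moves to population 2
    q[BOTH_IN_2, COALESCED] = c2
    q[BOTH_IN_2, ONE_EACH] = 2 * m21
    # Diagonal entries make every row sum to zero; the coalesced state is absorbing.
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


def expm(a, terms=30):
    """Matrix exponential via scaling and squaring with a truncated Taylor series.

    Adequate for the small, well-scaled generators used here; scipy.linalg.expm
    is the more robust general-purpose choice.
    """
    norm = np.abs(a).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    b = a / 2.0 ** s                      # now the norm of b is at most 1/2
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms):
        term = term @ b / k
        result = result + term
    for _ in range(s):                    # undo the scaling: exp(a) = exp(b)^(2^s)
        result = result @ result
    return result


def coalescence_density(t, q, start):
    """Density of the coalescence time at t: sum_s P[start, s](t) * Q[s, COALESCED]."""
    p = expm(q * t)[start]
    return p @ q[:, COALESCED]


# Usage with arbitrary illustrative values (not estimates from the paper).
c1, c2, m12, m21 = 1.0, 0.5, 0.2, 0.3
q = rate_matrix(c1, c2, m12, m21)
tau = 2.0
grid = np.linspace(0.0, tau, 2001)
dens = np.array([coalescence_density(t, q, ONE_EACH) for t in grid])
# Trapezoidal rule: numerically integrate the density over [0, tau].
integral = np.sum((dens[1:] + dens[:-1]) * np.diff(grid)) / 2.0
# The same probability read directly from the transition matrix.
direct = expm(q * tau)[ONE_EACH, COALESCED]
print(f"integrated density: {integral:.6f}, P(coalesced by tau): {direct:.6f}")
```

A realistic three-species model would presumably need a larger state space (three lineages and more population labels) and separate generators for the time intervals between speciation events. The sketch only demonstrates the two building blocks, matrix exponentiation and numerical integration, on which such a computation rests.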
Another important factor leading to discord between gene trees and the species tree is gene flow between populations following speciation (Maddison, 1997; Leaché et al., 2013; Degnan et al., 2012). With some exceptions (noted below), these two processes have been studied in isolation. When carrying out phylogenetic analyses for species that are substantially divergent, ignoring gene flow following speciation may not bias the resulting estimates. However, with the advent of large-scale genomic data sets that allow the study of evolutionary relationships among closely related populations or species, the need to examine these factors simultaneously is becoming increasingly apparent (Eckert and Carstens, 2008; Leaché et al., 2013; Huang et al., 2014). In particular, because gene flow can readily occur between sister taxa following speciation, even in the presence of incomplete lineage sorting (Yu et al., 2011), both processes must be incorporated simultaneously into models used to analyze data for closely related species or populations.
Degnan and Sal...