Estimating Genome-wide Phylogenies Using Probabilistic Topic Modeling
Marzieh Khodaei,
Scott V. Edwards,
Peter Beerli
Abstract: Inferring the evolutionary history of species or populations employing multilocus analysis is gaining ground in phylogenetic analysis. We developed an alignment-free method to infer the multilocus species tree, which is implemented in the Python package TopicContml. The method operates in two primary stages. First, it uses probabilistic topic modeling (specifically, Latent Dirichlet Allocation or LDA) to extract topic frequencies from k-mers, which are in turn derived from multilocus DNA sequences. Second, the…
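To make the first stage concrete, the following is a minimal sketch of how unaligned DNA sequences might be turned into k-mer "documents" that a topic model such as LDA could then consume. It is not TopicContml's actual code or API: the function names `kmers` and `kmer_document`, the choice of k = 3, the toy sequences, and the rule of skipping k-mers that contain non-ACGT characters are all assumptions made for illustration. The LDA step itself is not shown.

```python
from collections import Counter

def kmers(sequence, k):
    """Return all overlapping k-mers of a DNA sequence.

    Assumption for this sketch: k-mers containing anything other than
    A, C, G, or T (gaps, ambiguity codes) are skipped.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    ]

def kmer_document(sequences, k):
    """Pool the k-mer counts of all sequences from one taxon at one locus."""
    counts = Counter()
    for s in sequences:
        counts.update(kmers(s, k))
    return counts

# Hypothetical toy locus: taxon name -> list of unaligned sequences.
locus = {
    "taxon_A": ["ACGTAC"],
    "taxon_B": ["ACGTTT", "ACG-TA"],
}

# One bag-of-k-mers "document" per taxon, the usual input shape for LDA.
docs = {name: kmer_document(seqs, k=3) for name, seqs in locus.items()}
```

Each entry of `docs` is a word-count table for one taxon. Under the approach the abstract describes, tables like these would be passed to an LDA implementation to obtain per-taxon topic frequencies.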