Proceedings of the 5th Asia-Pacific Bioinformatics Conference 2007
DOI: 10.1142/9781860947995_0015

A Randomized Algorithm for Comparing Sets of Phylogenetic Trees

Abstract: Phylogenetic analyses often produce a large number of candidate evolutionary trees, each a hypothesis of the "true" tree. Post-processing techniques such as strict consensus trees are widely used to summarize the evolutionary relationships into a single tree. However, valuable information is lost during the summarization process. A more elementary step is to produce estimates of the topological differences that exist among all pairs of trees. We design a new randomized algorithm, called Hash-RF, that computes …

Cited by 14 publications (10 citation statements)
References 13 publications
“…However, the question arises how to select a hash function for the hash key, which in our case is simply the bipartition vector. The usage of universal hash functions (Carter and Wegman, 1977) as advocated in some more theoretical papers (Sul and Williams, 2007; Sul et al, 2008; Amenta et al, 2003) is highly questionable: firstly, because the computation of a universal hash function given a bit vector of length n is slow, and secondly, universal hash functions only work well when hash keys are equally randomly distributed (Carter and Wegman, 1977), which is not very likely for hash keys that are induced by a hierarchical data structure such as a tree. Those two practical performance considerations have not been addressed in the aforementioned articles.…”
Section: Application Of Bipartition Hashing (mentioning)
confidence: 99%
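
To make the quoted cost argument concrete, here is a minimal sketch of one Carter-Wegman-style universal hash family applied to a bipartition bit vector: a random linear combination of the bits, reduced modulo a prime table size. The names `make_bit_vector_hash`, `n_taxa`, and `table_size` are hypothetical, and the source does not say whether the cited papers use this exact construction.

```python
import random


def make_bit_vector_hash(n_taxa, table_size, seed=None):
    """Draw one hash function from the family h_a(bits) = (sum_i a_i * bits[i]) mod m.

    Assumption: table_size (m) is prime and each a_i is uniform in [0, m),
    which makes the family universal for distinct bit vectors of length n_taxa.
    """
    rng = random.Random(seed)
    coeffs = [rng.randrange(table_size) for _ in range(n_taxa)]

    def h(bits):
        if len(bits) != n_taxa:
            raise ValueError("bit vector length must equal n_taxa")
        # Every call scans all n_taxa positions, so one evaluation costs O(n).
        return sum(a for a, b in zip(coeffs, bits) if b) % table_size

    return h


# Usage: hash the bipartition {taxon0, taxon1} | {taxon2, taxon3, taxon4}.
h = make_bit_vector_hash(n_taxa=5, table_size=101, seed=0)
slot = h([1, 1, 0, 0, 0])
```

Each evaluation reads every position of the vector, which is presumably the linear-time cost the quoted passage objects to.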
“…It is this distance computation that we parallelize in our case study. The distance metric itself is called Robinson-Foulds (RF) distance, and the fastest algorithm for all-to-all RF distance computation is the HashRF algorithm [19], introduced by a software package of the same name. HashRF is about 2-3× as fast as PhyBin.…”
Section: Case Study: Phybin: All-to-all Tree Edit Distance (mentioning)
confidence: 99%
“…PhyBin reimplements the HashRF algorithm for full all-to-all Robinson Foulds distance (Sul & Williams, 2007), which is significantly faster than computing the distance matrix with repeated comparison of individual trees (e.g., PAUP (Swofford & Sullivan, 2003)). The HashRF algorithm is fast for today’s data sizes (e.g., hundreds of taxa and thousands of trees), but it scales much more poorly than the basic binning algorithm at significantly larger sizes.…”
Section: Description Of the Program (mentioning)
confidence: 99%
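
The all-to-all Robinson-Foulds computation mentioned in the statements above can be illustrated with a small sketch. It is not the HashRF implementation: it assumes trees given as nested tuples of leaf names over the same taxon set, uses Python's built-in `dict` hashing rather than universal hash functions, and defines the RF distance as the size of the symmetric difference of the two trees' non-trivial bipartition sets (some authors halve this value). The function names `bipartitions` and `all_pairs_rf` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def leaves(tree):
    """Set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Non-trivial bipartitions, each stored as the side without the anchor taxon."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # fixed reference taxon so each split has one canonical form
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = clade if anchor not in clade else all_taxa - clade
        if 1 < len(side) < len(all_taxa) - 1:  # skip trivial splits
            splits.add(side)
        return clade

    visit(tree)
    return splits


def all_pairs_rf(trees):
    """RF distance matrix computed from one shared hash table of bipartitions."""
    split_sets = [bipartitions(t) for t in trees]
    owners = defaultdict(list)  # bipartition -> indices of trees that contain it
    for idx, splits in enumerate(split_sets):
        for s in splits:
            owners[s].append(idx)

    t = len(trees)
    shared = [[0] * t for _ in range(t)]
    for idxs in owners.values():
        for i, j in combinations(idxs, 2):
            shared[i][j] += 1
            shared[j][i] += 1

    return [
        [0 if i == j else len(split_sets[i]) + len(split_sets[j]) - 2 * shared[i][j]
         for j in range(t)]
        for i in range(t)
    ]


trees = [
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "C"), ("B", ("D", "E"))),
    (("C", ("E", "D")), ("B", "A")),  # same topology as the first tree
]
print(all_pairs_rf(trees))  # [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
```

Each tree is traversed once and pairs are only touched when they share a bipartition, which is the general reason a hash-table approach can beat repeated pairwise tree comparison.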