2019
DOI: 10.1371/journal.pone.0221068

TreeCluster: Clustering biological sequences using phylogenetic trees

Abstract: Clustering homologous sequences based on their similarity is a problem that appears in many bioinformatics applications. The fact that sequences cluster is ultimately the result of their phylogenetic relationships. Despite this observation and the natural ways in which a tree can define clusters, most applications of sequence clustering do not use a phylogenetic tree and instead operate on pairwise sequence distances. Due to advances in large-scale phylogenetic inference, we argue that tree-based clustering is…

Cited by 125 publications (119 citation statements)
References 56 publications
“…Phylogenetic clustering was performed using TreeCluster software (Balaban et al, 2019), which is a phylogeny-based clustering method. The ML phylogenetic tree of the reduced M. bovis dataset (n = 1,201) was used as input and the method "Avg Clade" was applied, which means that the average pairwise distance between leaves in a cluster should be at most "t", where we used t = 0.012 substitutions/site.…”
Section: Phylogenetic Clustering and Principal Component Analysis (mentioning)
confidence: 99%
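To make the "Avg Clade" criterion quoted above concrete, here is a minimal Python sketch. It is not TreeCluster's implementation: the nested-list tree encoding, the function names `clade_stats` and `avg_clade_clusters`, and the toy branch lengths are all assumptions made for illustration. It walks the tree from the root and accepts a clade as one cluster as soon as the average pairwise leaf-to-leaf distance inside it is at most `t`; otherwise it descends into the child clades.

```python
# Hypothetical tree encoding (an assumption for this sketch): a node is either
# a leaf name (str) or a list of (child, branch_length) pairs.

def clade_stats(node):
    """Return (depths, pair_dists) for the clade rooted at `node`.

    depths maps each leaf to its distance from `node`;
    pair_dists lists the path length between every pair of leaves in the clade.
    """
    if isinstance(node, str):
        return {node: 0.0}, []
    depths, pair_dists = {}, []
    for child, length in node:
        child_depths, child_pairs = clade_stats(child)
        shifted = {leaf: d + length for leaf, d in child_depths.items()}
        # Leaves in different child subtrees are connected through this node.
        pair_dists += [d1 + d2 for d1 in depths.values() for d2 in shifted.values()]
        pair_dists += child_pairs
        depths.update(shifted)
    return depths, pair_dists


def avg_clade_clusters(node, t):
    """Split the tree top-down into clades whose average pairwise distance is <= t."""
    depths, pair_dists = clade_stats(node)
    if len(depths) == 1 or sum(pair_dists) / len(pair_dists) <= t:
        return [sorted(depths)]
    clusters = []
    for child, _ in node:
        clusters += avg_clade_clusters(child, t)
    return clusters


# Toy tree, Newick equivalent: ((A:0.002,B:0.003):0.001,(C:0.004,D:0.05):0.02);
toy_tree = [
    ([("A", 0.002), ("B", 0.003)], 0.001),
    ([("C", 0.004), ("D", 0.05)], 0.02),
]
print(avg_clade_clusters(toy_tree, t=0.012))
```

With the threshold t = 0.012 substitutions/site taken from the citing statement, the toy tree keeps A and B together (their distance is 0.005) and leaves C and D as singletons (their distance is 0.054). The sketch recomputes distances at every level, which is acceptable for a small example but would not scale to trees with thousands of leaves.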
“…Deletions and SNP correspond to the clonal complex markers European 1 (Eu1), European 2 (Eu2) and African 1 (Af1), indicated by triangles. Seven M. bovis clusters (2 through 8) were identified using TreeCluster (Balaban et al, 2019) and they are indicated by colored tips. Cluster 1 corresponds to the outgroup composed of M. orygis and M. caprae genomes.…”
Section: Clonal Complexes Do Not Represent the Whole Diversity Of M (mentioning)
confidence: 99%
“…With a comparison, the mean p-distance scores are 9.7%, 28.9%, 19.7%, 17.4%, 23%, 15.1%, 9.7% within HEV-1, HEV-2, HEV-3, HEV-4, HEV-6, HEV-7, and HEV-8 groups in Orthohepevirus A, respectively. The cluster analysis is a common technique to classify or grouping data (sequences) to provide evolutionary connections or map the relationship between the related species [69, 70]. The rodent hepeviruses form a discrete phylogenetic cluster among species Orthohepevirus C in family Hepeviridae (Figure 3).…”
Section: Phylogenetic Analysis Of Rat Hepeviruses (mentioning)
confidence: 99%
“…Sequences were then put into different clades based on specific mutations proposed in GISAID (23) and further classified as D614G type (24,25). Subsequently, another phylogenetic tree and haplotype network containing only SARS-CoV-2 sequences from Bangladeshi was constructed and categorized using the same tools, and additionally one step further clustered with TreeCluster (26). The direction of selection in sequences from Bangladesh was calculated by the SLAC algorithm (27) in the Datamonkey server (28).…”
Section: A Total Of 435 Whole Genome Sequences Of Sars-cov-2 Includin (mentioning)
confidence: 99%