2014
DOI: 10.1186/1756-0500-7-320

Comparison of next-generation sequencing samples using compression-based distances and its application to phylogenetic reconstruction

Abstract: Background: Enormous volumes of short read data from next-generation sequencing (NGS) technologies have posed new challenges to the area of genomic sequence comparison. The multiple sequence alignment approach is hardly applicable to NGS data due to the challenging problem of short read assembly. Thus alignment-free methods are needed for the comparison of NGS samples of short reads. Results: Recently several k-mer based distance measures such as CVTree, d2S, and co-phylog have been proposed or enhanced to address …

Cited by 9 publications (6 citation statements)
References 36 publications
“…The set of 29 E. coli genome sequences was originally compiled by Yin and Jin [23] and has been used in the past by other groups to evaluate AF programs [24, 25, 89]. Finally, the set of 14 plant genomes is from Hatje et al [90].…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…The sequences of 25 whole mitochondrial genomes of fish species from the suborder Labroidei and the species tree were taken from Fischer et al [48]. The set of 29 E. coli genome sequences was originally compiled by Yin and Jin [21] and has been used in the past by other groups to evaluate AF programs [22,23,68]. Finally, the set of 14 plant genomes is from Hatje et al [69].…”
Section: Data Sets (citation type: mentioning)
confidence: 99%
“…Because MSA is limited by the size of the genome, only the 29 mammalian mtDNA dataset is capable of using the tree from MSA as the benchmark tree. The benchmark tree for 29 Escherichia/Shigella is the tree studied by the research [28], [33] and 18 Drosophila genomes tree is from the phylogenetic tree database Open Tree of life [10], [34], [35]. In our implementation, we also used the USEARCH tool [36] to search for the alignment pair of any NGS short reads.…”
Section: Evaluation Metric (citation type: mentioning)
confidence: 99%