With the growth of genomic data generated by high-throughput sequencing, next-generation sequencing (NGS) has become the mainstream format for genome sequence data. NGS presents new challenges for many genome sequence analysis applications. In sequence comparison, traditional multiple-sequence alignment approaches do not provide a solution for analyzing NGS data because of the problems of short-read assembly and computational resources. Alignment-free methods are therefore more suitable for NGS data comparisons. Most alignment-free methods are based on k-mers. However, the characteristics of NGS data make such k-mer-based methods suboptimal, because the choice of k is a crucial factor in both the distance measurement and the phylogenetic trees constructed from it. We propose an effective parameter-free method for comparing NGS short reads, with the aim of eliminating the dependency on the k parameter. We compared the proposed method with existing methods, and the results show that it can measure accurate distances for the dataset without requiring any parameter.
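To illustrate the dependence on k described above, the following is a minimal sketch of a generic k-mer frequency distance between two sets of short reads. It is not the parameter-free method proposed here. The function names (`kmer_profile`, `euclidean_distance`, `kmer_distance`), the use of Euclidean distance on normalized k-mer frequencies, and the toy reads are all assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k):
    """Return normalized k-mer frequencies pooled over all reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k over each read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def kmer_distance(reads_a, reads_b, k):
    """Alignment-free distance between two read sets for a fixed k."""
    return euclidean_distance(kmer_profile(reads_a, k), kmer_profile(reads_b, k))


# Toy read sets (made up for illustration; not real sequencing data).
sample_a = ["ACGTACGTAC", "GTACGTACGT"]
sample_b = ["ACGTTCGTAC", "GTACGAACGT"]

# The same pair of samples yields a different distance for each choice of k.
distances = {k: kmer_distance(sample_a, sample_b, k) for k in (2, 3, 4)}
for k, d in distances.items():
    print(f"k={k}: distance={d:.4f}")
```

Because the measured distance changes with k even for the same pair of samples, any tree built from such distances inherits that choice, which is the dependency a parameter-free comparison aims to remove.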