Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new feature-extraction method for DNA sequences represented as binary images, in which the similarity between sequences is estimated from the frequency histograms of local bitmap patterns in their images. Our method runs in time linear in the length of the DNA sequences, so it remains practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for estimating sequence similarity and found that histogram intersection and Manhattan distance are the most appropriate for phylogenetic analyses.
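The approach summarized above (binary image, histogram of local bitmap patterns, histogram distance) can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the binary image is taken to be a 4 × n one-hot layout (one row per base, in the hypothetical order A, C, G, T), local bitmap patterns are read from a sliding 2 × 2 window, and histograms are normalized to relative frequencies. The names `to_binary_image`, `pattern_histogram`, `histogram_intersection` and `manhattan` are illustrative, not the authors' code.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical row order of the binary image; the paper's actual layout may differ.
BASES = "ACGT"


def to_binary_image(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a 4 x n binary image (assumed one-hot layout):
    pixel (r, c) is 1 iff the base at position c equals BASES[r]."""
    seq = seq.upper()
    return [[1 if ch == base else 0 for ch in seq] for base in BASES]


def pattern_histogram(image: List[List[int]], h: int = 2, w: int = 2) -> Dict[int, float]:
    """Slide an h x w window over the image, read each window row by row as an
    (h*w)-bit integer, and return the relative frequency of each pattern.
    For a fixed window size this is O(n) in the sequence length."""
    rows = len(image)
    cols = len(image[0]) if image else 0
    counts: Counter = Counter()
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            code = 0
            for dr in range(h):
                for dc in range(w):
                    code = (code << 1) | image[r + dr][c + dc]
            counts[code] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def histogram_intersection(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


def manhattan(p: Dict[int, float], q: Dict[int, float]) -> float:
    """L1 distance between two histograms; 0 means identical."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


if __name__ == "__main__":
    a = pattern_histogram(to_binary_image("ACGTACGTAC"))
    b = pattern_histogram(to_binary_image("ACGTTCGTAC"))
    print(histogram_intersection(a, b), manhattan(a, b))
```

Because each window is visited once, the histogram is built in time linear in the sequence length for a fixed window size. With normalized histograms, as in this sketch, the two measures satisfy Manhattan = 2 × (1 − intersection), so they rank sequence pairs identically.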