In this paper, we propose another topological approach to DNA similarity analysis. We transform each DNA sequence into a collection of vectors in 5-dimensional space in which all nucleotides of the same type (A, C, G, or T) lie on the same line. Exploiting this geometric property, we combine the representation with tools from persistent homology and use only the zeroth persistence diagram as the topological representation of each DNA sequence. The similarity between two DNA sequences is measured by how close their zeroth persistence diagrams are under the Wasserstein distance of order zero, which yields a new method for analyzing similarities between DNA sequences. We test our method on datasets of Human rhinovirus (HRV) and Influenza A virus.
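To make the pipeline concrete, the following is a minimal Python sketch of its first two stages, under stated assumptions. The embedding `embed_5d` is hypothetical: it is not the map used in this paper, and was chosen only because it places all nucleotides of one type on a common line in 5D. The function `h0_deaths` relies on the standard fact that, for a Vietoris-Rips filtration in which an edge appears at the scale of its length, every zeroth-homology class is born at 0 and the finite death times are the edge lengths of a minimum spanning tree of the point cloud. The names `AXIS`, `embed_5d`, and `h0_deaths` are illustrative only.

```python
import math

# ASSUMPTION: illustrative embedding, not the paper's actual map.
# Each nucleotide type gets its own one-hot axis; the fifth coordinate
# is the normalized position, so all points of one type are collinear.
AXIS = {"A": 0, "C": 1, "G": 2, "T": 3}


def embed_5d(seq):
    """Map a DNA string to a list of 5D points (hypothetical embedding)."""
    n = len(seq)
    points = []
    for i, base in enumerate(seq, start=1):
        p = [0.0] * 5
        p[AXIS[base]] = 1.0   # selects the line for this nucleotide type
        p[4] = i / n          # position along that line
        points.append(tuple(p))
    return points


def h0_deaths(points):
    """Finite death times of the zeroth persistence diagram.

    Assumes a Vietoris-Rips filtration where an edge enters at the
    Euclidean distance between its endpoints. All births are 0, and the
    finite deaths equal the minimum-spanning-tree edge lengths, computed
    here with Prim's algorithm in O(n^2) time.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n     # cheapest known edge linking each point to the tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        u = min((j for j in range(n) if not in_tree[j]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:          # the starting point contributes no edge
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(deaths)


if __name__ == "__main__":
    diagram = [(0.0, d) for d in h0_deaths(embed_5d("ACGTTGCA"))]
    print(diagram)
```

A sequence of length n yields n - 1 finite points (0, d) plus one class that never dies, which is omitted here. Comparing two sequences would then amount to computing a distance between two such diagrams; that step is left out because it depends on the exact distance convention adopted in the paper.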