Background: Measuring similarities between tree-structured data is important for the analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparing tree-structured data. However, computing the edit distance for rooted unordered trees is known to be NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees. Results: In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem, and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar-structure search for glycan structures. The results suggest that the proposed method can efficiently compute the edit distance for moderate-size unordered trees. They also suggest that its accuracy is comparable to that of the edit distance for ordered trees and to that of an existing method for glycan search. Conclusions: The proposed method is simple but useful for computing the edit distance between unordered trees. The object code is available upon request.
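The transformation to a clique problem can be illustrated with a small sketch. It is not the authors' construction, which the abstract does not spell out. It assumes the unit cost model, allows any node (including a root) to remain unmapped, and relies on the standard characterization of the edit distance as a minimum-cost one-to-one, ancestor-preserving node mapping. Each candidate node pair becomes a vertex with weight 2 (same label) or 1 (different label), and two pairs are joined when they can coexist in one mapping. The distance is then the total number of nodes minus the maximum clique weight. The names `Tree`, `compatible`, `max_weight_clique`, and `unordered_edit_distance` are hypothetical, and the naive branch-and-bound search stands in for an efficient clique solver.

```python
from itertools import product


class Tree:
    """Rooted unordered labeled tree stored as parent pointers (illustrative type)."""

    def __init__(self, labels, parent):
        self.labels = labels  # labels[i] is the label of node i
        self.parent = parent  # parent[i] is the parent index, None for the root

    def is_ancestor(self, a, b):
        """Return True if node a is a proper ancestor of node b."""
        p = self.parent[b]
        while p is not None:
            if p == a:
                return True
            p = self.parent[p]
        return False


def compatible(t1, t2, p, q):
    """Two node pairs can share a mapping if it stays one-to-one and ancestor-preserving."""
    (u1, v1), (u2, v2) = p, q
    if u1 == u2 or v1 == v2:
        return False
    return (t1.is_ancestor(u1, u2) == t2.is_ancestor(v1, v2)
            and t1.is_ancestor(u2, u1) == t2.is_ancestor(v2, v1))


def max_weight_clique(vertices, weight, adj):
    """Naive branch-and-bound search; a stand-in for an efficient clique solver."""
    best = 0

    def expand(cands, current):
        nonlocal best
        best = max(best, current)
        # Prune when even taking every remaining candidate cannot beat the best.
        if current + sum(weight[c] for c in cands) <= best:
            return
        for i, v in enumerate(cands):
            expand([w for w in cands[i + 1:] if w in adj[v]], current + weight[v])

    expand(list(vertices), 0)
    return best


def unordered_edit_distance(t1, t2):
    """Unit-cost edit distance = |T1| + |T2| - (maximum clique weight)."""
    pairs = list(product(range(len(t1.labels)), range(len(t2.labels))))
    weight = {(u, v): 2 if t1.labels[u] == t2.labels[v] else 1 for u, v in pairs}
    adj = {p: {q for q in pairs if compatible(t1, t2, p, q)} for p in pairs}
    return len(t1.labels) + len(t2.labels) - max_weight_clique(pairs, weight, adj)


# Usage: a(b, c) versus a(c, b) differ only in child order.
t_abc = Tree(["a", "b", "c"], [None, 0, 0])
t_acb = Tree(["a", "c", "b"], [None, 0, 0])
print(unordered_edit_distance(t_abc, t_acb))
```

The compatibility graph has one vertex per node pair, so this sketch is only suitable for very small trees; it is meant to show the shape of the reduction rather than a practical solver.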
Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance problem for unordered trees is NP-hard, so efficient algorithms for the problem are needed. Recently, a practical method called the clique-based algorithm has been proposed, but it is not fast for large trees. This article presents an improved clique-based method for the tree edit distance problem for unordered trees. The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques into the previous clique-based method. To evaluate the efficiency of the improved method, we applied it to the comparison of real tree-structured data such as glycan structures. For large tree structures, the improved method is much faster than the previous method. In particular, for hard instances, the improved method achieved a speed-up of more than 100 times.
This paper presents a fixed-parameter algorithm for the tree edit distance problem for unordered trees under the unit cost model that works in $O(2.62^k \cdot \mathrm{poly}(n))$ time and $O(n^2)$ space, where the parameter $k$ is the maximum bound of the edit distance and $n$ is the maximum size of the input trees. This paper also presents polynomial-time algorithms for the case where the maximum degree of the largest common subtree is bounded by a constant.
Traffic congestion occurs frequently in urban settings, and is not always caused by traffic incidents. In this paper, we propose a simple method for detecting traffic incidents from probe-car data by identifying unusual events that distinguish incidents from spontaneous congestion. First, we introduce a traffic state model based on a probabilistic topic model to describe the traffic states for a variety of roads. Formulas for estimating the model parameters are derived, so that the model of usual traffic can be learned using an expectation-maximization algorithm. Next, we propose several divergence functions to evaluate differences between the current and usual traffic states, and streaming algorithms that detect high-divergence segments in real time. We conducted an experiment with data collected for the entire Shuto Expressway system in Tokyo during 2010 and 2011. The results showed that our method discriminates successfully between anomalous car trajectories and the more usual, slowly moving traffic patterns.
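The idea of scoring a road segment by how far its current traffic state diverges from its usual state can be sketched as follows. The abstract does not name its divergence functions, so the Kullback-Leibler divergence is used here purely as a stand-in. The representation of a traffic state as a discrete probability vector, the segment identifiers, the threshold value, and the function names `kl_divergence` and `flag_segments` are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions.

    eps guards against log(0) when q has zero entries (an illustrative choice).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def flag_segments(current, usual, threshold):
    """Return segment ids whose current state diverges from the usual state.

    current and usual map a (hypothetical) segment id to a probability vector.
    """
    return [seg for seg in current
            if kl_divergence(current[seg], usual[seg]) > threshold]


# Usage with made-up state vectors for two hypothetical segments.
usual = {"seg_A": [0.7, 0.2, 0.1], "seg_B": [0.1, 0.9]}
current = {"seg_A": [0.7, 0.2, 0.1], "seg_B": [0.9, 0.1]}
print(flag_segments(current, usual, threshold=0.5))
```

A streaming variant would recompute the divergence as each new observation arrives instead of scanning all segments in a batch; that step is omitted here because the abstract gives no detail about it.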