Simple Fast Algorithms for the Editing Distance between Trees and Related Problems

Abstract. An ordered tree is a tree in which each node's incident edges are cyclically ordered; think of the tree as being embedded in the plane. Let A and B be two ordered trees. The edit distance between A and B is the minimum cost of a sequence of operations (contract an edge, uncontract an edge, modify the label of an edge) needed to transform A into B. We give an O(n^3 log n) algorithm to compute the edit distance between two ordered trees.

“…The proof consists in combining a slight modification of Lemma 7 of Zhang and Shasha [10] with our Lemma 2.…”

Section: Lemma 3 the Number Of Relevant Substrings Of T Is At Most (mentioning)

confidence: 99%

Computing the Edit-Distance Between Unrooted Ordered Trees

Klein

1998

Algorithms — ESA’ 98

“…The resulting algorithm has a complexity of O(|A||B| × depth(A)^2 × depth(B)^2) when finding the edit distance between two trees A and B (|A| and |B| denote tree cardinalities while depth(A) and depth(B) are the depths of the trees). Similarly, early approaches in [70,89] allow insertion, deletion and relabeling of nodes anywhere in the tree. Yet, they remain greedy in complexity.…”

Section: Early Approaches (mentioning)

confidence: 99%

A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics

Tekli

Chbeir

2012

Journal of Web Semantics
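To give a feel for the bound O(|A||B| × depth(A)^2 × depth(B)^2) quoted in the Early Approaches snippet above, the following minimal Python sketch evaluates the product for two small trees. The tuple-based tree encoding and the helper names (tree, node_count, depth, early_bound) are assumptions made for illustration, and depth is assumed to be counted in nodes along the longest root-to-leaf path; the cited paper may use a different convention.

```python
# Illustrative sketch only: evaluates |A| * |B| * depth(A)^2 * depth(B)^2.
# The tree encoding and all names here are hypothetical, not from the cited work.

def tree(label, *children):
    """Build an ordered tree as (label, (child, child, ...))."""
    return (label, tuple(children))

def node_count(t):
    """Number of nodes in the tree (the cardinality |A|)."""
    return 1 + sum(node_count(child) for child in t[1])

def depth(t):
    """Depth counted in nodes on the longest root-to-leaf path (assumption)."""
    return 1 + max((depth(child) for child in t[1]), default=0)

def early_bound(a, b):
    """The product inside the O(...) of the quoted bound, without constants."""
    return node_count(a) * node_count(b) * depth(a) ** 2 * depth(b) ** 2

a = tree("r", tree("x", tree("y")), tree("z"))  # 4 nodes, depth 3
b = tree("r", tree("x"))                        # 2 nodes, depth 2
print(early_bound(a, b))                        # 4 * 2 * 9 * 4 = 288
```

The value is only the expression inside the O-notation, so it indicates growth rather than an actual operation count.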

“…Since XML documents can be represented as trees, it is a natural idea to utilize tree-to-tree correction techniques to detect changes in XML documents. Zhang and Shasha proposed a fast algorithm to find the minimum cost editing distance between two ordered labeled trees [9]. Given two ordered trees T1 and T2, in which each node has an associated label, their algorithm finds an optimal edit script in time O(|T1| × |T2| × min{depth(T1), leaves(T1)} × min{depth(T2), leaves(T2)}), which is the best known result for the general tree-to-tree correction problem.…”

Section: Related Work (mentioning)

confidence: 99%
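The snippet above refers to the minimum-cost editing distance between two ordered labeled trees. As a rough illustration of what that quantity is, here is a minimal Python sketch of the textbook forest recursion with memoization. It is not Zhang and Shasha's algorithm and does not achieve their time bound; unit costs for insert, delete and relabel, the tuple-based tree encoding, and all function names are assumptions made for this example.

```python
from functools import lru_cache

# Illustrative sketch only: a plain memoized recursion over ordered forests,
# assuming unit cost for every insert, delete and relabel operation.

def tree(label, *children):
    """Build an ordered labeled tree as (label, (child, child, ...))."""
    return (label, tuple(children))

def size(forest):
    """Total number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Edit distance between two ordered forests."""
    if not f1:
        return size(f2)              # insert every remaining node
    if not f2:
        return size(f1)              # delete every remaining node
    (label1, kids1), rest1 = f1[-1], f1[:-1]
    (label2, kids2), rest2 = f2[-1], f2[:-1]
    # Delete the rightmost root of f1: its children take its place.
    delete = forest_distance(rest1 + kids1, f2) + 1
    # Insert the rightmost root of f2: its children take its place.
    insert = forest_distance(f1, rest2 + kids2) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    match = (forest_distance(kids1, kids2)
             + forest_distance(rest1, rest2)
             + (label1 != label2))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))

t1 = tree("f", tree("a"), tree("b"))
t2 = tree("f", tree("a"), tree("c"))
print(tree_edit_distance(t1, t2))    # one relabel (b -> c), so 1
```

The recursion always works on the rightmost root of each forest, which keeps the sibling order intact; memoizing on the forest tuples avoids recomputing shared subproblems, but the number of distinct subproblems can still be much larger than in the cited algorithm.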

“…MH-Diff [5] provides an efficient heuristic solution based on transforming the problem to the edge cover problem, with a worst case cost in O(n^2 log n), where n is the total number of nodes. XMLTreeDiff [3] use DOMHash [7] and Zhang's algorithm [9]. Since the former conflicts with the later, this method may not generate an optimal result.…”

Section: Related Work (mentioning)

confidence: 99%

KF-Diff+: Highly Efficient Change Detection Algorithm for XML Documents

Wang

et al. 2002

On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE

Abstract. Most previous work in change detection on XML documents used the ordered tree, with the best complexity of O(n log n), where n is the size of the document. The best algorithm we had ever known for unordered model achieves polynomial time in complexity. In this paper, we propose a highly efficient algorithm named KF-Diff+. The key property of our algorithm is that the algorithm transforms the traditional tree-to-tree correction into the comparing of the key trees which are substantially label trees without duplicate paths with the complexity of O(n), where n is the number of nodes in the trees. In addition, KF-Diff+ is tailored to both ordered trees and unordered trees. Experiment shows that KF-Diff+ can handle XML documents at extreme speed.

Simple Fast Algorithms for the Editing Distance between Trees and Related Problems

Cited by 1,039 publications

References 10 publications
