2004
DOI: 10.1016/s0306-4379(03)00031-0

A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications

Abstract: In this paper we propose a matching algorithm for measuring the structural similarity between an XML document and a DTD. The matching algorithm, by comparing the document structure against the one the DTD requires, is able to identify commonalities and differences. Differences can be due to the presence of extra elements with respect to those the DTD requires and to the absence of required elements. The evaluation of commonalities and differences gives rise to a numerical rank of the structural similarity. Mo…

Cited by 80 publications (95 citation statements)
References: 23 publications
“… D-factor underlines the semantic influence of node depth on XML semantic similarity. It follows the intuition that information placed near the root node of an XML document is more important than information further down in the hierarchy [6,90]. Thus, node labels higher in the XML tree hierarchy should have a greater semantic influence than their lower counterparts.…”
Section: Semantic Resemblance Between Sub-trees (Sem-rbs) (mentioning)
confidence: 75%
“…In general, element/attribute values are disregarded when evaluating the structural properties of heterogeneous XML documents (originating from different data-sources and not conforming to the same grammar), so as to perform XML structural classification/clustering [16,31,55,58] or structural querying (i.e., querying the structure of documents, disregarding content [6,64]). Nonetheless, values are usually taken into account with methods dedicated to XML change management [13,14], data integration [29,40], and XML structure-and-content querying applications [66,67], where documents tend to have similar structures (probably conforming to the same grammar [36,83]).…”
Section: Fig (mentioning)
confidence: 99%
“…In case of similarity of a document D and a schema S there are also two types of strategies - techniques which measure the number of elements which appear in D but not in S and vice versa (e.g. [1]) and techniques which measure the closest distance between D and "all" documents valid against S (e.g. [8]).…”
Section: Related Work (mentioning)
confidence: 99%
“…Our embedding relation closely resembles the notion of simulation (for the formal definition, see [2]), which has been widely used in a number of works about querying, transformation, and verification of semistructured data (cf. [6,1,15,5] It is important to have an efficient implementation of homeomorphic embedding because it is used repeatedly during the verification process as described in the following.…”
Section: Rule-based Web Site Verification (mentioning)
confidence: 99%