2005
DOI: 10.1007/11547273_14

Approximate Subtree Identification in Heterogeneous XML Documents Collections

Abstract: Due to the heterogeneous nature of XML data for internet applications, exact matching of queries is often inadequate. The need arises to quickly identify subtrees of XML documents in a collection that are similar to a given pattern. In this paper we discuss different similarity measures between a pattern and subtrees of documents in the collection. An efficient algorithm for the identification of document subtrees, approximately conforming to the pattern, by indexing structures is then introduced.

citations
Cited by 20 publications
(10 citation statements)
references
References 9 publications
“…In general, element/attribute values are disregarded when evaluating the structural properties of heterogeneous XML documents (originating from different data-sources and not conforming to the same grammar), so as to perform XML structural classification/clustering [16,31,55,58] or structural querying (i.e., querying the structure of documents, disregarding content [6,64]). Nonetheless, values are usually taken into account with methods dedicated to XML change management [13,14], data integration [29,40], and XML structure-and-content querying applications [66,67], where documents tend to have similar structures (probably conforming to the same grammar [36,83]).…”
Section: Fig (mentioning)
confidence: 99%
“…Recent XML structure-based methods in [6,64] identify the need to support tag similarity (synonyms and stems) instead of tag syntactic equality while comparing XML documents. In [42], the authors introduce a structure and content based method for comparing XML documents having the same grammar (i.e., not heterogeneous), and consider semantic similarity evaluation between element/attribute values, using a variation of the edge-based methods.…”
Section: Integrating Structural and Semantic Similarity (mentioning)
confidence: 99%
“…In addition, recent techniques related to performance enhancement in XML document similarity (such as Entropy [42] and Structural Pattern Indexes [73]) and XML grammar similarity (such as Prufer sequence encoding [4] and B-Tree indexing [32]), could be investigated (and possibly adapted or combined) to improve the performance levels of XML document/grammar comparison solutions.…”
Section: XML Comparison Efficiency (mentioning)
confidence: 99%
“…The contribution of the paper can thus be summarized as follows: (i) specification of an approach for the efficient identification of regions by specifically tailored indexing structures; (ii) characterization of different similarity measures between a pattern and regions in a collection of heterogeneous semi-structured data; (iii) realization of a prototype and experimental validation of the approach. The paper is an extended version of [7]. With respect to [7], the presentation of the approach has been deeply revised, all the underlying concepts are clearly and formally defined, and the corresponding algorithms, only sketched in [7], are detailed and their correctness and complexity are discussed.…”
Section: Introduction (mentioning)
confidence: 99%
“…With respect to [7], the presentation of the approach has been deeply revised, all the underlying concepts are clearly and formally defined, and the corresponding algorithms, only sketched in [7], are detailed and their correctness and complexity are discussed. Moreover, the experimental evaluation considerably extends the preliminary results presented in [7].…”
Section: Introduction (mentioning)
confidence: 99%