2014
DOI: 10.1017/s0269888914000216

XML clustering: a review of structural approaches

Abstract: With its presence in data integration, chemistry, biological, and geographic systems, eXtensible Markup Language (XML) has become an important standard not only in computer science. A common problem among the mentioned applications involves structural clustering of XML documents—an issue that has been thoroughly studied and led to the creation of a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we provide a basic introduction to the problem and highl…

citations
Cited by 12 publications
(7 citation statements)
references
References 85 publications
“…In future works, we plan to apply our measure to different problems such as data integration [6] or document clustering as in [12]. Finally, we will work on formalizing the codification of the code tables to be included as metadata describing the different datasets.…”
Section: Discussion (mentioning)
confidence: 99%
“…Apart from adapting the algorithm and their measure to RDF graphs, we have shown the capacities of this approach regarding flexible data models. In another domain, the notion of structural similarity appears in approaches for XML document clustering [12]. These approaches use different ways to evaluate distances between XML documents, and use these distances in clustering algorithms.…”
Section: Related Work (mentioning)
confidence: 99%
“…The requirement of knowing the number of clusters a priori is commonly assumed in most XML clustering algorithms. Recent XML clustering surveys [2,32] reveal that nearly all of the approaches proposed so far rely on this assumption. However, such a requirement may discredit the algorithm in many real-world applications.…”
Section: Parametrization (mentioning)
confidence: 99%
“…As discussed in the overview paper by Piernik et al., most clustering applications involve three phases [57]. First, a preprocessing stage occurs to transform the text-based XML documents into more computationally compatible formats.…”
Section: Related Work Involving XML (mentioning)
confidence: 99%
“…Second, a similarity calculation can be developed as a way of measuring how related two or more XML documents are to each other. Finally, a clustering algorithm is applied that uses this similarity calculation to group related documents [57].…”
Section: Related Work Involving XML (mentioning)
confidence: 99%
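
The three phases named in the statements above (preprocessing, similarity calculation, clustering) can be illustrated with a minimal sketch. Everything in it is an assumption chosen for brevity and is not taken from the reviewed paper or the citing works: documents are reduced to sets of root-to-node tag paths, similarity is the Jaccard overlap of those sets, and grouping is a single-link merge above a hypothetical `threshold` parameter. The function names `tag_paths`, `jaccard`, and `cluster` are likewise illustrative.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Phase 1 (preprocessing): reduce a document to its set of root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Phase 2 (similarity): Jaccard overlap of two path sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Phase 3 (clustering): single-link grouping of documents with similarity >= threshold."""
    features = [tag_paths(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(features[i], features[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<map><region><name/></region></map>",
]
clusters = cluster(docs)  # lists of document indices, one list per cluster
```

With these three sample documents, the two `book` documents share three of four distinct paths and are merged, while the `map` document stays alone. A threshold-based merge is used here only because it needs no cluster count up front; the structural approaches covered by the review use a wide range of richer representations and algorithms.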