2011
DOI: 10.1145/1978802.1978804

XML data clustering

Abstract: In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The presence of such a huge amount of approaches is due to the different applications requiring the clustering of XML data. These applications need data in the form of similar contents, tags, paths, structures, and semantics. In this article, we first outline the application contexts in which clustering is useful, then we survey approaches so far proposed relying o…

Cited by 56 publications (46 citation statements)
References 89 publications
“…In practice, a cost may be assigned to each individual operation to reflect its importance. Typical tree distance algorithms include [11] and [12]. Flesca et al [13] represent XML documents as time series and compute the structural similarity between two documents by exploiting Discrete Fourier Transform of the corresponding signals.…”
Section: Similarity Measurement Of XML Documents
Citation type: mentioning
confidence: 99%
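
The statement above describes comparing XML documents as signals in the frequency domain. The following is a minimal Python sketch of that general idea only, not the encoding or distance used by Flesca et al. It assumes a hypothetical encoding (a positive integer code when a tag opens, its negative when the tag closes), zero-pads both signals to a common length, and compares the magnitudes of their Discrete Fourier Transforms. The function names `tag_signal`, `dft_magnitudes`, and `structural_distance` are illustrative.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def tag_signal(xml_text, codes):
    """Encode a document as a numeric sequence (assumed encoding):
    +code when an element opens, -code when it closes."""
    signal = []

    def walk(elem):
        # Assign the next free integer code to tag names not seen before.
        code = codes.setdefault(elem.tag, len(codes) + 1)
        signal.append(float(code))
        for child in elem:
            walk(child)
        signal.append(-float(code))

    walk(ET.fromstring(xml_text))
    return signal


def dft_magnitudes(signal, n):
    """Magnitudes of the n-point DFT of the zero-padded signal (naive O(n^2))."""
    padded = signal + [0.0] * (n - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(padded)))
        for k in range(n)
    ]


def structural_distance(xml_a, xml_b):
    """Euclidean distance between DFT magnitude spectra, scaled by length."""
    codes = {}  # shared so the same tag gets the same code in both documents
    a = tag_signal(xml_a, codes)
    b = tag_signal(xml_b, codes)
    n = max(len(a), len(b))
    fa = dft_magnitudes(a, n)
    fb = dft_magnitudes(b, n)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(fa, fb))) / n
```

Under these assumptions, two structurally identical documents yield a distance of zero, and documents whose tag sequences differ yield a positive value; text content and attributes are ignored.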
“…Let us now focus on the main differences between the steps of XPattern and the earlier mentioned typical clustering methodology [2]. Although the first step of both methodologies transforms objects into a chosen representation, in the traditional approach this is done to compare documents with each other, while in our approach it facilitates the process of pattern mining.…”
Section: Step 4: Document Assignment
Citation type: mentioning
confidence: 99%
“…The requirement of knowing the number of clusters a priori is commonly assumed in most XML clustering algorithms. Recent XML clustering surveys [2,32] reveal that nearly all of the approaches proposed so far rely on this assumption. However, such a requirement may discredit the algorithm in many real-world applications.…”
Section: Parametrization
Citation type: mentioning
confidence: 99%
“…Furthermore, XML significantly influences data management [23,24] (e.g., interpolation and prediction of spatiotemporal data) because the data can remain in a tree structure and a node or subtree can be considered metadata. Thus, interpolation and prediction of spatiotemporal data based on XML seems to have greater performance advantages than a traditional database.…”
Section: Introduction
Citation type: mentioning
confidence: 99%