XML data clustering

Algergawy, Alsayed; Mesiti, Marco; Nayak, Richi; Saake, Gunter

doi:10.1145/1978802.1978804

Cited by 56 publications

(46 citation statements)

References 89 publications


“…In practice, a cost may be assigned to each individual operation to reflect its importance. Typical tree distance algorithms include [11] and [12]. Flesca et al [13] represent XML documents as time series and compute the structural similarity between two documents by exploiting Discrete Fourier Transform of the corresponding signals.…”

Section: Similarity Measurement of XML Documents (mentioning)

confidence: 99%

A Hybrid Method to Evaluate Similarity of XML Document

Dai, Ren

2016

Advances in Computer Science Research


Abstract: XML is an important standard for information representation and data exchange over the Internet, and document classification is an important way to extract useful information from large collections of documents. This paper proposes a method of XML document classification based on fuzzy path matching. First, information that has no influence on the classification is removed. Then a hybrid method is used to compute XML document similarity: each XML document is expressed as a collection of paths, recurring paths are deleted and paths are matched fuzzily to improve efficiency, and the Hungarian algorithm is used to calculate the similarity between documents. Finally, two experiments are performed, and the results show that the method is effective.
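As a rough illustration of the "collection of paths" representation described in this abstract, the following Python sketch extracts the distinct root-to-leaf tag paths of two documents and compares them. It is not the authors' method: it substitutes plain Jaccard similarity for the fuzzy path matching and Hungarian assignment, and the function names `root_to_leaf_paths` and `path_similarity` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the set of distinct root-to-leaf tag paths in an XML string.

    Using a set drops recurring paths, so repeated siblings count once.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            paths.add(path)  # leaf element: record the full path
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def path_similarity(xml_a, xml_b):
    """Jaccard similarity of the two path sets (an assumed stand-in measure)."""
    a, b = root_to_leaf_paths(xml_a), root_to_leaf_paths(xml_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents.
doc1 = "<book><title/><author/><author/></book>"
doc2 = "<book><title/><year/></book>"
print(sorted(root_to_leaf_paths(doc1)))
print(path_similarity(doc1, doc2))
```

Here the two `author` elements in `doc1` collapse into a single path, which mirrors the idea of deleting recurring paths before comparison.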


“…Let us now focus on the main differences between the steps of XPattern and the earlier mentioned typical clustering methodology [2]. Although the first step of both methodologies transforms objects into a chosen representation, in the traditional approach this is done to compare documents with each other, while in our approach it facilitates the process of pattern mining.…”

Section: Step 4: Document Assignment (mentioning)

confidence: 99%

“…The requirement of knowing the number of clusters a priori is commonly assumed in most XML clustering algorithms. Recent XML clustering surveys [2,32] reveal that nearly all of the approaches proposed so far rely on this assumption. However, such a requirement may discredit the algorithm in many real-world applications.…”

Section: Parametrization (mentioning)

confidence: 99%

Clustering XML documents by patterns

2015


Now that the use of XML is prevalent, methods for mining semi-structured documents have become even more important. In particular, one of the areas that could greatly benefit from in-depth analysis of XML's semi-structured nature is cluster analysis. Most of the XML clustering approaches developed so far employ pairwise similarity measures. In this paper, we study clustering algorithms that use patterns to cluster documents without the need for pairwise comparisons. We investigate the shortcomings of existing approaches and establish a new pattern-based clustering framework called XPattern, which tries to address these shortcomings. The proposed framework consists of four steps: choosing a pattern definition, pattern mining, pattern clustering, and document assignment. The framework's distinguishing feature is the combination of pattern clustering and document-cluster assignment, which allows documents to be grouped according to their characteristic features rather than their direct similarity. We experimentally evaluate the proposed approach by implementing an algorithm called PathXP, which mines maximal frequent paths and groups them into profiles. PathXP was found to match, in terms of accuracy, other XML clustering approaches, while requiring less parametrization and providing easily interpretable cluster representatives. Additionally, the results of an in-depth experimental study lead to general suggestions concerning pattern-based XML clustering.
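The pattern-mining step mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PathXP itself: a path is assumed to be a root-anchored tuple of tag names, a path is "frequent" if it occurs in at least `min_support` documents, and "maximal" is taken to mean that no longer frequent path extends it. The function name `maximal_frequent_paths` and the sample data are hypothetical.

```python
from collections import Counter


def maximal_frequent_paths(docs, min_support):
    """docs: list of sets of root-to-leaf paths, each path a tuple of tags.

    Returns frequent root-anchored paths that are not a proper prefix of
    another frequent path (an assumed notion of "maximal").
    """
    support = Counter()
    for paths in docs:
        prefixes = set()
        for path in paths:
            for i in range(1, len(path) + 1):
                prefixes.add(path[:i])
        support.update(prefixes)  # each document counts once per prefix

    frequent = {p for p, n in support.items() if n >= min_support}
    return {
        p for p in frequent
        if not any(len(q) > len(p) and q[:len(p)] == p for q in frequent)
    }


# Hypothetical documents, already reduced to path sets.
docs = [
    {("book", "title"), ("book", "author", "name")},
    {("book", "title"), ("book", "author", "email")},
    {("cd", "title")},
]
print(maximal_frequent_paths(docs, min_support=2))
```

With these inputs, `("book",)` is frequent but not maximal, because both `("book", "title")` and `("book", "author")` extend it and are themselves frequent.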


“…Furthermore, XML significantly influences data management [23,24] (e.g., interpolation and prediction of spatiotemporal data) because the data can remain in a tree structure and a node or subtree can be considered metadata. Thus, interpolation and prediction of spatiotemporal data based on XML seems to have greater performance advantages than a traditional database.…”

Section: Introduction (mentioning)

confidence: 99%

Interpolation and Prediction of Spatiotemporal Data Based on XML Integrated with Grey Dynamic Model

Bai

2017

IJGI


Abstract: Interpolation and prediction of spatiotemporal data are integral components of many real-world applications. Thus, approaches to interpolating and predicting spatiotemporal data have been extensively investigated. Currently, the grey dynamic model has been used to enhance the performance of interpolating and predicting spatiotemporal data. Meanwhile, the Extensible Markup Language (XML) has unique characteristics of information representation and exchange. In this paper, we first couple the grey dynamic model with the spatiotemporal XML model. Based on a definition of the position part of the spatiotemporal XML model, we extract the corresponding position information of each time interval and propose an algorithm for constructing an AVL tree to store it. Then, we present the architecture of an interpolating and predicting process and investigate change operations in positions. On this basis, we present an algorithm for interpolation and prediction of spatiotemporal data based on XML integrated with the grey dynamic model. Experimental results demonstrate the performance advantages of the proposed approach.
