Using structural similarity for clustering XML documents

Aïtelhadj, Ali; Boughanem, Mohand; Mezghiche, Mohamed; Souam, Fatiha

doi:10.1007/s10115-011-0421-5

Cited by 17 publications

(5 citation statements)

References 37 publications

“…Here, clustering represents merging of similar types of XML data & applications of XML clustering are: information retrieval, data integration, document ranking, web mining as well as query processing. The major issues in XML data preprocessing for ranking are given below [2] :…”

Section: Proposed Model (mentioning)

confidence: 99%

An novel cluster based feature selection and document classification model on high dimension trec data

Kumari

Satyanarayana

2017

IJET

TREC text documents are difficult to analyze for features and for relevant similar documents using traditional document similarity measures. As the TREC repository grows, finding relevant clustered documents in a large collection of unstructured documents becomes a challenging task. Traditional document similarity and classification models are implemented on homogeneous TREC data to find essential features for document entities that are similar to the TREC documents. Also, most traditional models are applicable only to limited text document sets. The main issues with traditional text mining models on the TREC repository are: 1) each document is represented as a vector with many sparse values; 2) they fail to capture document semantic similarity within and between clusters; 3) they have a high mean squared error rate. In this paper, a novel feature selection based clustering and classification model is proposed for a large number of different TREC repositories. Traditional latent semantic indexing and document clustering models fail to find topic relevance on large TREC clinical text document sets because of computational memory and time constraints. The proposed document feature selection and cluster based classification model is applied to TREC clinical benchmark datasets. The experimental results show that the proposed model is more efficient than existing models in terms of computational memory, accuracy, and error rate.

“…The identification of a new TREC documents comes along with two vital tasks. The first task is the problem of identification of features in the TREC training data [2][3][4]. The second task is called the feature based document clustering and classification.…”

Section: Introduction (mentioning)

confidence: 99%

An novel cluster based feature selection and document classification model on high dimension trec data

Kumari

Satyanarayana

2017

IJET

“…As X3D expresses the geometry and behaviour capabilities of VRML using XML [25] which has become an unchallenged standard for the representation and exchange of data on the web [26], X3D documents must also follow the rules of writing used in XML. Nevertheless, X3D documents can be written using a writing style as used in classic VRML encoding [27], so developers who are more familiar with the VRML writing style can choose this way.…”

Section: Web3d Standards (mentioning)

confidence: 99%

A Study on the Conversion of VRML to X3D In A Highly Complex and Detailed Web3D World

2017

IJCSI

X3D has been used by Web3D world developers around the world; some of them developed their worlds from scratch rather than converting them from VRML. Although a VRML document can be converted to X3D, developers tend to create a new world directly in X3D because it produces clean documents. This choice is hard to make when the objective is to create a highly complex Web3D world that is constructed almost entirely of polygons, with bitmap images used only as complements, as the skin for 3D objects. The alternative is to convert the existing VRML version of the site to X3D. The remaining question is whether the converted version will perform as well as or better than the original. This paper discusses the initial steps for cutting the development time of a Web3D world by converting VRML to X3D. The comparison shows that most converted parts of the target world look and behave much like the original parts. A slight increase in performance numbers was noted, but no significant differences were found, with only a few inconsistencies. Therefore, a full conversion of the site from VRML to X3D is recommended.

“…An approach similar to S-GRACE was presented by Aïtelhadj et al (2012). The authors propose to transform XML documents into tree summaries by merging all repeating elements at each level of a document into a single node.…”

Section: Substructural Similarity Approaches (mentioning)

confidence: 99%

XML clustering: a review of structural approaches

Piernik

Brzeziński

Morzy

et al. 2014

The Knowledge Engineering Review

With its presence in data integration, chemistry, biological, and geographic systems, eXtensible Markup Language (XML) has become an important standard not only in computer science. A common problem among the mentioned applications involves structural clustering of XML documents—an issue that has been thoroughly studied and led to the creation of a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we provide a basic introduction to the problem and highlight the main challenges in this research area. Subsequently, we divide the problem into three subtasks and discuss the most common document representations, structural similarity measures, and clustering algorithms. In addition, we present the most popular evaluation measures, which can be used to estimate clustering quality. Finally, we analyze and compare 23 state-of-the-art approaches and arrange them in an original taxonomy. By providing an up-to-date analysis of existing structural XML clustering algorithms, we hope to showcase methods suitable for current applications and draw lines of future research.
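The citation statement quoted from this review describes tree summaries built by "merging all repeating elements at each level of a document into a single node". The sketch below illustrates one possible reading of that sentence, not the method as published: siblings that share a tag are collapsed into one node, and their children are pooled and collapsed in the same way. The function names `merge_same_tag` and `tree_summary` and the sample document are hypothetical; attributes and text content are ignored because only structure is of interest here.

```python
import xml.etree.ElementTree as ET


def merge_same_tag(nodes):
    """Collapse elements that share a tag into one structure-only node.

    Assumption: 'repeating elements' means siblings with the same tag.
    Children of all merged nodes are pooled, grouped by tag, and merged
    recursively, so each tag appears at most once per level.
    """
    summary = ET.Element(nodes[0].tag)
    groups = {}  # tag -> children with that tag, in first-seen order
    for node in nodes:
        for child in node:
            groups.setdefault(child.tag, []).append(child)
    for same_tag_children in groups.values():
        summary.append(merge_same_tag(same_tag_children))
    return summary


def tree_summary(xml_text):
    """Parse an XML string and return its summary tree."""
    return merge_same_tag([ET.fromstring(xml_text)])


# Hypothetical sample: two <book> siblings and repeated <author> elements.
doc = (
    "<library>"
    "<book><title/><author/><author/></book>"
    "<book><title/><year/></book>"
    "<journal><title/></journal>"
    "</library>"
)
summary = tree_summary(doc)
print(ET.tostring(summary, encoding="unicode"))
```

In this sample the two `book` elements collapse into a single `book` node whose children are `title`, `author`, and `year`, so documents that differ only in how often an element repeats yield the same summary.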
