2017 IEEE Trustcom/BigDataSE/Icess 2017
DOI: 10.1109/trustcom/bigdatase/icess.2017.261

Sequential and Unsupervised Document Authorial Clustering Based on Hidden Markov Model

Abstract: Document clustering groups documents with similar characteristics into the same cluster. It has shown advantages for the organization, retrieval, navigation, and summarization of the huge number of text documents on the Internet. This paper presents a novel, unsupervised approach for clustering single-author documents into groups based on authorship. The key novelty is that we propose to extract contextual correlations to depict the writing style hidden among sentences of each document for clustering the d…

Citations: cited by 1 publication (2 citation statements)
References: 19 publications
“…García-Mondeja et al used a Bag-of-Words language model with binary features, because the documents were too short to use frequencies, and then applied a β-compact algorithm that placed documents in the same cluster if they were maximally similar and more proximal than the threshold β [9]. Aldebei et al proposed a two-level HMM to model relations of sequential sentences in order to cluster single-author documents by authorship [2]. While the approach was for long documents, they also tested it on smaller segments of texts as well [2].…”
Section: Related Work (mentioning)
confidence: 99%
“…Aldebei et al proposed a two-level HMM to model relations of sequential sentences in order to cluster single-author documents by authorship [2]. While the approach was for long documents, they also tested it on smaller segments of texts as well [2]. Gómez-Adorno et al used a hierarchical agglomerative clustering algorithm with average linkage and cosine similarity, and with the Calinski-Harabasz optimization criterion to determine the cut-off layer of the dendrogram [10].…”
Section: Related Work (mentioning)
confidence: 99%