2009 IEEE International Conference on Intelligent Computing and Intelligent Systems 2009
DOI: 10.1109/icicisys.2009.5357676

Two-layer classification and distinguished representations of users and documents for grouping and authorship identification

Abstract: Most studies on authorship identification report a drop in identification results when the number of authors exceeds 20-25. In this paper, we introduce a new user representation to address this problem and split classification across two layers. There are at least three novelties in this paper. First, the two-layer approach allows authorship identification to be applied to a larger number of authors (tested over 100 authors), and it is extendable. The authors are divided into groups that contain smaller n…
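The two-layer idea in the abstract can be illustrated with a minimal sketch: a first layer assigns a document to a group of authors, and a second layer picks an author within that group only. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class name `TwoLayerClassifier`, the toy two-dimensional feature vectors, the pre-assigned `author_to_group` mapping, and the nearest-centroid rule used at both layers as a stand-in for whatever classifier the authors actually used.

```python
from collections import defaultdict
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def nearest(centroids, x):
    """Return the key whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda k: dist(centroids[k], x))


class TwoLayerClassifier:
    """Hypothetical sketch: layer 1 picks a group, layer 2 picks an author in it."""

    def fit(self, samples, author_to_group):
        # samples: list of (feature_vector, author) pairs.
        # author_to_group: assumed to be given; how groups are formed is not modeled.
        by_author = defaultdict(list)
        by_group = defaultdict(list)
        for x, author in samples:
            by_author[author].append(x)
            by_group[author_to_group[author]].append(x)
        # Layer 1 model: one centroid per group.
        self.group_centroids = {g: centroid(v) for g, v in by_group.items()}
        # Layer 2 models: per group, one centroid per author in that group.
        self.author_centroids = defaultdict(dict)
        for author, vecs in by_author.items():
            self.author_centroids[author_to_group[author]][author] = centroid(vecs)
        return self

    def predict(self, x):
        group = nearest(self.group_centroids, x)
        author = nearest(self.author_centroids[group], x)
        return group, author


# Toy data: four authors in two groups (all values are made up).
samples = [
    ((0.0, 0.0), "a1"), ((0.2, 0.0), "a1"),
    ((1.0, 1.0), "a2"), ((1.2, 1.0), "a2"),
    ((10.0, 10.0), "b1"), ((10.2, 10.0), "b1"),
    ((11.0, 11.0), "b2"), ((11.2, 11.0), "b2"),
]
groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
clf = TwoLayerClassifier().fit(samples, groups)
print(clf.predict((0.1, 0.1)))
```

The point of the design is that the second layer only ever discriminates among the few authors of one group, so each decision involves a small number of candidates even when the total number of authors is large. A mistake at the first layer cannot be corrected by the second, which is a trade-off of any such cascade.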

Cited by 6 publications (3 citation statements)
References 16 publications
“…Many preprocessing steps were applied, such as removing images and videos, extracting text from tables, and deleting empty posts, to produce a set of 63,167 posts that forms the clean dataset for this work. The classification accuracy for bloggers with a larger number of posts is greater than 90% [11].…”
Section: Related Work (mentioning)
confidence: 91%
“…Mohtasseb and Ahmed identify the author (person) by collecting text, extracting features by converting every blog post to a feature vector, and finally applying a support vector machine (SVM) as the classification algorithm [11]. They used text collected from LiveJournal: 80,000 blog posts by 565 authors (persons), with an average of 140 posts per user.…”
Section: Related Work (mentioning)
confidence: 99%
“…As a result, the author identification of Chinese texts has the following problems: first, the feature modeling of Chinese texts mostly relies on manual work, and different modeling is required according to the different characteristics of different corpora; second, the accuracy of classification largely depends on the size of the corpus; third, when the number of authors is large, the accuracy rate decreases significantly [7].…”
Section: Introduction (mentioning)
confidence: 99%