2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA) 2014
DOI: 10.1109/aiccsa.2014.7073254

An extensive study of the Bag-of-Words approach for gender identification of Arabic articles

Abstract: The prevalent use of Online Social Networks (OSN), and the anonymity and lack of accountability they inherit from being online, give rise to many problems related to finding the connection between the massive amount of text data on OSN and the people who actually wrote it. Analyzing text data for such purposes is called authorship analysis. This work is focused on one specific type of authorship analysis, which is identifying the author's gender. Gender identification has various applications from marketing t…

Cited by 31 publications (21 citation statements)
References 30 publications

“…Moreover, the recent trend of web users contributing their thoughts and ideas through unstructured and sometimes poorly written comments increases the challenge faced by such techniques. This has also given rise to many applications of TC such as spam filtering [1], sentiment analysis [2], [3], [4], [5], determining author's characteristics such as identity [6], [7], [8], gender [9], [10], dialect [11], [12], native language [13], political orientation [14], [15], etc. So, several pre-processing steps are sometimes required to transform this data into a form that TC techniques can handle [16].…”
Section: Introduction (mentioning, confidence: 99%)
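The excerpt above does not say which pre-processing steps are needed. As a minimal sketch, assuming Arabic input and a few commonly used normalization steps (removing diacritics and the tatweel character, unifying alef variants, replacing punctuation with spaces), a tokenizer could look like the following. The function name `preprocess` and the specific steps are illustrative assumptions, not the pipeline used by the cited authors.

```python
import re

# Assumed normalization rules (illustrative, not from the cited work).
# Arabic diacritics (tanween, harakat, shadda, sukun): U+064B..U+0652
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), a purely decorative elongation character
TATWEEL = "\u0640"
# Alef with madda, hamza above, hamza below; mapped to bare alef U+0627
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# Anything that is not a word character (letters, digits, underscore) or whitespace
NON_WORD = re.compile(r"[^\w\s]")


def preprocess(text: str) -> list[str]:
    """Normalize raw comment text and split it into whitespace tokens."""
    text = DIACRITICS.sub("", text)          # drop short-vowel marks
    text = text.replace(TATWEEL, "")         # drop elongation characters
    text = ALEF_VARIANTS.sub("\u0627", text) # unify alef forms
    text = NON_WORD.sub(" ", text)           # punctuation (incl. Arabic comma) -> space
    return text.lower().split()              # lowercase affects Latin script only
```

For example, `preprocess("Hello, World!")` returns `["hello", "world"]`; real systems may add further steps such as stop-word removal or stemming.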
“…It is one of the fundamental problems in many fields such as text mining, machine learning, natural language processing, information retrieval, etc., with a vast range of applications such as spam filtering [1], sentiment analysis [2], [3], [4], [5], determining author's characteristics such as identity [6], [7], [8], gender [9], [10], dialect [11], [12], native language [13], political orientation [14], [15], etc. The TC problem gained more importance due to the explosion in the size of text data available on the Web over the past two decades.…”
Section: Introduction (mentioning, confidence: 99%)
“…Alsmearat et al. [2] investigated gender identification on Arabic articles using Bag-of-Words (BOW) features in the selection phase. The proposed technique works by estimating the frequency of each word in each document.…”
Section: B. Gender Detection on Arabic Language (mentioning, confidence: 99%)
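As a hedged illustration of the word-frequency representation described in this statement, the sketch below builds one term-frequency vector per tokenized document over a shared, sorted vocabulary. The function `bag_of_words` is hypothetical; it does not reproduce any feature selection, weighting, or classifier that [2] may apply.

```python
from collections import Counter


def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one term-frequency vector per document."""
    # Count how often each word occurs in each document
    counts = [Counter(tokens) for tokens in docs]
    # Shared vocabulary: every word seen in any document, in a fixed order
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for absent words, giving zeros in the vector
    vectors = [[c[word] for word in vocab] for c in counts]
    return vocab, vectors
```

For example, `bag_of_words([["a", "b", "a"], ["b", "c"]])` returns the vocabulary `["a", "b", "c"]` and the vectors `[[2, 1, 0], [0, 1, 1]]`. Word order within a document is discarded, which is the defining simplification of the BOW model.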
“…In this study, we focused on Arabic opinions on Twitter. Some of these studies have investigated only the gender aspect, as a core attribute that can be a good indicator of the author of a tweet, as in [1,2]. Other studies investigated not only gender but also other attributes such as age, for example [3,4].…”
Section: Introduction (mentioning, confidence: 99%)