Investigating the Use of Machine Learning Algorithms in Detecting Gender of the Arabic Tweet Author

Alsukhni, Emad Mahmoud; Alequr, Qasem

doi:10.14569/ijacsa.2016.070746

Cited by 15 publications

(6 citation statements)

References 7 publications

“…In a subsequent study, Deitrick et al [21] showed that exploiting feature selection methods improves the results substantially on n-gram features. Several studies [5,6,37,39,42,47,59,61] also presented models for author gender detection of writers of the social media in different languages.…”

Section: Related Work (mentioning)

confidence: 99%

A Comprehensive Study of Learning Approaches for Author Gender Identification

Dalyan, Ayral, Özdemir (2022), ITC

In recent years, author gender identification has become an important yet challenging task in the fields of information retrieval and computational linguistics. In this paper, different learning approaches are presented to address the problem of author gender identification for Turkish articles. First, several classification algorithms are applied to representations based on different paradigms: fixed-length vector representations such as Stylometric Features (SF) and Bag-of-Words (BoW), and distributed word/document embeddings such as Word2vec, fastText and Doc2vec. Second, deep learning architectures are designed and their performances compared: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), RNN variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), C-RNN, Bidirectional LSTM (bi-LSTM), Bidirectional GRU (bi-GRU), Hierarchical Attention Networks and Multi-head Attention (MHA). We conducted a variety of experiments and achieved outstanding empirical results. To conclude, ML algorithms with BoW give promising results, and fastText appears to be a suitable choice among the embedding models. This comprehensive study contributes to the literature by applying different learning approaches to several kinds of representation. It is also the first significant attempt to identify author gender by applying SF, fastText and DNN architectures to the Turkish language.

“…There are several challenges that hinder the development of tools for Twitter data analytics in the Arabic language, the greatest being the complexity of the language itself. Research on Twitter data analytics in Arabic has begun to appear in recent years in various application domains (detecting authors' genders [12], detecting traffic related events [18,20,38], finding restaurants' reputations [13]) but the progress has been slow. Moreover, some works are available in Modern Standard Arabic (MSA), but in general (not specific to healthcare), the works on Arabic dialects are very limited in number and scope [10,14].…”

Section: Research Gap (mentioning)

confidence: 99%

Sehaa: A Big Data Analytics Tool for Healthcare Symptoms and Diseases Detection Using Twitter, Apache Spark, and Machine Learning

et al. 2020

Smartness, which underpins smart cities and societies, is defined by our ability to engage with our environments, analyze them, and make decisions, all in a timely manner. Healthcare is the prime candidate needing the transformative capability of this smartness. Social media could enable a ubiquitous and continuous engagement between healthcare stakeholders, leading to better public health. Current works are limited in their scope, functionality, and scalability. This paper proposes Sehaa, a big data analytics tool for healthcare in the Kingdom of Saudi Arabia (KSA) using Twitter data in Arabic. Sehaa uses Naive Bayes, Logistic Regression, and multiple feature extraction methods to detect various diseases in the KSA. Sehaa found that the top five diseases in Saudi Arabia in terms of the actual afflicted cases are dermal diseases, heart diseases, hypertension, cancer, and diabetes. Riyadh and Jeddah need to do more in creating awareness about the top diseases. Taif is the healthiest city in the KSA in terms of the detected diseases and awareness activities. Sehaa is developed over Apache Spark allowing true scalability. The dataset used comprises 18.9 million tweets collected from November 2018 to September 2019. The results are evaluated using well-known numerical criteria (Accuracy and F1-Score) and are validated against externally available statistics.

“…Subsequently, the authors in [2] extend their work by experimenting with different machine learning algorithms, data subsets and feature selection methods, reporting accuracies up to 94%. The authors in [1] manually annotate tweets from Jordanian dialects with gender information. They show how the name of the author of the tweet can significantly improve the performance.…”

Section: Age Gender and Language Variety Identification In Arabic (mentioning)

confidence: 99%

Author Profiling Tracks at FIRE

Rosso, Rangel (2020), SN COMPUT. SCI.

Benchmarking activities are vital for fostering research and addressing new challenging problems. During the last 10 years of the FIRE initiative, we have been involved in the organization of more than ten tracks, with the aim of creating new resources in several languages and making them available to the research community. This allowed the various new approaches to be compared on the same datasets. In this chapter, we focus on describing three author profiling tracks, their data creation, and the analysis of their results.
