Author Identification in Albanian Language

Paci, Hakik; Kajo, Elinda; Trandafili, Evis; Tafa, Igli; Salillari, Denisa

doi:10.1109/nbis.2011.71

Cited by 8 publications

(6 citation statements)

References 6 publications

“…While many studies rely on established benchmark datasets like Enron [20], C50 [7], PAN [22], IMDb62 [6,21] and others [9], the scarcity of standard datasets, particularly for low-resource languages, presents a unique challenge. Creating specialized corpora has paved the way for promising advancements in the field, demonstrated by projects like UNAAC [5], BAAD [2], UrduCorpus [5], A3C Corpus [8,25], and more [4]. These corpora tailored for AA contribute significantly to the field, expanding its resources.…”

Section: Related Work (mentioning)

confidence: 99%

Authorship classification techniques: Bridging textual domains and languages

Misini, Kadriu, Canhasi

2024

IJITS JOURNAL

Authorship classification analyzes an author's prior work to identify their writing style, a unique trait of each language and individual author. This research aims to conduct a thorough comparative analysis of various methods for classifying authorship. The study leverages two corpora: AAALitCorpus of Albanian literary texts and CCAT10 of English columns. We evaluate model-generated features across different configurations. The richness of the features and the breadth of the analysis provide a significant understanding of the problem, setting a new standard for comprehensive linguistic investigations across multiple languages. The study indicates that machine learning algorithms accurately discern authorial writing styles, highlighting the complexities of classifying authorship in a cross-linguistic context.

“…In today's digital age [1], the Internet has expanded anonymous content, making AA an increasingly concern. This issue carries substantial implications across various domains, including literature [2][3][4], journalism [5][6][7][8], and forensics [9].…”

Section: Introduction (mentioning)

confidence: 99%

Authorship classification techniques: Bridging textual domains and languages

Misini, Kadriu, Canhasi

2024

IJITS JOURNAL

“…An important part of the literature consists of studies on English language [4,5,6,7,8]. There are also many studies done in many different languages including Japanese [9], Mongolian [10], Persian [11], Albanian [12], Indian [13,14], Brazilian [15], Russian [16,17], German [18], and Arabic [19]. When the existing studies were examined, it was seen that different types of data sets were used for author identification tasks.…”

Section: Literature Review (mentioning)

confidence: 99%

“…When the existing studies were examined, it was seen that different types of data sets were used for author identification tasks. Some studies have been carried out on newspaper articles [4,15,18,19], while others were carried out on poems [13], novels [11,12,16], email content [20], song lyrics [21], source codes [22], or tweets, blog posts, and forums [8,9,23]. In some cases, different types of data sources were combined or compared [17,25] Early studies in author identification focused on different stylometric techniques.…”

Section: Literature Review (mentioning)

confidence: 99%

Author Identification with Machine Learning Algorithms

Yülüce, Dalkılıç

2022

IJMSIT

Author identification is one of the application areas of text mining. It deals with automatically predicting the author of an electronic text from a predefined set of candidates by using author-specific writing styles. In this study, we conducted an experiment to identify the author of a Turkish-language text using classical machine learning methods, including Support Vector Machines (SVM), Gaussian Naive Bayes (GaussianNB), Multi-Layer Perceptron (MLP), Logistic Regression (LR), and Stochastic Gradient Descent (SGD), and ensemble learning methods, including Extremely Randomized Trees (ExtraTrees) and eXtreme Gradient Boosting (XGBoost). The proposed method was applied to three author groups of different sizes (10, 15, and 20 authors) drawn from a new dataset of newspaper articles. Term frequency-inverse document frequency (TF-IDF) vectors were created from word 1-gram and 2-gram tokens. Our results show that the most successful method with word unigrams is SGD, with a classification accuracy of 0.976, and the most successful method with word bigrams is LR, with a classification accuracy of 0.935.

“…The initial study aimed to identify authors of literary texts using stylometric techniques (Varela et al, 2016). Author attribution isn't just a literary problem (Phani et al, 2017), (Zhou et al, 2022), (Paci et al, 2011) The remaining sections of the paper are structured as follows. In Section 2, we take a look at the author-related tasks.…”

Section: Introduction (mentioning)

confidence: 99%

A Survey on Authorship Analysis Tasks and Techniques

2022

Authorship Analysis (AA) is a natural language processing field that examines the previous works of writers to identify the author of a text based on its features. Studies in authorship analysis include authorship identification, authorship profiling, and authorship verification. Due to its relevance to many applications, the field has attracted considerable attention. It is widely used in the attribution of historical literature. Other applications include legal linguistics, criminal law, forensic investigations, and computer forensics. This paper aims to provide an overview of the work done and the techniques applied in the authorship analysis domain. The examination of recent developments in this field is the principal focus. Many different criteria can be used to define a writer’s style. This paper investigates stylometric features in different author-related tasks, including lexical, syntactic, semantic, structural, and content-specific ones. Many classification methods have been applied to authorship analysis tasks. We examine many research studies that use different machine learning and deep learning techniques. As a means of pointing the direction for future studies, we present the most relevant methods recently proposed. The reviewed studies include documents of different types and different languages. In summary, because each natural language has its own set of features, there is no standard technique generically applicable for solving the AA problem.
