Lexicon-free stemming for Kazakh language information retrieval

Tukeyev, Ualsher; Turganbayeva, Aliya; Abduali, Balzhan; Rakhimova, Diana; Amirova, Dina; Karibayeva, Aidana

doi:10.1109/icaict.2018.8747021

Cited by 7 publications

(2 citation statements)

References 3 publications


“…For example, the words "cars", "car's", and "car" share the same base form, "car". The main problem in stemming is how to obtain the correct base word from a word whose form has changed [2], [3].…”

Section: Introduction (unclassified)

Analisa Modifikasi Algoritma Stemming Untuk Kasus Overstemming [Analysis of a Modified Stemming Algorithm for Overstemming Cases]

Hersianie

2020

teknokom


Overstemming is the excessive truncation of a word toward its base form (root word). The truncated word can then mean something very different from the original word, even though the resulting stems are identical in form. To address this problem, previous research applied a stemming algorithm with a word-rule table. However, the drawback of the word-rule table is that it is difficult to add new types of words that undergo overstemming. This study therefore aims to modify that algorithm. It combines stemming algorithms (hybrid stemming): a look-up table algorithm, the word-rule table, and the commonly used Porter stemming algorithm. The dataset used for testing is the title attribute of scientific publication documents. The test results show that the modified stemming algorithm achieves a recall of 89.9%. For future research, testing could be carried out using other attributes of the publication documents.
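The hybrid idea in the abstract above (consult a look-up table first, fall back to rule-based stemming otherwise) can be sketched as follows. This is a minimal illustration, not Hersianie's implementation: `LOOKUP_TABLE`, `SUFFIX_RULES`, `fallback_stem`, and `hybrid_stem` are hypothetical names, and the tiny suffix-rule list is a stand-in for the Porter stemmer rather than an implementation of it. English words are used so that the "car" example from the quotation applies.

```python
# Hypothetical look-up table: words known to be overstemmed by the rule-based
# fallback are mapped directly to a protected stem and skip the rules.
LOOKUP_TABLE = {
    "university": "university",
    "universal": "universal",
}

# Hypothetical word-rule table of (suffix, replacement) pairs, tried in order.
# This tiny list is a stand-in for the Porter stemmer, not the Porter algorithm.
SUFFIX_RULES = [
    ("'s", ""),
    ("ies", "y"),
    ("ity", ""),
    ("al", ""),
    ("s", ""),
]

MIN_STEM = 3  # never cut a word down to fewer than 3 characters


def fallback_stem(word: str) -> str:
    """Apply the first suffix rule that matches and leaves a long-enough stem."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)] + replacement
    return word


def hybrid_stem(word: str) -> str:
    """Consult the look-up table first; fall back to the suffix rules otherwise."""
    word = word.lower()
    if word in LOOKUP_TABLE:
        return LOOKUP_TABLE[word]
    return fallback_stem(word)


for w in ["cars", "car's", "car", "university", "universal"]:
    print(w, "->", hybrid_stem(w), "| rules only:", fallback_stem(w))
```

With the rules alone, "university" and "universal" both reduce to "univers", which is the kind of overstemming collision the abstract describes; the look-up entries keep them apart, while "cars" and "car's" still reduce to "car". Adding a new problem word only requires a new table entry, not a new rule.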


“…The first steps are removing URLs and punctuation and lower-casing the text. The second step is removing stopwords [8] from the dataset, based on an accuracy evaluation after generating the stopword list with the TF-IDF algorithm. Then, we applied the stemming algorithm [7,9], which is based on an electronic dictionary of Uzbek word endings and uses a combinatorial approach covering the parts of speech of the Uzbek language: nouns, adjectives, numerals, verbs, participles, moods, and voices. The advantages of the algorithm are that it is lexicon-free and that one operation (a look-up in the dictionary of the language's endings) performs both segmentation of the word into suffixes and morphological analysis of the word.…”

Section: Introduction (mentioning, confidence: 99%)
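The ending-dictionary idea in the quotation above, where one dictionary look-up both segments a word and analyzes it, can be illustrated with a minimal sketch. It is not the algorithm from [7,9] or from the Kazakh paper: the suffix classes are a hypothetical, heavily simplified fragment of Uzbek noun morphology (plural, first/second-person singular possessive, and a few cases, with phonological variants ignored), and `build_ending_dictionary` and `stem` are illustrative names. The ending dictionary is generated combinatorially from the suffix classes, and stemming strips the longest dictionary ending while keeping a minimum stem length, so no lexicon of stems is consulted.

```python
from itertools import product

# Hypothetical, heavily simplified Uzbek noun suffix classes, in their order of
# attachment (plural -> possessive -> case). Each maps a suffix to its tags;
# "" means the slot is empty. Phonological variants are ignored.
PLURAL = {"": [], "lar": ["PL"]}
POSSESSIVE = {"": [], "im": ["POSS.1SG"], "ing": ["POSS.2SG"]}
CASE = {"": [], "ning": ["GEN"], "ni": ["ACC"], "ga": ["DAT"], "da": ["LOC"], "dan": ["ABL"]}


def build_ending_dictionary() -> dict[str, list[str]]:
    """Combine the suffix classes into every complete ending and its analysis."""
    endings: dict[str, list[str]] = {}
    for (pl, t1), (poss, t2), (case, t3) in product(
        PLURAL.items(), POSSESSIVE.items(), CASE.items()
    ):
        ending = pl + poss + case
        if ending:  # skip the all-empty combination
            endings.setdefault(ending, t1 + t2 + t3)
    return endings


ENDINGS = build_ending_dictionary()
MAX_ENDING = max(len(e) for e in ENDINGS)


def stem(word: str, min_stem: int = 2) -> tuple[str, list[str]]:
    """Strip the longest dictionary ending; the stem itself is never looked up.

    A single successful look-up yields both the segmentation (stem + ending)
    and the morphological tags attached to that ending.
    """
    word = word.lower()
    for cut in range(min(MAX_ENDING, len(word) - min_stem), 0, -1):
        ending = word[-cut:]
        if ending in ENDINGS:
            return word[:-cut], ENDINGS[ending]
    return word, []


for w in ["kitoblarimda", "bolalarning", "uyda", "kitob"]:
    print(w, "->", stem(w))
```

For example, `stem("kitoblarimda")` ("in my books") yields `("kitob", ["PL", "POSS.1SG", "LOC"])`. Longest-match stripping is only one simple policy; a realistic ending dictionary would also need phonological variants and a way to resolve endings that are ambiguous with the end of a stem.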

Uzbek Sentiment Analysis based on local Restaurant Reviews

Matlatipov, Rahimboeva, Rajabov et al.

2022

Preprint


Extracting useful information for sentiment analysis and classification from large volumes of user-generated feedback, such as restaurant reviews, is a crucial natural language processing task: it not only supports customer satisfaction by enabling personalized services but can also influence a company's further development. In this paper, we present work on collecting restaurant reviews as a sentiment analysis dataset for Uzbek, a Turkic language heavily affected by low-resource constraints, and we analyze the new dataset by evaluating a range of techniques, from logistic regression-based models and support vector machines to deep learning models such as recurrent and convolutional neural networks. The paper details how the data was collected, how it was pre-processed to improve its quality, and the experimental setups used for evaluation. The overall results indicate that pre-processing steps, such as stemming for agglutinative languages, improve the system's results, with the best-performing model reaching 91% accuracy.
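The pre-processing steps listed in the quotation above (URL and punctuation removal, lower-casing, stopword filtering with a TF-IDF-derived list, then stemming) can be chained as in the sketch below. The function names, the URL regular expression, and the scoring rule (ranking terms by their highest TF-IDF in any document and taking the lowest-scoring ones as stopword candidates) are assumptions for illustration; the quoted work additionally selected its stopword list by evaluating classification accuracy, which is not reproduced here. The ASCII apostrophe is kept because Uzbek Latin script is commonly typed with it in letters such as o' and g'.

```python
import math
import re
import string
from collections import Counter

# Hypothetical URL pattern; covers http(s):// and www. links only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Strip ASCII punctuation except the apostrophe, which Uzbek Latin text
# commonly uses in letters such as o' and g'.
PUNCT_TABLE = str.maketrans("", "", string.punctuation.replace("'", ""))


def normalize(text: str) -> list[str]:
    """Remove URLs and punctuation, lower-case, and split on whitespace."""
    text = URL_RE.sub(" ", text)
    return text.translate(PUNCT_TABLE).lower().split()


def tfidf_stopword_candidates(docs: list[list[str]], k: int) -> list[str]:
    """Return the k terms whose highest TF-IDF over all documents is lowest.

    A term that appears in every document has IDF log(1) = 0, so it scores
    0 everywhere and becomes a stopword candidate.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    best: dict[str, float] = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best.items(), key=lambda kv: (kv[1], kv[0]))  # ties by term
    return [term for term, _ in ranked[:k]]


def preprocess(text: str, stopwords: set[str], stem=lambda w: w) -> list[str]:
    """Normalize, drop stopwords, then apply the given stemmer (identity by default)."""
    return [stem(tok) for tok in normalize(text) if tok not in stopwords]


reviews = ["Osh mazali va arzon!", "Xizmat yomon va sekin.", "Narx va sifat zo'r: https://example.com"]
stopwords = set(tfidf_stopword_candidates([normalize(r) for r in reviews], k=1))
print(stopwords, [preprocess(r, stopwords) for r in reviews])
```

Any stemmer, such as a suffix-stripping function, can be passed as `stem`; keeping it a parameter lets the same pipeline be evaluated with and without stemming, which is the comparison the abstract reports.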
