2022
DOI: 10.3390/math10183236

Development of a Multilingual Model for Machine Sentiment Analysis in the Serbian Language

Abstract: In this research, a method of developing a machine model for sentiment processing in the Serbian language is presented. The Serbian language, unlike English and other popular languages, belongs to the group of languages with limited resources. Three different data sets were used as a data source: a balanced set of music album reviews, a balanced set of movie reviews, and a balanced set of music album reviews in English—MARD—which was translated into Serbian. The evaluation included applying developed models wi…

citations
Cited by 10 publications
(6 citation statements)
references
References 34 publications
“…Initially, the original archetype of the NAMT model is drilled and validated in high-resource languages, with BLEU scores evaluated on official test sets, encompassing Newstest 2014 for WMT English-German, Newstest 2016 for WMT English-Romanian, or the development set for IWSLT16 English-German. When it comes to low-resource languages, however, training data-hungry NAT models is a non-trivial challenge, confronted with limited language processing tools and an inadequate parallel corpus of target minor languages, let alone potential accuracy degradation, which can be amplified due to the increased morphological complexity, such as Serbian [118,119], and the ulterior linguistic connections compared to resource-prosperous languages such as English and Romanian. To contend with this dilemma, extra data augmentation techniques such as back-translations offer viable means to train the NAT models, considering the inherently dual properties of machine translation tasks [120,121].…”
Section: Discussion (mentioning)
confidence: 99%
“…Generation: the new hidden state s_t is produced in the cGRU at time step t. The mathematical formulas are described in (8) to (11).…”
Section: Decoder (mentioning)
confidence: 99%
“…For instance, researchers have dedicated their efforts to constructing the Serbian wordnet [48], a valuable resource that organizes Serbian lexical units and their semantic relations. Sentiment analysis, another area of interest, has been applied to analyze the sentiment expressed in Serbian newspaper content [49] as well as movie reviews [50], and the classification of documents based on n-grams [51], and [52] music album reviews.…”
Section: Sentiment Analysis in the Bosnian/Croatian/Serbian/Slovenian... (mentioning)
confidence: 99%