Overview of the Shared Task on Machine Translation in Dravidian Languages

M, Anand Kumar; Hegde, Anupama; Banerjee, Shubhanker; Chakravarthi, Bharathi Raja; Priyadharshini, Ruba; Shashirekha, Hosahalli; McCrae, John P.

doi:10.18653/v1/2022.dravidianlangtech-1.41

Cited by 7 publications (4 citation statements); references 17 publications

“…Social media platforms, such as Instagram, Twitter, Facebook, and Pinterest, contain many trolling and non-trolling memes whose embedded text is written in code-mixed Kannada and Tulu. To construct KAmemes and TUmemes, trolling-meme detection datasets with code-mixed Kannada and Tulu text, Kannada and Tulu memes were collected from Instagram and Pinterest between December 2021 and October 2022 and between January 2023 and September 2023, respectively. The text embedded in these memes is extracted using Google Lens and then manually verified and corrected.…”

Section: Data Collection and Annotation (mentioning)

Identification of Trolling Memes in Kannada and Tulu - Under-resourced Dravidian Languages

Hegde; Lakshmaiah (2024). Preprint.

On social media platforms, information, ideas, and other forms of expression are created and/or shared among people interactively. While exchanging information, users may encounter humorous, offensive, trolling, or malicious content targeting individuals, groups, or communities. One common way of trolling on social media is to create a meme by combining an image with text, usually a catchy phrase wrapped in humor or sarcasm, and share it. Memes shared with the intention of trolling need to be filtered out of social media, as they may hurt people's sentiments and create an unhealthy atmosphere in society. The growing number of social media users, and of trolls among them, makes identifying trolling memes manually increasingly difficult. Hence, there is a demand for tools that identify trolling memes automatically. However, this task is challenging due to the unavailability of annotated data, and it becomes harder still when the text is written in code-mixed, under-resourced regional languages such as Kannada or Tulu, languages of South India. To address the lack of annotated data and tools for identifying trolling memes in these two under-resourced languages, we created two datasets: (i) KAmemes, a dataset of memes with embedded code-mixed Kannada text, and (ii) TUmemes, a dataset of memes with embedded code-mixed Tulu text, each consisting of memes labeled 'Troll' or 'Not_Troll'. To benchmark these datasets, uni-modal and multi-modal models are proposed to classify a given meme as 'Troll' or 'Not_Troll'. While uni-modal approaches consider only the text or only the image of a meme, multi-modal approaches exploit both the text and image modalities. Several machine learning (ML) and deep learning (DL) baselines are implemented as uni-modal and multi-modal models.
The proposed baselines are also evaluated on the existing TamilMemes dataset to illustrate their efficacy. Among the proposed baselines, a multi-modal dual-encoder model based on joint representation achieved the best macro F1 scores: 0.90, 0.78, and 0.58 on the TUmemes, KAmemes, and TamilMemes datasets, respectively.

“…Being one of the constitutional languages of India, Kannada is an official and administrative language of Karnataka, with more than 40 million native speakers [7]. Kannada belongs to the Dravidian language family, and many works written in Kannada date back hundreds of years.…”

Section: Introduction (mentioning)

Identification of Trolling Memes in Kannada and Tulu - Under-resourced Dravidian Languages

Hegde; Lakshmaiah (2024). Preprint.

“…The bilingual dataset provided by the organizers (Madasamy et al., 2022) was divided into three sub-corpora: train, dev, and test. The statistics of the training data are given in Table 1.…”

Section: Dataset Description (mentioning)

PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

Aditya; Rahul; Mandke; et al. (2022)

Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

This paper presents a summary of the findings we obtained in the shared task on machine translation of Dravidian languages. We ranked first in three of the five sub-tasks assigned to us in the main shared task. We carried out neural machine translation for the following five language pairs: Kannada to Tamil, Kannada to Telugu, Kannada to Malayalam, Kannada to Sanskrit, and Kannada to Tulu. The dataset for each language pair was used to train various translation models, including Seq2Seq models such as LSTM, bidirectional LSTM, and Conv2Seq, state-of-the-art transformers trained from scratch, and fine-tuned pre-trained models. For some models involving monolingual corpora, we also implemented backtranslation. The models were then evaluated on a portion of the same dataset, using the BLEU score as the evaluation metric.

“…al. [19] released a shared task with the primary objective of detecting homophobic and transphobic text in social media comments in Tamil, English, and Tamil-English, and also reported the results of this shared task. Numerous pre-trained transformer models, such as BERT, mBERT, XLM-RoBERTa, IndicBERT, and HateBERT, were used for this shared task.…”

Section: Shared Tasks (mentioning)

On Finetuning Adapter-Based Transformer Models for Classifying Abusive Social Media Tamil Comments

et al. (2022)
