Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages 2022
DOI: 10.18653/v1/2022.dravidianlangtech-1.41

Overview of the Shared Task on Machine Translation in Dravidian Languages

Abstract: This paper presents an overview of the shared task on translation of under-resourced Dravidian languages at the DravidianLangTech-2022 workshop, to be held jointly with ACL 2022. It describes the datasets used, the approach taken to analyze submissions, and the results. The shared task comprised five sub-tasks covering the following translation pairs: Kannada to Tamil, Kannada to Telugu, Kannada to Sanskrit, Kannada to Malayalam, and Kannada to Tulu. Training, devel…

Cited by 7 publications (4 citation statements)
References 17 publications

“…Social media platforms, such as Instagram, Twitter, Facebook, and Pinterest, contain a lot of trolling and not-trolling memes with textual information written in code-mixed Kannada and Tulu text. To construct KAmemes and TUmemes -trolling memes detection datasets with code-mixed Kannada and Tulu text, Kannada and Tulu memes are collected between December 2021 to October 2022 and January 2023 to September 2023 respectively, from Instagram and Pinterest. The text embedded on these memes are extracted using Google lens and is manually verified and corrected.…”
Section: Data Collection and Annotation (mentioning)
confidence: 99%
“…Being one of the constitutional languages of India, Kannada is an official and administrative language of Karnataka with more than 40 million native speakers [7]. Kannada belongs to the Dravidian language family and many articles with hundreds of years of history are written in Kannada.…”
Section: Introduction (mentioning)
confidence: 99%
“…The bilingual dataset provided by the organizers (Madasamy et al, 2022) was divided into three sub-corporas of train, dev and test. The statistics of the training data is given in Table 1.…”
Section: Dataset Description (mentioning)
confidence: 99%
“…al. [19] has released a shared task with the primary objective of detecting homophobic and transphobic texts in social media comments in Tamil, English, and Tamil-English and also reported on the results of this shared task. For this shared task, numerous pre-trained models and transformer models, such as BERT, mBERT, XLM-RoBERTa, IndicBERT, HateBERT, etc., have been utilized.…”
Section: Shared Tasks (mentioning)
confidence: 99%