Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 2015
DOI: 10.3115/v1/n15-3017

Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent

Abstract: We present Brahmi-Net, an online system for transliteration and script conversion for all major Indian language pairs (306 pairs). The system covers 13 Indo-Aryan languages, 4 Dravidian languages and English. For training the transliteration systems, we mined parallel transliteration corpora from parallel translation corpora using an unsupervised method and trained statistical transliteration systems using the mined corpora. Languages which do not have parallel corpora are supported by transliteration through …
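The figure of 306 pairs follows from counting directed language pairs: 13 + 4 + 1 = 18 languages, and each ordered (source, target) pair of distinct languages is counted once, giving 18 × 17 = 306. The short check below only verifies that arithmetic; it assumes pairs are directional (hi→bn and bn→hi counted separately), which is consistent with the stated total, and the helper name `ordered_pair_count` is hypothetical.

```python
from itertools import permutations

# Language counts as stated in the abstract (names deliberately omitted).
INDO_ARYAN = 13
DRAVIDIAN = 4
ENGLISH = 1

def ordered_pair_count(n_languages: int) -> int:
    """Number of directed (source, target) pairs with source != target."""
    return n_languages * (n_languages - 1)

total = INDO_ARYAN + DRAVIDIAN + ENGLISH
# Cross-check the closed form against explicit enumeration of ordered pairs.
enumerated = len(list(permutations(range(total), 2)))
print(total, ordered_pair_count(total), enumerated)
```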

Cited by 40 publications (19 citation statements). References: 3 publications.
“…We ran Stage 2 for 5 iterations. For a rule-based baseline, we used the script conversion method implemented in the Indic NLP Library (Kunchukuttan et al, 2015) which is based on phonemic correspondences.…”
Section: Methods (mentioning; confidence: 99%)
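Rule-based conversion between Brahmi-derived scripts can exploit the fact that Unicode gives these scripts parallel 128-code-point blocks with largely aligned, ISCII-derived layouts, so the same letter tends to sit at the same offset in each block. The sketch below is a minimal offset-mapping converter under that assumption; it is not the Indic NLP Library's code, and the function name `convert_script` and the script keys are hypothetical. A converter based on phonemic correspondences would also need per-script exception handling (for example, Tamil lacks separate letters for most aspirated and voiced stops).

```python
# Start code points of the Unicode blocks for several Brahmi-derived scripts.
SCRIPT_BASE = {
    "deva": 0x0900,  # Devanagari
    "beng": 0x0980,  # Bengali
    "guru": 0x0A00,  # Gurmukhi
    "gujr": 0x0A80,  # Gujarati
    "orya": 0x0B00,  # Oriya
    "taml": 0x0B80,  # Tamil
    "telu": 0x0C00,  # Telugu
    "knda": 0x0C80,  # Kannada
    "mlym": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80  # each block spans 128 code points

def convert_script(text: str, src: str, tgt: str) -> str:
    """Map each source-block character to the same offset in the target block.

    Characters outside the source block (spaces, Latin letters, punctuation)
    pass through unchanged. If an offset is unassigned in the target block,
    the output contains an unassigned code point; a real converter would need
    a fallback for such cases (not handled in this sketch).
    """
    src_base, tgt_base = SCRIPT_BASE[src], SCRIPT_BASE[tgt]
    out = []
    for ch in text:
        offset = ord(ch) - src_base
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(tgt_base + offset))
        else:
            out.append(ch)
    return "".join(out)

# Devanagari "भारत" rendered in Bengali script.
print(convert_script("\u092d\u093e\u0930\u0924", "deva", "beng"))
```

Because the mapping is a pure offset shift, converting back with the source and target swapped recovers the original string for characters that exist in both blocks.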
“…Comparison with supervised system and some resource-constrained approaches: We compared our best substring-based model (based on sim1 prior) with a supervised system and the following resource-constrained transliteration systems built using: (i) Mined pairs from a translation corpus: We experimented with bn-hi on the Brahminet mined pairs corpus (Kunchukuttan et al, 2015). Mined corpora involving kn were not available.…”
Section: Illustrative Examples (mentioning; confidence: 99%)
“…We convert Tamil, Bengali and Malayalam data to the Devanagari script using the Indic NLP library (Kunchukuttan et al, 2015) thereby, allowing sharing of sub-word features across the Indian languages. For Indian languages, the annotated data followed the IOB format.…”
(Footnote in the citing paper: Data is available here: http://www.cfilt.iitb.ac.in/ner/annotated_corpus/)
Section: Datasets (mentioning; confidence: 99%)
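The IOB format mentioned in this statement marks each token as beginning (B-), inside (I-), or outside (O) a named entity. As a hedged illustration, not taken from the citing paper, the sketch below recovers entity spans from an IOB tag sequence; the function name `iob_to_spans` and the labels (PER, LOC, ORG) are hypothetical, and the handling of a stray I- tag follows one common lenient convention.

```python
from typing import List, Optional, Tuple

def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB tags to (label, start, end) spans with end exclusive.

    Assumes tags look like "B-PER", "I-PER" or "O". An I- tag that does not
    continue an open span of the same label starts a new span.
    """
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            if label is not None:
                spans.append((label, start, i))
            label = None
            continue
        prefix, _, cur = tag.partition("-")
        if prefix == "B" or cur != label:
            # Close any open span, then open a new one at this token.
            if label is not None:
                spans.append((label, start, i))
            label, start = cur, i
        # Otherwise an I- tag with the same label extends the open span.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans

print(iob_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))
```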
“…Brahmi-Net transliteration [11] considers this problem similar to a phrase based translation problem, through which sequences of characters from source to the target language are learnt, where the parallel corpus is trained using Moses. This system supports 13 Indo-Aryan languages, 4 Dravidian languages and English including 306 language pairs for statistical transliteration.…”
Section: Transliteration (mentioning; confidence: 99%)
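Treating transliteration as phrase-based translation means each character plays the role a word plays in ordinary translation, so a phrase-based system such as Moses sees whitespace-separated characters as its tokens. The sketch below shows one plausible way to prepare such character-level parallel data from word pairs; it is an assumption for illustration, not Brahmi-Net's actual preprocessing, and the helper names are hypothetical. Running the Moses training pipeline itself is not shown.

```python
from typing import List, Tuple

def to_char_tokens(word: str) -> str:
    """Render a word as space-separated characters, one token per code point."""
    return " ".join(word)

def prepare_parallel(pairs: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn (source_word, target_word) pairs into aligned character-token lines."""
    src_lines = [to_char_tokens(src) for src, _ in pairs]
    tgt_lines = [to_char_tokens(tgt) for _, tgt in pairs]
    return src_lines, tgt_lines

# English "bharat" paired with Devanagari "भारत".
src, tgt = prepare_parallel([("bharat", "\u092d\u093e\u0930\u0924")])
print(src, tgt)
```

Splitting by code point separates dependent vowel signs and the virama from their consonants; an alternative design is to segment into akshara (syllable) units, which yields fewer but larger tokens.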