2014
DOI: 10.1007/978-3-319-14120-6_43

Creating Multilingual Parallel Corpora in Indian Languages

Cited by 9 publications
(10 citation statements)
References 4 publications
“…Along with harmonizing the chunks, this module marks the heads of each chunk in both languages using generalized rules defined by Sharma et al (2006). For clarity, we have mapped the POS tags from Penn Treebank POS tagsets (Marcus et al, 1993) for English and Bureau Of Indian Standard (BIS) POS tagset (Choudhary and Jha, 2011) for Bengali to the Universal Dependency Tagset. The second module in the pipeline facilitates rule-based chunk replacement by taking the chunk-harmonized parallel Bengali and English sentences as inputs and replacing some selected Bengali chunks with English according to the rules discussed in 2.2.…”
Section: The Code-mixing Process (mentioning)
confidence: 99%
“…Along with harmonizing the chunks, this module marks the heads of each chunk in both languages using generalized rules defined by Sharma et al (2006). For clarity, we have mapped the POS tags from Penn Treebank POS tagsets (Marcus et al, 1993) for English and Bureau Of Indian Standard (BIS) POS tagset (Choudhary and Jha, 2011) for Bengali to the Universal Dependency Tagset (Nivre et al, 2016). The second module in the pipeline facilitates rule-based chunk replacement by taking the chunk-harmonized parallel Bengali and English sentences as inputs and replacing some selected Bengali chunks with English according to the rules discussed in 2.2.…”
Section: The Code-mixing Process (mentioning)
confidence: 99%
“…Along with harmonizing the chunks, this module marks the heads of each chunk in both languages using generalized rules defined by Sharma et al (2006). For clarity, we have mapped the POS tags from Penn Treebank POS tagsets (Marcus et al, 1993) for English and Bureau Of Indian Standard (BIS) POS tagset (Choudhary and Jha, 2011) for Bengali to the Universal Dependency Tagset (Nivre et al, 2016).…”
Section: The Code-mixing Process (mentioning)
confidence: 99%
“…The parallel corpora for 4 Indian languages namely Hindi (hn), Marathi (mt), Gujarati (gj) and Bangla (bn) was taken from Indian Languages Corpora Initiative (ILCI) (Choudhary and Jha, 2011). The parallel corpus used in our experiments belonged to two domains -health and tourism and the training set consisted of 28000 sentences.…”
Section: Baseline Translation Model (mentioning)
confidence: 99%
“…In spite of several initiatives taken by numerous organizations to generate parallel corpora for different language pairs, training data for many language pairs is either not yet available or is insufficient for producing good SMT systems. Indian Languages Corpora Initiative (ILCI) (Choudhary and Jha, 2011) is currently the only reliable source for multilingual parallel corpora for Indian languages however the number of parallel sentences is still not sufficient to create high quality SMT systems.…”
Section: Introduction (mentioning)
confidence: 99%