2017
DOI: 10.5120/ijca2017916036

Building English-Punjabi Parallel corpus for Machine Translation

Abstract: Objective: Parallel corpus is the key resource for English-Punjabi machine translation. At wide level there is no availability of English-Punjabi corpora. There is a primary requirement of parallel corpus for the training of statistical machine translation. Methods/Analysis: In this paper, authors focus on building English-Punjabi corpus at large scale. It posed difficulties and the intensive labor to develop the corpus. We are intricate on the collection as well as the flow of work for the construction of parallel…

Cited by 3 publications (3 citation statements)
References 5 publications (6 reference statements)
“…Data can also be fetched from various websites where bilingual scripts are available. Jindal et al [15] gathered the data from Gyan Nidhi, EMILLE, Bible, Guru Granth Sahib, PSEB e-books, web content on health and tourism. The raw data were cleaned and converted into the desired set of languages i.e., English and Punjabi.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%
“…Table 4 provides some of the seed dictionaries used by researchers with various language pairs. The seed dictionary can be created manually (Utiyama and Isahara, 2003; Fung and Cheung, 2004; Adafre and Rijke, 2006; Lu et al, 2010; Jindal et al, 2018a; Deep et al, 2018) or a seed parallel corpus (Zhao and Vogel, 2002; Kumar and Goyal, 2010; Munteanu and Marcu, 2006; Ling et al, 2013; Smith et al, 2010; Tillmann, 2009; Lakshmi and Shambhavi, 2020; Gahbiche-Braham et al, 2011; Stefanescu and Ion, 2013; Stefanescu et al, 2012; Abdul and Schwenk, 2011) available can be utilized. Lu et al (2010) provided a broad parallel corpus derived from an Internet-sourced corpus of comparable English-Chinese patents.…”
Section: Parallel Seed Dictionary
Citation type: mentioning
confidence: 99%
“…The terms discovered during the study were applied to the machine translation dictionary that already existed. Jindal et al (2018a) focused on creating an English-Punjabi corpus of big size. The use of a parallel corpus is important for statistical machine translation training.…”
Section: Parallel Seed Dictionary
Citation type: mentioning
confidence: 99%