2018
DOI: 10.14419/ijet.v7i2.10762

Development of Punjabi-English (PunEng) Parallel Corpus for Machine Translation System

Abstract: This paper describes the creation process and statistics of the Punjabi-English (PunEng) parallel corpus. A parallel corpus is the main requirement for developing statistical as well as neural machine translation systems. Until now, no PunEng parallel corpus has been available. In this paper, we show the difficulties and intensive labor involved in developing a parallel corpus. Methods used for collecting data and the results are discussed, errors during the process of collecting data and how to handle thes…

Citations: cited by 2 publications (3 citation statements)
References: 1 publication
“…Table 4 provides some of the seed dictionaries used by researchers with various language pairs. The seed dictionary can be created manually (Utiyama and Isahara, 2003;Fung and Cheung, 2004;Adafre and Rijke, 2006;Lu et al, 2010;Jindal et al, 2018a;Deep et al, 2018) or a seed parallel corpus (Zhao and Vogel, 2002;Kumar and Goyal, 2010;Munteanu and Marcu, 2006;Ling et al, 2013;Smith et al, 2010;Tillmann, 2009;Lakshmi and Shambhavi, 2020;Gahbiche-Braham et al, 2011;Stefanescu and Ion, 2013;Stefanescu et al, 2012;Abdul and Schwenk, 2011) available can be utilized. Lu et al (2010) provided a broad parallel corpus derived from an Internet-sourced corpus of comparable English-Chinese patents.…”
Section:  Parallel Seed Dictionarymentioning
confidence: 99%
“…The BLEU and NIST scores are used to assess quality. Deep et al (2018) provided in the research different sources to collect the English data and Punjabi data. Their work presents the Punjabi -English parallel corpus and named it Pun Eng.…”
Section: Lexacc (mentioning)
confidence: 99%
“…Statistical machine Translations were widely used in the natural language Processing [1]. Development of the parallel corpus was the main method used in statistical machine translation [2]. Machine Learning algorithms can be best evaluated while using it to parse Garden Path sentences or the sentences having complex structures [3]. In the area of Sentimental Analysis, there are basically two basic approaches in sentimental analysis namely lexicon based and machine learning.…”
Section: Related Work (mentioning)
confidence: 99%