2020
DOI: 10.48550/arxiv.2008.01391
Preprint

A Survey of Orthographic Information in Machine Translation

Bharathi Raja Chakravarthi,
Priya Rani,
Mihael Arcan
et al.

Abstract: Machine translation is one of the applications of natural language processing that has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for these machine translation systems is the variation in orthographic conventions, which causes many issues for traditional approaches. Two languages written in two different orthographies are not easily comparable, but o…

Cited by 2 publications (3 citation statements)
References 86 publications (103 reference statements)
“…Normalization is a very common practice embraced in the NLP research when dealing with historical or otherwise non-standard language [7,3,29,9,43]. The benefit of normalization is that it makes non-standard orthography standard and thus enables the use of NLP tools and resources designed for modern normative data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Lady Harley used the potentially Civil War related term incendiaries in a political discussion with her son, who later became a Parliamentarian army officer; both her and Elizabeth's use of new vocabulary focuses on the semantic class 'society » authority'. Lady Conway, on the other hand, discussed philosophical concepts with her friend and fellow philosopher, Henry More, as in (3). The OED records More's use of idolum in a publication in 1647, so both of the correspondents would have been familiar with the term.…”
Section: Sociolinguistic Variation (mentioning)
confidence: 99%
“…Multilingual users have the tendency to mix linguistic units in the social media resulting in code-mixed data being easily available. The phenomenon of code-mixing is explained in [5,6,7,8,9,10] and provides an analysis on the possible reasons behind code-mixing. This is done by identifying the languages involved in the code-mixed data which looks inevitable.…”
Section: Related Work (mentioning)
confidence: 99%