Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.308

Towards the First Machine Translation System for Sumerian Transliterations

Abstract: The Sumerian cuneiform script was invented more than 5,000 years ago and represents one of the oldest writing systems in history. We present the first attempt to translate Sumerian texts into English automatically. We publicly release high-quality corpora for standardized training and evaluation and report results on experiments with supervised, phrase-based, and transfer learning techniques for machine translation. Quantitative and qualitative evaluations indicate the usefulness of the translations. Our proposed methodology …

Cited by 5 publications (4 citation statements). References: 16 publications.

“…The first NMT results for texts written in Sumerian from the Ur III period (see Materials and methods) are based on c. 10,000 transliteration to English translation sentences. They report the best BLEU scores with 2 different NMT models of 20.9 and 21.6 and the worst score with a statistical based model of 8.2 (15). Their best model was an attention LSTM model pretrained on English word embeddings from Wikipedia.…”
Section: Related Work (mentioning)
confidence: 99%
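The statement above compares systems by BLEU score. To make those numbers easier to read, here is a minimal sketch of corpus-level BLEU under stated assumptions: whitespace tokenization, exactly one reference per sentence, uniform weights over 1- to 4-grams, and no smoothing. The function names are illustrative, and this is not the scoring script used in the cited work, so it will not reproduce the reported 20.9, 21.6 or 8.2.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_sketch(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Illustrative only: clipped n-gram precision, uniform weights, the
    standard brevity penalty, and no smoothing (any zero precision gives 0).
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_toks, ref_toks = hyp.split(), ref.split()
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_toks, n)
            ref_counts = ngrams(ref_toks, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


print(bleu_sketch(["the king built the big temple"], ["the king built the temple"]))
```

Established toolkits add standardized tokenization and optional smoothing on top of this definition, which is one reason BLEU scores computed with different scripts are not directly comparable.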
“…This body of work has focused on optical character recognition and visual analysis [31][32][33][34] , writer identification [35][36][37] and text analysis [38][39][40][41][42][43][44] , stylometrics 45 and document dating 46 . It is only very recently that scholarship has begun to use deep learning and neural networks for optical character recognition [47][48][49][50][51][52][53][54][55] , text analysis 56 , machine translation of ancient texts [57][58][59] , authorship attribution 60,61 and deciphering ancient languages 62,63 , and been applied to study the form and style of epigraphic monuments 64 .…”
Section: Previous Work (mentioning)
confidence: 99%
“…Past work aimed at machine translation of Sumerian-English (Pagé-Perron et al., 2017; Punia et al., 2020a) have used the minimal bitext upon a variety of general statistical and neural supervised techniques. However, they do not handle the text-level peculiarities any differently than one would do for a high-resource language, thus, often failing to capture context, resulting in poor and inconsistent translations.…”
Section: Related Work (mentioning)
confidence: 99%
“…We perform experiments on a variety of data configurations which are given by: 1. UrIIISeg: Follows the format as present in the original texts provided by Assyriologists and used in the past attempts for Sumerian-English machine translation (Pagé-Perron et al., 2017; Punia et al., 2020b). It contains only in-domain Ur III Admin text with line-by-line translated segments, each of 1-5 words.…”
Section: Supervised NMT (mentioning)
confidence: 99%
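The UrIIISeg configuration described above consists of short, line-by-line translated segments. A minimal sketch of selecting such pairs from a line-aligned parallel corpus follows. The helper `select_short_segments` is hypothetical, and so are its assumptions: that the 1-5 word limit is measured on the whitespace-tokenized transliteration side and that pairs with an empty translation are dropped. The placeholder tokens stand in for real data.

```python
from typing import Iterable, List, Tuple

# Hypothetical pair type: (transliteration line, English translation line).
Pair = Tuple[str, str]


def select_short_segments(pairs: Iterable[Pair],
                          min_words: int = 1,
                          max_words: int = 5) -> List[Pair]:
    """Keep line-level pairs whose transliteration has min_words..max_words tokens.

    Assumption: tokens are whitespace-separated and the length limit applies
    to the source (transliteration) side only.
    """
    kept = []
    for src, tgt in pairs:
        n_words = len(src.split())
        # Drop pairs outside the length window or with no translation.
        if min_words <= n_words <= max_words and tgt.strip():
            kept.append((src.strip(), tgt.strip()))
    return kept


# Placeholder tokens only; not real Sumerian or English data.
example = [("s1 s2 s3", "e1 e2"), ("s1 s2 s3 s4 s5 s6", "e1"), ("", "e1"), ("s1", "")]
print(select_short_segments(example))
```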