We present models that complete missing text given transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE - 100 CE). Because the tablets have deteriorated, scholars often rely on contextual cues to fill in missing parts of the text manually, a subjective and time-consuming process. We observe that this challenge can be formulated as a masked language modelling task, which is used mostly as a pretraining objective for contextualized language models. Building on this formulation, we develop several architectures focusing on the Akkadian language, the lingua franca of the time. We find that despite data scarcity (1M tokens) we can achieve state-of-the-art performance on missing-token prediction (89% hit@5) using a greedy decoding scheme and pretraining on data from other languages and different time periods. Finally, we conduct human evaluations showing the applicability of our models in assisting experts to transcribe texts in extinct languages.
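As a rough illustration of the masked-token formulation and the hit@5 figure quoted above, the following Python sketch hides tokens behind a mask symbol and scores ranked candidate lists with hit@k, read here in the usual way as the fraction of masked positions whose true token appears among the top k candidates. The names mask_tokens and hit_at_k, the "[MASK]" symbol, and the toy tokens are assumptions made for illustration; they are not taken from the paper, and no model is involved.

```python
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def mask_tokens(tokens: Sequence[str], positions: Sequence[int]) -> Tuple[List[str], List[str]]:
    """Replace the tokens at `positions` with MASK.

    Returns the masked sequence and the hidden (gold) tokens in position order.
    """
    hidden = set(positions)
    masked = [MASK if i in hidden else tok for i, tok in enumerate(tokens)]
    gold = [tokens[i] for i in sorted(hidden)]
    return masked, gold


def hit_at_k(ranked_candidates: Sequence[Sequence[str]], gold_tokens: Sequence[str], k: int = 5) -> float:
    """Fraction of masked positions whose gold token is among the top-k candidates."""
    if len(ranked_candidates) != len(gold_tokens):
        raise ValueError("need exactly one candidate list per gold token")
    if not gold_tokens:
        return 0.0
    hits = sum(1 for cands, gold in zip(ranked_candidates, gold_tokens) if gold in list(cands)[:k])
    return hits / len(gold_tokens)


if __name__ == "__main__":
    # Toy, invented tokens standing in for a transliterated line.
    masked, gold = mask_tokens(["w1", "w2", "w3", "w4"], positions=[1, 3])
    print(masked, gold)
    # Invented ranked guesses for the two masked positions, best guess first.
    guesses = [["w9", "w2", "w7"], ["w5", "w6", "w8"]]
    print(hit_at_k(guesses, gold, k=5))
```

In a real evaluation the ranked candidate lists would come from a language model's scores for each masked position; here they are written by hand so that the metric itself stays easy to follow.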
The area of study of this paper,1 unlike Egypt to the west, and Syria and Mesopotamia to the north and east, has yet to produce a proper archive of cuneiform texts, although archaeologists and others have discovered around ninety cuneiform objects over the past century or so. Yet, due to the uneven pace of discovery and changing political and academic realities in the region over the years, no attempt has ever been made to study these cuneiform objects as a group, and the last published list of the relevant material was that of K. Galling in Textbuch zur Geschichte Israels in 1968.2 At present not only is there no comprehensive edition or bibliography of the cuneiform texts in our corpus, but there is not even an accurate list, leaving the materials largely inaccessible to most scholars. Our current project, "Cuneiform in The Land of Israel and Canaan," is intended to answer this need. The main goal of the project is the publication of a book that will include an introduction to the topic, editions of the inscriptions with philological notes, indexes, new handcopies, and photographs.3 We present here the first fruits of our endeavors: a bibliographical list of our corpus with a brief summary of our findings to date.
INTRODUCTION
Today we are able to place eighty-nine objects in our corpus. These range from well-known texts such as the Taanach letters, which have been studied and translated a number of times (Taanach 1-2, 5-6), to mere scraps of clay, and include texts belonging to a wide variety of genres, including literature, royal inscriptions, letters, administrative texts, inscribed cylinder seals, lexical texts, mathematical texts, omens, and a magical/medical text.
Also participating in various stages of the project were DeLafayette Awkward, Yehudah Kaplan, Ralf Rothenbusch, Yoav Shor, and Peter Stein. The authors wish to thank numerous scholars and others who freely gave their time and support to the project.
We cannot thank them all by name here, but special thanks are due to Osnat Brandel of the Israel Museum, Ornit Ilan at the Rockefeller Museum, and Gary Beckman of the University of Michigan for facilitating the study of tablets in museum collections. The project is funded in part by the Israel Academy of Sciences and Humanities and the Israel Science Foundation. Abbreviations are as in The Chicago Assyrian Dictionary (CAD). In addition, note: BAR = Biblical Archaeology Review; BN = Biblische Notizen; NEAEHL = The New Encyclopedia of Archaeological Excavations in The Holy Land; SAAB = State Archives of Assyria Bulletin.
More than a third of the inscribed objects come from three sites: Taanach (17), Hazor (15), and Aphek (8). Samaria has yielded six objects, including late fourth-century coins,4 while Megiddo has yielded five, but only one cuneiform tablet.5 No other site has provided more than four items. In fact, a majority of sites have contributed only an item or two. Sites yielding epigraphic finds range from Hazor in the north to Beer Sheva in the south, and from Ashkelon and Ashdod on the Mediterranean...