Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua 2015
DOI: 10.3115/v1/n15-1167

Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition

Abstract: Lemmatization of Sumerian is much more challenging than lemmatization of modern languages: Sumerian is a long-dead language, highly skilled language experts are extremely scarce, and more and more Sumerian texts are becoming available. This paper describes how our unsupervised Sumerian named-entity recognition (NER) system helps to improve the lemmatization of Ur III period texts in the Cuneiform Digital Library Initiative (CDLI), a specialist database of cuneiform texts. Experiments show that a promisi…

Cited by 9 publications (3 citation statements)
References 9 publications
“…However, the field does have a tradition with dictionary-based glossing of transliterated text. Similar to technologies commonly used in language documentation and linguistic typology (Robinson et al, 2007), the ORACC Lemmatizer (Robson, 2018; Liu et al, 2015) can provide word-by-word glosses along with a morphological analysis, albeit without contextual disambiguation, and without producing coherent text.…”
Section: Related Work (mentioning)
confidence: 99%
“…-Many works go further into some specific domains, such as Sci-Tech compound entity recognition (Yan et al 2016), biomedical named entity recognition (Song et al 2018), entity extraction from clinical records (Alicante et al 2016;Boytcheva 2018;Henriksson et al 2015), chemical named entity recognition (Swain and Cole 2016; Zhang et al 2016), educational term extraction (Conde et al 2016), cybersecurity concepts extraction (Xiao 2017), Drug Name Recognition (Liu et al 2015b), medical named entity recognition (He and Kayaalp 2008;Kavuluru et al 2013;Skeppstedt 2014), and so forth. -In addition to English, unsupervised Named Entity Recognition has been studied in other languages, including Chinese (Jia et al 2018), Spanish (Copara et al 2016), French (Mosallam et al 2014), Italian (Alicante et al 2016), and Russian (Ivanitskiy et al 2016), as well as crosslingual (Abdel Hady et al 2014), even including a dead language like Sumerian (Liu et al 2015a).…”
Section: Named Entity Recognition (mentioning)
confidence: 99%
“…Our morphological analyzer will be partly based on existing tools such as Tablan et al (2006)'s rule-based morphology and Liu et al (2015)'s algorithm to identify named entities. We will design a custom parser for numero-metrological content for the occasion.…”
Section: Morphological Analysis (mentioning)
confidence: 99%