Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), 2018
DOI: 10.4000/books.aaccademia.3121

Enhancing the Latin Morphological Analyser lemlat with a Medieval Latin Glossary

Abstract: We present the process of expanding the lexical basis of the Latin morphological analyser lemlat with the entries of the Medieval Latin glossary by Du Cange. The process is semi-automatic and exploits three sources of information: the morphological properties of the lemmas, a previously available word list enhanced with inflectional information, and the contents of the Du Cange lexical entries.
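The abstract names the ingredients of the semi-automatic process but not the procedure itself. The following is a hypothetical illustration only, not the authors' method: it assumes a toy word list that already maps lemmas to inflectional classes, plus a small set of invented ending-based rules (the class labels are illustrative, not lemlat's own codes). A headword found in the word list inherits its class; otherwise an ending-based guess is proposed and flagged for manual review, which is where the "semi" in semi-automatic would come in.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ending-based rules mapping a headword ending to an
# inflectional class. The labels are illustrative, not lemlat's own codes.
ENDING_RULES = [
    ("are", "verb, 1st conjugation"),
    ("ire", "verb, 4th conjugation"),
    ("us", "noun, 2nd declension"),  # ambiguous: also 4th decl. (manus), 3rd neuter (corpus)
    ("um", "noun, 2nd declension neuter"),
    ("a", "noun, 1st declension"),
]


@dataclass
class Proposal:
    headword: str
    infl_class: Optional[str]
    source: str  # "word_list", "ending_rule" or "none"
    needs_review: bool


def propose_class(headword: str, word_list: dict) -> Proposal:
    """Propose an inflectional class for one glossary headword."""
    key = headword.lower()
    if key in word_list:
        # Reuse the inflectional information already attached to the word list.
        return Proposal(headword, word_list[key], "word_list", False)
    # Try longer (more specific) endings before shorter ones.
    for ending, infl_class in sorted(ENDING_RULES, key=lambda r: -len(r[0])):
        if key.endswith(ending):
            # An ending-based guess is only a candidate: flag it for manual checking.
            return Proposal(headword, infl_class, "ending_rule", True)
    return Proposal(headword, None, "none", True)


if __name__ == "__main__":
    word_list = {"abbas": "noun, 3rd declension"}  # toy enhanced word list
    for hw in ["Abbas", "abbatissa", "bannire", "baro"]:
        print(propose_class(hw, word_list))
```

The ambiguity noted on the "-us" rule is the reason such guesses cannot be accepted automatically: the same ending covers several inflectional classes, so the contents of the glossary entry (or a human) must settle the case.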

Cited by 9 publications (6 citation statements). References 7 publications.

“…Lemmas are canonical forms of words that are used by dictionaries to cite lexical entries, and are produced by lemmatisers to analyse tokens in corpora. For this reason, as was said, the core of the LiLa Knowledge Base is represented by the collection of Latin lemmas taken from the morphological analyser Lemlat;⁹ Lemlat has proven to cover more than 98% of the textual occurrences of the word forms recorded in the comprehensive Thesaurus formarum totius latinitatis (TFTL, Tombeur, 1998), which is based on a corpus of texts ranging from the beginnings of Latin literature up to present times, for a total of more than 60 million words (Cecchini et al., 2018). LiLa thus aims to achieve interoperability by linking all entries in lexical resources and corpus tokens that refer to the same lemma, allowing a good balance between feasibility and granularity.…”
Section: The LiLa Knowledge Base (mentioning; confidence: 99%)
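The 98% figure is stated over textual occurrences (tokens), not over distinct word forms (types), so frequent forms weigh more than rare ones. A minimal sketch of the difference, on an invented frequency list (not TFTL data) and assuming the set of forms the analyser can handle is already known:

```python
def coverage(form_freqs: dict, analysable: set) -> tuple:
    """Return (token coverage, type coverage) of an analyser over a frequency list.

    form_freqs maps each word form to its number of occurrences in the corpus;
    analysable is the set of forms for which the analyser returns an analysis.
    """
    total_tokens = sum(form_freqs.values())
    covered_tokens = sum(f for form, f in form_freqs.items() if form in analysable)
    covered_types = sum(1 for form in form_freqs if form in analysable)
    return covered_tokens / total_tokens, covered_types / len(form_freqs)


if __name__ == "__main__":
    toy_freqs = {"et": 900, "est": 70, "dominus": 20, "xyzw": 10}  # invented counts
    tok, typ = coverage(toy_freqs, {"et", "est", "dominus"})
    print(f"token coverage {tok:.1%}, type coverage {typ:.1%}")
```

On this toy list one unanalysable form out of four gives only 75% type coverage, yet 99% token coverage, because that form is rare.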
“…In its design, the structure of LiLa is highly lexically-based: the core component of the Knowledge Base is an extensive list of Latin lemmas extracted from the morphological analyser for Latin Lemlat (Passarotti et al., 2017). This list has been compiled into a database from three reference dictionaries for Classical Latin (Georges, 1913; Glare, 1982; Gradenwitz, 1904), the entire Onomasticon from Forcellini's (Forcellini, 1867) Lexicon Totius Latinitatis (Budassi and Passarotti, 2016) and the Medieval Latin Glossarium Mediae et Infimae Latinitatis by du Cange et al. (1883-1887), for a total of over 150,000 lemmas (Cecchini et al., 2018). The portion of the lexical basis of Lemlat concerning Classical and Late Latin (43,432 lemmas) was also enhanced with information taken from the Word Formation Latin (WFL) lexicon, a lexical resource that provides information about derivational morphology by connecting lemmas via word formation rules.…”
Section: Introduction (mentioning; confidence: 99%)
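Connecting lemmas via word formation rules turns the lexicon into a directed graph in which each edge links a base lemma to a derived one and is labelled with the rule applied. A minimal sketch, assuming a hypothetical edge list of (base, rule, derived) triples with invented rule labels (not WFL's actual rule inventory), that collects the word family reachable from a base lemma by breadth-first search, together with the chain of rules leading to each member:

```python
from collections import defaultdict, deque


def word_family(edges, root):
    """Map every lemma derivable from root to the list of rules that derive it."""
    children = defaultdict(list)
    for base, rule, derived in edges:
        children[base].append((rule, derived))
    family = {root: []}  # the root is reached by an empty chain of rules
    queue = deque([root])
    while queue:
        lemma = queue.popleft()
        for rule, derived in children[lemma]:
            if derived not in family:  # keep the first (shortest) derivation path
                family[derived] = family[lemma] + [rule]
                queue.append(derived)
    return family


if __name__ == "__main__":
    # Illustrative edges; rule labels are hypothetical.
    edges = [
        ("amo", "V-to-N -tor", "amator"),
        ("amator", "N-to-A -ius", "amatorius"),
        ("amo", "prefixation ad-", "adamo"),
    ]
    for lemma, rules in word_family(edges, "amo").items():
        print(lemma, rules)
```

Breadth-first order guarantees that a lemma reachable along several paths is recorded with one of its shortest derivation chains.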
“…The lexical basis of the Latin morphological analyzer Lemlat [27] was used to populate the LiLa collection. Lemlat's database reconciles three reference dictionaries for Classical Latin [28] [29] [30], the entire Onomasticon from Forcellini's Lexicon Totius Latinitatis [31] and the Medieval Latin Glossarium Mediae et Infimae Latinitatis by du Cange et alii [32], for a total of over 150,000 lemmas.…”
Section: Lexical Entries Tokens (mentioning; confidence: 99%)
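Populating the collection from Lemlat's lemmas makes the lemma the shared key through which corpus tokens and lexical entries meet. A minimal sketch of that idea as a plain in-memory index, assuming tokens and entries already carry a lemma from the same collection; all identifiers are hypothetical, and this is not LiLa's actual data model:

```python
from collections import defaultdict


def link_by_lemma(tokens, entries):
    """Group corpus tokens and lexicon entries under the lemma they refer to.

    tokens:  iterable of (token_id, lemma) pairs from a lemmatised corpus
    entries: iterable of (entry_id, lemma) pairs from a lexical resource
    """
    hub = defaultdict(lambda: {"tokens": [], "entries": []})
    for token_id, lemma in tokens:
        hub[lemma]["tokens"].append(token_id)
    for entry_id, lemma in entries:
        hub[lemma]["entries"].append(entry_id)
    return dict(hub)


if __name__ == "__main__":
    corpus = [("doc1:t1", "rosa"), ("doc1:t2", "sum"), ("doc2:t7", "rosa")]
    lexicon = [("lex:rosa", "rosa")]
    for lemma, links in link_by_lemma(corpus, lexicon).items():
        print(lemma, links)
```

Any two resources that agree on the lemma inventory become mutually reachable through the hub without being linked to each other directly.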
“…Lemlat relies on a lexical basis resulting from the collation of three Latin dictionaries (Georges and Georges, 1913-1918; Glare, 1982; Gradenwitz, 1904) for a total of 40,014 lexical entries and 43,432 lemmas, as more than one lemma can be included in one lexical entry. This lexical basis was recently enlarged by adding most of the Onomasticon (26,415 lemmas out of 28,178) provided by the 5th edition of the Forcellini dictionary (Budassi and Passarotti, 2016) and the entries from a large reference glossary for Medieval Latin, namely the Glossarium Mediae et Infimae Latinitatis (du Cange et al., 1883-1887; Cecchini et al., 2018).…”
Section: The Lemma Collection (mentioning; confidence: 99%)
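The counts above distinguish lexical entries from lemmas: since one entry can contain more than one lemma, 40,014 entries yield 43,432 lemmas (3,418 more lemmas than entries). A minimal sketch, assuming a hypothetical entry structure that simply lists its lemmas (toy entries with invented ids), shows how the two counts diverge and computes the share of the Onomasticon that was added from the figures given:

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    entry_id: str
    # One dictionary entry may host several lemmas.
    lemmas: list = field(default_factory=list)


def count_entries_and_lemmas(entries):
    """Return (number of entries, number of lemmas) for a list of entries."""
    return len(entries), sum(len(e.lemmas) for e in entries)


def percentage(part, whole):
    """Share of part over whole, in percent."""
    return 100 * part / whole


if __name__ == "__main__":
    # Toy entries; the second one groups two lemmas under a single entry.
    toy = [LexicalEntry("e1", ["rosa"]), LexicalEntry("e2", ["aliquis", "aliqui"])]
    print(count_entries_and_lemmas(toy))       # (2, 3)
    print(round(percentage(26415, 28178), 1))  # 93.7: Onomasticon lemmas added
```

In other words, "most of the Onomasticon" amounts to roughly 93.7% of its lemmas, leaving 1,763 out.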