Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology 2021
DOI: 10.18653/v1/2021.sigmorphon-1.5

MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology

Abstract: Large-scale morphological databases provide essential input to a wide range of NLP applications. Inflectional data is of particular importance for morphologically rich (agglutinative and highly inflecting) languages, and derivations can be used, e.g., to infer the semantics of out-of-vocabulary words. Extending the scope of state-of-the-art multilingual morphological databases, we announce the release of MorphyNet, a high-quality resource with 15 languages, 519k derivational and 10.1M inflectional entries, and…
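To make the two kinds of entries concrete, here is a minimal sketch of how a derivational entry (base word, derived word, affix) and an inflectional entry (lemma, inflected form, feature bundle) might be represented and parsed from tab-separated lines. The record names, field order, and parser functions are hypothetical illustrations, not the documented layout of the released MorphyNet files.

```python
from dataclasses import dataclass


# Hypothetical record types: the field order below is an illustrative
# assumption, not a specification of the released MorphyNet files.
@dataclass(frozen=True)
class DerivationalEntry:
    source: str  # base word, e.g. "kind"
    target: str  # derived word, e.g. "kindness"
    affix: str   # added morpheme, e.g. "ness"
    kind: str    # affix type, e.g. "prefix" or "suffix"


@dataclass(frozen=True)
class InflectionalEntry:
    lemma: str     # dictionary form, e.g. "walk"
    form: str      # inflected form, e.g. "walked"
    features: str  # UniMorph-style feature bundle, e.g. "V;PST"


def parse_derivational(line: str) -> DerivationalEntry:
    """Parse one tab-separated derivational line (4 assumed fields).

    Raises ValueError if the line does not have exactly 4 fields."""
    source, target, affix, kind = line.rstrip("\n").split("\t")
    return DerivationalEntry(source, target, affix, kind)


def parse_inflectional(line: str) -> InflectionalEntry:
    """Parse one tab-separated inflectional line (3 assumed fields).

    Raises ValueError if the line does not have exactly 3 fields."""
    lemma, form, features = line.rstrip("\n").split("\t")
    return InflectionalEntry(lemma, form, features)


print(parse_derivational("kind\tkindness\tness\tsuffix"))
print(parse_inflectional("walk\twalked\tV;PST"))
```

Keeping the two entry types separate mirrors the abstract's split between derivational and inflectional data; a real loader should follow whatever column layout the distributed files actually use.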

Cited by 15 publications (13 citation statements); references 22 publications.

“…Our database of choice is the Universal Knowledge Core (UKC) (Giunchiglia, Batsuren, & Bella, 2017), due to its wide linguistic, lexical, and conceptual coverage (120 thousand word meanings, 2 million words in 1,127 languages). The UKC has been used in several studies in computational linguistics and lexical semantics, such as the computation of cognates (Batsuren, Bella, & Giunchiglia, 2019, 2021a) and multilingual morphology (Batsuren, Bella, & Giunchiglia, 2021b).…”
Section: Methods (mentioning)
confidence: 99%
“…no overlapping tokens), resulting in 12,028 entries. MorphyNet (Batsuren et al., 2021) provides derivational and inflectional morphology for words across 15 languages, expanding on the UniMorph dataset (McCarthy et al., 2020). Taking only those derivational morphology entries in English with a concatenative parse gives 193,945 entries.…”
Section: Intrinsic Evaluation: Morphological Correctness (mentioning)
confidence: 99%
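The excerpt above filters derivational entries down to those with a "concatenative parse" but does not spell out the test. A minimal sketch, assuming "concatenative" means the derived word is exactly the base plus the affix with no spelling change at the boundary, could look like this; the `Derivation` record and the `"prefix"`/`"suffix"` labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    source: str  # base word
    target: str  # derived word
    affix: str   # affix string, without hyphens
    kind: str    # "prefix" or "suffix" (assumed labels)


def is_concatenative(d: Derivation) -> bool:
    """True if the derived word is exactly base + affix (suffix) or
    affix + base (prefix), i.e. no orthographic change at the boundary."""
    if d.kind == "suffix":
        return d.target == d.source + d.affix
    if d.kind == "prefix":
        return d.target == d.affix + d.source
    return False  # unknown affix type: treat as non-concatenative


entries = [
    Derivation("kind", "kindness", "ness", "suffix"),    # plain concatenation
    Derivation("happy", "happiness", "ness", "suffix"),  # y -> i spelling change
    Derivation("do", "undo", "un", "prefix"),            # plain concatenation
]
concatenative = [d for d in entries if is_concatenative(d)]
print([d.target for d in concatenative])  # ['kindness', 'undo']
```

Under this reading, pairs such as "happy" → "happiness" are excluded because the base changes spelling; the cited work may use a different criterion.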
“…Language-specific editions of Wiktionary contain large amounts of derivational data, typically in two forms: etymology templates and derived terms (see Figure 2). Building on prior results from the MorphyNet project (Batsuren et al., 2021), we have implemented an extraction mechanism from both kinds of sections, covering 12 Wiktionary editions and 30 languages. We managed to extract 4.3 million preliminary derivations, as reported in Table 5.…”
Section: Derivational Paradigms (mentioning)
confidence: 99%
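The excerpt above mentions extracting derivations from Wiktionary etymology templates. As a rough illustration only, the sketch below pulls base/affix pairs out of the English Wiktionary `{{prefix|LANG|AFFIX|BASE}}` and `{{suffix|LANG|BASE|AFFIX}}` templates with a regular expression. The function name and output tuple are hypothetical. It deliberately ignores named parameters, other templates such as `{{affix}}`, and "derived terms" sections, so it is not the extraction mechanism described in the cited work.

```python
import re

# Matches only the bare positional forms {{prefix|LANG|AFFIX|BASE}} and
# {{suffix|LANG|BASE|AFFIX}}; templates with extra or named parameters
# (e.g. "|pos2=noun") and other templates are intentionally not matched.
TEMPLATE = re.compile(
    r"\{\{(prefix|suffix)\|([^|{}]+)\|([^|{}]+)\|([^|{}]+)\}\}"
)


def extract_derivations(word: str, wikitext: str) -> list[tuple[str, str, str, str, str]]:
    """Return (lang, base, derived_word, affix, template_name) tuples
    found in the etymology wikitext of the page for `word`."""
    results = []
    for name, lang, first, second in TEMPLATE.findall(wikitext):
        if name == "prefix":
            affix, base = first, second  # {{prefix|en|un|do}}
        else:
            base, affix = first, second  # {{suffix|en|kind|ness}}
        results.append((lang, base, word, affix, name))
    return results


print(extract_derivations("kindness", "From {{suffix|en|kind|ness}}."))
print(extract_derivations("undo", "From {{prefix|en|un|do}}."))
```

A production extractor would need a proper wikitext template parser to handle named parameters, nested templates, and the many affix-related templates in each Wiktionary edition.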