2020
DOI: 10.28995/2075-7182-2020-19-553-569

GramEval 2020 Shared Task: Russian Full Morphology and Universal Dependencies Parsing

Abstract: The paper presents the results of GramEval 2020, a shared task on Russian morphological and syntactic processing. The objective is to process Russian texts, starting from the provided tokens, into parts of speech (POS), grammatical features, lemmas, and labeled dependency trees. To encourage multi-domain processing, five genres of Modern Russian are selected as test data: news, social media and electronic communication, wiki-texts, fiction, and poetry; Middle Russian texts are used as the sixth test set. The data anno…

citations
Cited by 15 publications
(8 citation statements)
references
References 3 publications
“…We compare LIMA performance on Russian-SynTagRus corpus using the official CoNLL 2018 evaluation script and on GramEval-2020 [14] corpus using its official evaluation script. The use of the evaluation script from CoNLL 2018 competition is motivated by the intention to compare our results with previous works.…”
Section: Discussion
mentioning
confidence: 99%
“…At the MorphoRuEval-2017 shared task [11], a 96.91% accuracy score in lemmatization was achieved on a balanced set of data from various sources (news, social networks, fiction, etc.). And in the GramEval-2020 shared task [12] the track became even more complicated since data from social media, poetry and historical texts of the 17th century were added to the test sample: the best overall lemmatization score being 98% on fiction texts, 98.2% on the news, 95.3% on poetry, 96% on social media, 93% on wiki and 78.3% on historical texts. It became manifest that it is technically possible for the Russian language to pose more complex challenges, especially for notoriously "difficult-to-process" groups of words and lexical categories.…”
Section: Previous Work
mentioning
confidence: 99%
“…The model we used in our approach closely follows the model from [1], which has shown state-of-the-art performance in morphosyntactic parsing on the GramEval2020 dataset [6].…”
Section: Model
mentioning
confidence: 99%
“…https://natasha.github.io/ 4 "The Government of the Russian Federation will consider…" 5 "Russian" 6 "The Russian Federation"…”
mentioning
confidence: 99%