Abstract: This work presents the first fully-fledged discourse parser for Russian based on the Rhetorical Structure Theory of Mann and Thompson (1988). For segmentation, discourse tree construction, and discourse relation classification, we employ deep learning models. With the help of multiple word embedding techniques, a new state of the art for discourse segmentation of Russian texts is achieved. We found that the neural classifiers using contextual word representations outperform previously proposed feature-based …
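The abstract frames segmentation as neural classification over contextual word representations. As a minimal sketch of that setup (not the paper's exact architecture), one can treat segmentation as token-level boundary labeling on top of precomputed contextual embeddings; the BiLSTM encoder, dimensions, and binary boundary labels below are illustrative assumptions.

```python
# Minimal sketch of discourse segmentation as token-level sequence labeling:
# each token is classified as the beginning of an elementary discourse unit
# (EDU) or not, using contextual word representations. The architecture and
# hyperparameters here are illustrative, not the paper's exact setup.
import torch
import torch.nn as nn

class EDUSegmenter(nn.Module):
    def __init__(self, emb_dim: int = 768, hidden: int = 256):
        super().__init__()
        # BiLSTM over precomputed contextual embeddings (e.g. ELMo/BERT output)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Binary decision per token: EDU boundary vs. no boundary
        self.classifier = nn.Linear(2 * hidden, 2)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, emb_dim)
        states, _ = self.encoder(token_embeddings)
        return self.classifier(states)  # (batch, seq_len, 2) boundary logits

# Usage with random tensors standing in for a real encoder's output
segmenter = EDUSegmenter()
logits = segmenter(torch.randn(1, 20, 768))
boundaries = logits.argmax(dim=-1)  # 1 marks a predicted EDU start
```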
“…Named entities are recognized with the SpaCy⁵ ru_core_news_lg model predicting BIO tags from token embeddings. Discourse structures are produced with the IsaNLP RST⁶ parser for Russian (Chistova et al., 2021). The parser generates trees for each paragraph; we merge these trees with a right-branching multinuclear JOINT relation to construct full-text RST trees.…”
Section: Instruments For Linguistic Analysis
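The quoted context describes two concrete steps: named entity recognition with spaCy's ru_core_news_lg model and merging per-paragraph RST trees into a full-text tree via a right-branching multinuclear JOINT relation. The sketch below covers only the merging step, under the assumption of a toy RSTNode class; the IsaNLP RST parser returns its own tree objects, so merge_paragraph_trees is an illustrative helper, not part of the library.

```python
# A minimal sketch of the tree-merging step described above: paragraph-level
# RST trees (here a simple node class; the parser's own tree objects differ)
# are folded right-to-left into a single full-text tree whose top levels are
# right-branching multinuclear JOINT nodes.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RSTNode:
    relation: Optional[str]           # None for leaves (EDUs)
    nuclearity: Optional[str] = None  # e.g. "NN" for multinuclear relations
    left: Optional["RSTNode"] = None
    right: Optional["RSTNode"] = None
    text: Optional[str] = None        # only set on leaves

def merge_paragraph_trees(paragraph_trees: List[RSTNode]) -> RSTNode:
    """Combine per-paragraph trees into one right-branching JOINT structure:
    JOINT(t1, JOINT(t2, JOINT(t3, ...)))."""
    merged = paragraph_trees[-1]
    for tree in reversed(paragraph_trees[:-1]):
        merged = RSTNode(relation="JOINT", nuclearity="NN",
                         left=tree, right=merged)
    return merged

# Usage: three toy one-EDU "paragraph trees"
paragraphs = [RSTNode(relation=None, text=f"paragraph {i}") for i in range(3)]
document_tree = merge_paragraph_trees(paragraphs)
print(document_tree.relation)        # JOINT
print(document_tree.right.relation)  # JOINT (right-branching)
```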
Coreference resolution is the task of identifying and grouping mentions referring to the same real-world entity. Previous neural models have mainly focused on learning span representations and pairwise scores for coreference decisions. However, current methods do not explicitly capture referential choice in the hierarchical discourse, an important factor in coreference resolution. In this study, we propose a new approach that incorporates rhetorical information into neural coreference resolution models. We collect rhetorical features from automated discourse parses and examine their impact. As a base model, we implement an end-to-end span-based coreference resolver using LUKE, a partially fine-tuned multilingual entity-aware language model. We evaluate our method on the RuCoCo-23 Shared Task for coreference resolution in Russian. Our best model, employing rhetorical distance between mentions, ranked 1st on the development set (74.6% F1) and 2nd on the test set (73.3% F1) of the Shared Task¹. We hope that our work will inspire further research on incorporating discourse information into neural coreference resolution models.
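The key feature in the abstract above is the rhetorical distance between mentions. One natural instantiation (the shared-task system may define it differently) is the length of the tree path between the EDU leaves containing the two mentions in the document-level RST tree; rhetorical_distance below is a hypothetical helper, not the authors' code.

```python
# Hedged sketch of a "rhetorical distance" feature: the number of edges on the
# path between the two EDU leaves that contain the mentions in the document's
# RST tree. This is only one possible instantiation of the feature.
from typing import Dict, Optional

def rhetorical_distance(parent: Dict[int, Optional[int]],
                        edu_a: int, edu_b: int) -> int:
    """parent maps every RST node id to its parent id (None for the root);
    edu_a and edu_b are the leaf ids of the EDUs containing the two mentions.
    Returns the number of edges on the tree path between the two leaves."""
    # Record the distance from edu_a to each of its ancestors (and itself).
    depth_from_a = {}
    node, d = edu_a, 0
    while node is not None:
        depth_from_a[node] = d
        node = parent[node]
        d += 1
    # Walk up from edu_b until a node already seen from edu_a is reached:
    # that node is the lowest common ancestor of the two leaves.
    node, d = edu_b, 0
    while node not in depth_from_a:
        node = parent[node]
        d += 1
    return d + depth_from_a[node]

# Usage on a toy tree: root 0 with children 1 and 2; node 1 has leaves 3 and 4
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
print(rhetorical_distance(parent, 3, 4))  # 2 (3 -> 1 -> 4)
print(rhetorical_distance(parent, 3, 2))  # 3 (3 -> 1 -> 0 -> 2)
```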
“…In this study, we employ the recent end-to-end RST parsers for English³ (Zhang et al., 2021) and Russian (Chistova et al., 2020). ³ The models are trained on the RST-DT corpus.…”
Section: Analyzing Paraphrases From a Discourse Perspective
“…• RST parser for Russian (Chistova et al., 2020), RST-Tace (Wan et al., 2019), rstWeb (Zeldes, 2016), Multilingual DeBERTa v3 (He et al., 2021), spaCy (Honnibal et al., 2020), Evidence Graph framework (Peldszus and Stede, 2015b): MIT License.…”
We show that using the rhetorical structure automatically generated by the discourse parser is beneficial for paragraph-level argument mining in Russian. First, we improve the structure awareness of the current RST discourse parser for Russian by employing a recent top-down approach for unlabeled tree construction at the paragraph level. Then we demonstrate the utility of this parser in two argument mining classification subtasks of the RuARG-2022 shared task. Our approach leverages a structured LSTM module to compute a text representation that reflects the composition of discourse units in the rhetorical structure. We show that (i) the inclusion of discourse analysis improves paragraph-level text classification; (ii) a novel TreeLSTM-based approach performs well for computing a hidden representation of complex text using both a language model and an end-to-end RST parser; (iii) structures predicted by the proposed RST parser reflect the argumentative structures in Russian texts.
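The structured LSTM module mentioned in the abstract composes discourse-unit representations along the predicted rhetorical structure. A generic child-sum TreeLSTM cell (in the style of Tai et al.'s child-sum TreeLSTM) is one standard way to do this; the cell below is a hedged sketch of that family of models, not the shared-task system's exact module, and all dimensions are illustrative.

```python
# A minimal child-sum TreeLSTM cell as a stand-in for the structured LSTM
# module described above: leaf inputs are EDU embeddings from a language
# model, and internal RST nodes compose their children's states bottom-up.
# This is a generic sketch, not the shared-task system's exact module.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.W = nn.Linear(in_dim, 3 * hidden)    # input, output, update gates
        self.U = nn.Linear(hidden, 3 * hidden, bias=False)
        self.W_f = nn.Linear(in_dim, hidden)      # per-child forget gate
        self.U_f = nn.Linear(hidden, hidden, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,); child_h, child_c: (n_children, hidden)
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.W(x) + self.U(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # One forget gate per child, applied to that child's cell state
        f = torch.sigmoid(self.W_f(x).unsqueeze(0) + self.U_f(child_h))
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c

# Usage: compose two child discourse units under one RST node
cell = ChildSumTreeLSTMCell(in_dim=768, hidden=128)
x = torch.randn(768)                        # node input (e.g. a span embedding)
child_h, child_c = torch.zeros(2, 128), torch.zeros(2, 128)
h, c = cell(x, child_h, child_c)
print(h.shape)                              # torch.Size([128])
```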