The paper deals with upgrading RUSLAN, an electronic semantic dictionary for automatic processing of Russian texts. The previous versions of the dictionary were created in the 1990s and early 2000s, mainly for automatic processing of the Russian Federation’s state papers. The authors inherit the basic formalism of the dictionary, including the metalanguage and the structure of the dictionary entry. The current version is revised and enlarged in a number of ways. While the initial versions mostly predate the advent of corpus linguistics, the current version is based on corpus data. The Russian National Corpus was used as a source of sample sentences, as well as for determining statistically and empirically which linguistic information is pragmatically relevant. A structural representation for the sample sentences was designed, along with a procedure for selecting lexical units from the corpus for use in a pragmatic description of polysemy. A formal representation of situations, previously outlined in the works of Nina N. Leontyeva, has also been detailed and largely realized. Within the lexicon, verbs in particular have received a more flexible description than in the previous versions, and aspectual meanings are reflected with more nuance.
The paper is devoted to the discourse annotation of corpora. It analyzes the set of relations adopted in the Ru-RSTreebank corpus, which is annotated within the framework of Rhetorical Structure Theory by W. Mann and S. Thompson. In annotating the corpus, a number of decisions were made to modify the original set of relations. The paper examines problems caused by one of the contradictions that developers face when creating standards for linguistic annotation: the tension between the desire to reflect linguistic reality as accurately as possible, on the one hand, and the requirement to keep the annotation stable, on the other. Using discourse annotation as an example, the paper analyzes the problems that arise when annotation is simplified in order to achieve the necessary degree of inter-annotator agreement.