V. Selegey scite author profile

V. Selegey

3 Publications

4 Citation Statements Received

0 Citation Statements Given

Publications

Differential Semantic Sketches for Russian Internet-Corpora

Detkova¹, Mipt², Novitskiy³ et al. 2020

The current paper proposes a new type of representation for word collocations: semantic sketches. It was first tested on one of the subcorpora of the General Internet-Corpus of Russian. Semantic sketches continue the idea of word sketches, which are based on grammatical relations between words, and expand it by adding semantic information: word meanings and semantic relations between words. Moreover, the sketches can additionally be provided with metatextual characteristics. Building such sketches requires semantic markup of the corpora, so we used the partial semantic analysis of the Compreno parser for our purposes. The paper demonstrates examples of the sketches, evaluates the quality of the markup they are based on, and shows the advantages and disadvantages of the given approach.

Corpus regional lexicography: principles, methods, and preliminary results

Belikov¹, Mipt², Dubyaga³ et al. 2021

The article summarizes the results of the long-term project "Languages of Russian Cities" (LoRC), devoted to collecting and studying regional vocabulary, which, unfortunately, has not been described in any academic publication for a number of reasons. About 4 thousand regional items were collected and systematized; they became the basis for a typology of regional differences and for a discussion of the concept of a regional norm. Attention is paid to reliability issues and to methods of computer-based regional corpus research, including automatic text classification and author profiling. Along with this article, the "reincarnation" of the LoRC project returns to the pool of open lexicographic resources, based on a joint portal for distinctive sociolinguistic research that includes the General Web-corpus of Russian Language and the interactive dictionary "Languages of Cities and People" (LoC&P).

Web-Corpus as a Tool for Linguistic Research: Differentiation, Authorization, Thematic Biases (Or Corpora We Want So Much to Believe)

Belikov¹, Lab², Selegey³ et al. 2020

The paper presents the General Internet Corpus of the Russian Language (GICR) as a tool for linguistic research. It identifies problems that are common to any web corpus and that affect the reliability of such research. Among the problems considered are the importance of taking sociolinguistic variability into account, the influence of falsely attributed texts, thematic biases, and the prospects and disadvantages of new methods for aggregating corpus output. A distinctive feature of our approach is the emphasis on the linguistic significance, reliability, and interpretability of the results obtained.
