Sergei Pashakhin scite author profile

Sergei Pashakhin

5 Publications

11 Citation Statements Received

97 Citation Statements Given

Affiliations

University of Bamberg, National Research University Higher School of Economics

Publications

Agenda divergence in a developing conflict: Quantitative evidence from Ukrainian and Russian TV newsfeeds

Koltsova, Pashakhin. 2019. Media, War & Conflict.

Although conflict representation in media has been widely studied, few attempts have been made to perform large-scale comparisons of agendas in the media of conflicting parties, especially for armed country-level confrontations. In this article, the authors introduce quantitative evidence of agenda divergence between the media of conflicting parties in the course of the Ukrainian crisis 2013–2014. Using 45,000 messages from the online newsfeeds of a Russian and a Ukrainian TV channel, they perform topic modelling coupled with qualitative analysis to reveal crisis-related topics, assess their salience and map evolution of attention of both channels to each of those topics. They find that the two channels produce fundamentally different agenda sequences. Based on the Ukrainian case, they offer a typology of conflict media coverage stages.

PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media

Koltsova, Alexeeva, Pashakhin, et al. 2020.

Automatic assessment of sentiment in large text corpora is an important goal in social sciences. This paper describes a methodology and the results of the development of a system for Russian-language sentiment analysis that includes a publicly available sentiment lexicon, a publicly available test collection with sentiment markup, and a crowdsourcing website for such markup. The lexicon is aimed at detecting sentiment in user-generated content (blogs, social media) related to social and political issues. Its prototype was formed based on other dictionaries and on topic modeling performed on a large collection of blog posts. Topic modeling revealed relevant (social and political) topics and, as a result, relevant words for the lexicon prototype and relevant texts for the training collection. Each word was assessed by at least three volunteers in the context of three different texts where the word occurred, while the texts received their sentiment scores from the same volunteers as well. Both texts and words were scored from −2 (negative) to +2 (positive). Of 7,546 candidate words, 2,753 got non-neutral sentiment scores. The quality of the lexicon was assessed with SentiStrength software by comparing human text scores with the scores obtained automatically based on the created lexicon. 93% of texts were classified correctly at the error level of ±1 class, which closely matches the result of SentiStrength's initial application to English-language tweets. Negative classes were much larger and better predicted. The lexicon and the text collection are publicly available at http://linis-crowd.org.

Fast Tuning of Topic Models: An Application of Rényi Entropy and Renormalization Theory

Koltcov, Ignatenko, Pashakhin. 2019.

In practice, the critical step in building machine learning models of big data (BD) is parameter tuning with a grid search, a procedure that is costly in terms of time and computing resources. Due to their size, BD are comparable to mesoscopic physical systems. Hence, methods of statistical physics could be applied to BD. The paper shows that topic modeling demonstrates self-similar behavior under the condition of a varying number of clusters. Such behavior allows using a renormalization technique. The combination of a renormalization procedure with the Rényi entropy approach allows for fast searching of the optimal number of clusters. In this paper, the renormalization procedure is developed for the Latent Dirichlet Allocation (LDA) model with a variational Expectation-Maximization algorithm. The experiments were conducted on two document collections with a known number of clusters in two languages. The paper presents results for three versions of the renormalization procedure: (1) a renormalization with the random merging of clusters, (2) a renormalization based on minimal values of Kullback-Leibler divergence, and (3) a renormalization that merges clusters with minimal values of Rényi entropy. The paper shows that the renormalization procedure allows finding the optimal number of topics 26 times faster than grid search without significant loss of quality.

A Full-Cycle Methodology for News Topic Modeling and User Feedback Research

Koltcov, Pashakhin, Dokuka. 2018.

How Many Clusters? An Entropic Approach to Hierarchical Cluster Analysis

Koltcov, Ignatenko, Pashakhin. 2020.
