The subject of the study is a methodology for analyzing the electronic content of social networks (forums) as a historical source. The material analyzed is the online discussion of the 1917 revolution during the centenary of that event. The aim of the study was to test approaches to working with large arrays of online texts, and in particular the possible combination of two of them: quantitative analysis tools (distant reading) and traditional methods of working with historical text (slow reading). For the "distant reading", topic modeling is performed with the LDA (latent Dirichlet allocation) and LSA (latent semantic analysis) algorithms in the R programming environment, in RStudio (version 4.2.1). During the "slow reading", the entire volume of the text is analyzed directly. The novelty of the research lies in applying topic modeling to the sources in the R programming environment in conjunction with classical methods of analyzing historical texts. Within the framework of the study, a methodology for analyzing the content of social networks (forums) has been tested. It is aimed at substantial arrays of text that are physically impossible to read in full, or even in significant part, when the researcher relies exclusively on traditional means of working with a corpus of sources. A step-by-step research algorithm is proposed. First, the researcher analyzes the text by "distant reading" methods, identifying the topics of the texts, each of which consists of terms (words). Then, using these terms as keywords, the researcher finds the text fragments in which the identified topic was discussed most actively and analyzes those fragments in more detail using traditional methods of working with a text source.
The study also proposes a possible way to improve how well the LDA algorithm identifies the topics a researcher needs in social network and forum content: the large text is first split into fragments, and the fragments are then analyzed by LDA as separate documents.
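The two mechanical steps around the topic model, splitting one large text into fragments and then locating the fragments where a topic's terms are densest, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the study worked in R, while this sketch uses plain Python; the function names, the fragment size, and the density score (share of tokens that are topic terms) are all hypothetical choices; and the LDA step itself is not shown, so the topic terms are assumed to come from a model fitted separately.

```python
import re


def split_into_fragments(text, words_per_fragment=200):
    """Split one long text into consecutive fragments of a fixed word count.

    Each fragment can then be passed to a topic model as a separate document.
    The fragment size is an assumed parameter, not a value from the study.
    """
    words = text.split()
    return [
        " ".join(words[i:i + words_per_fragment])
        for i in range(0, len(words), words_per_fragment)
    ]


def tokenize(fragment):
    """Lowercase a fragment and split it into word tokens."""
    return re.findall(r"\w+", fragment.lower())


def rank_fragments(fragments, topic_terms, top_n=3):
    """Return (fragment_index, score) pairs for the fragments richest in topic terms.

    The score is the share of a fragment's tokens that belong to the topic.
    Fragments with no matching terms are left out.
    """
    terms = {term.lower() for term in topic_terms}
    scored = []
    for index, fragment in enumerate(fragments):
        tokens = tokenize(fragment)
        if not tokens:
            continue
        hits = sum(1 for token in tokens if token in terms)
        scored.append((hits / len(tokens), index))
    # Highest score first; ties keep the original fragment order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(index, score) for score, index in scored[:top_n] if score > 0]


# Toy input standing in for a forum dump (invented text, not data from the study).
forum_text = (
    "lenin bolsheviks october revolution " * 5
    + "weather football holiday cooking " * 5
)
fragments = split_into_fragments(forum_text, words_per_fragment=20)

# Topic terms are assumed to come from a separately fitted topic model.
best = rank_fragments(fragments, ["lenin", "bolsheviks"], top_n=1)
print(best)
```

The fragments returned by `rank_fragments` are the ones a researcher would then read closely with traditional source-analysis methods.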
The article proposes and tests a methodology for converting the budget of the Russian Empire for 1803-1913 to uniform, comparable indicators. A reconstructed long time series covering 111 years was obtained for the aggregate revenue and expenditure of the state budget. Using the country's ordinary expenditures as an example, the discrepancy between nominal and real data is illustrated and explained. The conclusion outlines prospects for the practical application of the methodology. The reconstructed and nominal budget series are available online as an MS Excel file for use in research and education by all interested parties. Article address: www.