Social scientists have recently begun to discuss text-mining tools as a fruitful way to scale up inductively grounded close reading. We aim to advance this discussion and offer a contemporary contribution to the literature. Focusing on map analysis, we demonstrate the potential of text-mining tools for text analysis that approximates inductive yet still formal in-depth analysis. We propose that combining text-mining tools that address different layers of meaning enables a closer analysis of the dynamics of manifest and latent meanings than is currently acknowledged. To illustrate our approach, we combine grammatical parsing and topic modeling to operationalize communication structures within sentences as well as the semantic surroundings of these structures. Using a reliable, downloadable software application, we analyze the dynamic interlacement of these two layers of meaning over time in 15,371 newspaper articles on corporate responsibility published in the United States between 1950 and 2013.
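For readers who want a concrete picture of the two-layer idea, the following is a minimal sketch: extract subject-verb-object triples as communication structures with a dependency parser, then model the semantic surroundings of those sentences with LDA. The tool choices (spaCy, gensim) and all names are our own assumptions for illustration, not the authors' actual application.

```python
# Sketch: manifest communication structures via dependency parsing,
# latent semantic context via topic modeling. Assumed tooling: spaCy
# and gensim; this is not the authors' software application.
import spacy
from gensim import corpora, models

nlp = spacy.load("en_core_web_sm")

def svo_triples(doc):
    """Yield (subject, verb, object) lemma triples from a parsed document."""
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    yield (s.lemma_, token.lemma_, o.lemma_)

articles = [
    "The company accepts responsibility for the spill.",
    "Regulators demand accountability from executives.",
]
docs = [nlp(text) for text in articles]

# Layer 1: manifest meaning as explicit communication structures.
for doc in docs:
    print(list(svo_triples(doc)))

# Layer 2: latent meaning of the surrounding text via LDA topics.
tokenized = [[t.lemma_.lower() for t in doc if t.is_alpha and not t.is_stop]
             for doc in docs]
dictionary = corpora.Dictionary(tokenized)
bow = [dictionary.doc2bow(toks) for toks in tokenized]
lda = models.LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)
print(lda.print_topics())
```

On a real corpus, one would fit the topic model on the full set of articles and then relate each extracted triple to the topic distribution of its surrounding text, which is the interlacement of layers the abstract describes.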
Despite recent and ongoing progress in using text-mining tools to automatically analyze large text corpora, considerable potential remains for facilitating the study of social action in social science research. In particular, the disambiguation (who is referred to in a text?) and specification (which demographic characteristics are present?) of social actors, both currently manual tasks, remain a challenge. This article presents a reliable and accurate software architecture for social scientists interested in automatically detecting, disambiguating, and demographically specifying social actors (i.e., persons and organizations) in large text collections. The backbone of our architecture is the online encyclopedia Wikipedia, a currently underexploited source of a large amount of accurately curated information. We illustrate how the architecture detects and disambiguates social actors in large text corpora and retrieves their demographic information. We evaluate its reliability and accuracy across seven different social settings, conveying an intuitive sense of its broad applicability. We conclude by highlighting not only the benefits of our software architecture for social science research but also the limitations of using Wikipedia as a data source.
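A minimal sketch of the detect-then-disambiguate pipeline might look as follows, using spaCy NER for detection and the public MediaWiki search API as the Wikipedia backbone. This is our own illustration under assumed tooling, not the authors' architecture; the demographic-specification step would additionally require parsing the matched page (e.g., its infobox) and is omitted here.

```python
# Sketch: detect person/organization mentions, then map each mention
# to its best-matching Wikipedia page. Assumed tooling: spaCy and the
# MediaWiki search API; not the authors' software architecture.
import requests
import spacy

nlp = spacy.load("en_core_web_sm")
WIKI_API = "https://en.wikipedia.org/w/api.php"

def detect_actors(text):
    """Detect person and organization mentions (detection step)."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents
            if ent.label_ in ("PERSON", "ORG")]

def disambiguate(mention):
    """Return the top Wikipedia page title for a mention (disambiguation step)."""
    params = {"action": "query", "list": "search",
              "srsearch": mention, "format": "json"}
    hits = requests.get(WIKI_API, params=params).json()["query"]["search"]
    return hits[0]["title"] if hits else None

text = "Apple and Tim Cook responded to questions from reporters."
for mention, label in detect_actors(text):
    print(mention, label, "->", disambiguate(mention))
```

Taking the top search hit is a deliberately naive disambiguation heuristic; a production system would score candidates against the mention's textual context before committing to a match.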
We introduce JOCO, a novel text corpus for NLP analytics in the fields of economics, business, and management. The corpus comprises corporate annual and social responsibility reports of the top 30 US, UK, and German companies in the major (DJIA, FTSE 100, DAX), mid-sized (S&P 500, FTSE 250, MDAX), and technology (NASDAQ, FTSE AIM 100, TECDAX) stock indices, respectively. Altogether, this amounts to 5,000 reports from 270 companies headquartered in three of the world's most important economies. The corpus spans the years 2000 to 2015 and contains 282M tokens in total. We also feature JOCO in a small-scale experiment to demonstrate its potential for NLP-fueled studies in economics, business, and management research.
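As a hedged illustration of the kind of small-scale study such a corpus enables, the sketch below tracks a term's relative frequency in reports over time. The directory layout ("joco/<company>/<year>.txt") and file format are hypothetical; consult the corpus release for its actual structure.

```python
# Sketch: relative frequency of a term per report year across a
# plain-text corpus. The "joco/<company>/<year>.txt" layout is a
# hypothetical assumption, not JOCO's documented distribution format.
from collections import Counter
from pathlib import Path
import re

def yearly_term_frequency(root, term):
    """Relative frequency of `term` per report year across the corpus."""
    counts, totals = Counter(), Counter()
    for path in Path(root).glob("*/*.txt"):
        year = path.stem  # hypothetical: file named after the report year
        tokens = re.findall(r"[a-z]+", path.read_text(encoding="utf-8").lower())
        totals[year] += len(tokens)
        counts[year] += tokens.count(term)
    return {y: counts[y] / totals[y] for y in sorted(totals) if totals[y]}

print(yearly_term_frequency("joco", "sustainability"))
```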