Latent Dirichlet allocation (LDA) topic models are increasingly being used in communication research. Yet questions regarding the reliability and validity of the approach have received little attention thus far. In applying LDA to textual data, researchers need to tackle at least four major challenges that affect these criteria: (a) appropriate pre-processing of the text collection; (b) adequate selection of model parameters, including the number of topics to be generated; (c) evaluation of the model's reliability; and (d) the process of validly interpreting the resulting topics. We review the research literature dealing with these questions and propose a methodology that addresses these challenges. Our overall goal is to make LDA topic modeling more accessible to communication researchers and to ensure compliance with disciplinary standards. Consequently, we develop a brief hands-on user guide for applying LDA topic modeling. We demonstrate the value of our approach with empirical data from an ongoing research project.
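A minimal sketch of the four-step workflow using the gensim library may help make it concrete. The abstract does not prescribe an implementation; the toy corpus, the candidate topic numbers, and the c_v coherence criterion below are illustrative assumptions, not the paper's settings:

```python
# Sketch of the LDA workflow with gensim; the toy corpus and the
# candidate topic counts are illustrative assumptions.
from gensim import corpora, models

# (a) Pre-processing: assume docs are already tokenized, lower-cased,
# with stop words and rare terms removed (stand-in for a real corpus).
docs = [
    ["food", "safety", "recall", "contamination", "regulation"],
    ["food", "labeling", "gmo", "regulation", "policy"],
    ["climate", "policy", "energy", "emissions", "debate"],
    ["climate", "emissions", "carbon", "energy", "policy"],
]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# (b) Parameter selection: fit models for several candidate topic
# numbers and compare topic coherence to choose K.
for k in (2, 3, 5):
    lda = models.LdaModel(corpus, num_topics=k, id2word=dictionary,
                          random_state=42, passes=10)
    coherence = models.CoherenceModel(model=lda, texts=docs,
                                      dictionary=dictionary,
                                      coherence="c_v").get_coherence()
    print(k, coherence)

# (c) Reliability: re-fit with different random seeds and compare the
# resulting topic-word distributions across runs.
# (d) Validity: inspect the top words per topic for manual labeling,
# e.g. lda.show_topics(num_words=10).
```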
We propose a methodological approach to analyzing the content of hyperlink networks that represent networked public spheres on the Internet. Using the case of the food safety movement in the United States, we demonstrate how to generate a hyperlink network with the web crawling tool Issue Crawler and merge it with the results of a probabilistic topic model of the network's content. Combining hyperlink networks with content analysis allows us to interpret such a network in its entirety and with regard to the mobilizing potential of specific sub-issues of the movement. We focus on two sub-issues in the food safety network, genetically modified food and food control, in order to trace the websites involved and their interlinking structures.
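The merging step could look like the following networkx sketch. The edge list, the site names, and the per-site topic labels are hypothetical stand-ins for an Issue Crawler export and an LDA result, since neither format is specified here:

```python
# Sketch of merging a hyperlink network with topic-model output using
# networkx; edges and topic labels are illustrative stand-ins.
import networkx as nx

# Hyperlink network: directed edges from linking site to linked site.
edges = [("fooddemocracynow.org", "fda.gov"),
         ("centerforfoodsafety.org", "fda.gov"),
         ("centerforfoodsafety.org", "nongmoproject.org")]
G = nx.DiGraph(edges)

# Dominant topic per website, e.g. derived from a topic model of the
# sites' content (hypothetical labels).
dominant_topic = {"fooddemocracynow.org": "food control",
                  "centerforfoodsafety.org": "genetically modified food",
                  "fda.gov": "food control",
                  "nongmoproject.org": "genetically modified food"}
nx.set_node_attributes(G, dominant_topic, name="topic")

# Extract the sub-network for one sub-issue to trace its actors and links.
gm_nodes = [n for n, d in G.nodes(data=True)
            if d.get("topic") == "genetically modified food"]
print(G.subgraph(gm_nodes).edges())
```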
Previous work has shown that hyperlinks reflect actors' strategic choices; these dyadic relationships depend on the actors' exogenous attributes (e.g., homophily) and the network's endogenous features (e.g., the prestige distribution among actors). We combine these factors as explanatory variables in different exponential random graph models (ERGMs) to assess the relative strength of prestige and homophily in the actors' link formation. We analyze the climate change discourse in a hyperlink network of US civil society actors, collected in November 2014, and test how relevant the different factors are, including variables such as actor type, country, position, and topic. We find that both prestige and various aspects of homophily influence link formation online. Among the different factors, positional homophily stands out, followed by prestige and other homophily effects.
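For reference, the general ERGM form that such models instantiate is shown below; the notation is the standard one, not copied from the paper. Prestige and homophily enter as components of the statistics vector g(y, x):

```latex
% Probability of observing network y, given actor attributes x:
% g(y, x) collects network statistics (edge count, in-degree prestige
% terms, homophily terms over attributes such as actor type, country,
% position, and topic), weighted by parameters \theta and normalized
% over all possible networks \mathcal{Y}.
\[
  P_\theta(Y = y \mid x) =
    \frac{\exp\!\bigl(\theta^{\top} g(y, x)\bigr)}
         {\sum_{y' \in \mathcal{Y}} \exp\!\bigl(\theta^{\top} g(y', x)\bigr)}
\]
```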
In this article, we focus on noise, in the sense of irrelevant information in a data set, as a specific methodological challenge of web research in the era of big data. We empirically evaluate several methods for filtering hyperlink networks in order to reconstruct networks that contain only webpages dealing with a particular issue. The test corpus of webpages was collected from hyperlink networks on the issue of food safety in the United States and Germany. We applied three filtering strategies and evaluated how well each excluded irrelevant content from the networks: keyword filtering, automated document classification with a machine-learning algorithm, and extraction of core networks with network-analytical measures. Keyword filtering and automated classification of webpages were the most effective methods for reducing noise, whereas extracting a core network did not yield satisfying results for this case.
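A minimal sketch of the three strategies, assuming scikit-learn and networkx; the keyword list, the toy training labels, the logistic-regression classifier, and the choice of a k-core as the network-analytical core measure are illustrative assumptions rather than the paper's exact setup:

```python
# Sketch of the three noise-filtering strategies; keywords, labels,
# classifier choice, and the k-core criterion are assumptions.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pages = {"a.org": "food safety recall of contaminated beef",
         "b.org": "cheap flights and travel deals",
         "c.org": "gm food labeling and food control policy"}

# 1) Keyword filtering: keep pages containing at least one issue keyword.
keywords = {"food", "safety", "recall"}
kept_keyword = {url for url, text in pages.items()
                if keywords & set(text.split())}

# 2) Automated classification: train on hand-coded examples (here a
# toy training set), then predict relevance for the collected pages.
train_texts = ["salmonella outbreak food recall", "hotel booking travel"]
train_labels = [1, 0]  # 1 = on-issue, 0 = off-issue
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_texts), train_labels)
preds = clf.predict(vec.transform(pages.values()))
kept_classifier = {url for url, p in zip(pages, preds) if p == 1}

# 3) Core network extraction: keep only the k-core of the hyperlink graph.
G = nx.Graph([("a.org", "c.org"), ("a.org", "b.org"), ("b.org", "c.org")])
core = nx.k_core(G, k=2)

print(kept_keyword, kept_classifier, list(core.nodes()))
```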