Probabilistic topic models have proven to be an extremely versatile class of mixed-membership models for discovering the thematic structure of text collections. There are many possible applications covering a broad range of areas of study: technology, natural science, social science, and the humanities.

In this thesis, a new efficient parallel Markov Chain Monte Carlo inference algorithm is proposed for Bayesian inference in large topic models. The proposed methods scale well with corpus size and can be used for other probabilistic topic models and other natural language processing applications. The proposed methods are fast, efficient, scalable, and converge to the true posterior distribution.

In addition, this thesis proposes a supervised topic model for high-dimensional text classification, with an emphasis on interpretable document prediction using the horseshoe shrinkage prior in supervised topic models.

Finally, we develop a model and inference algorithm that can model the agenda and framing of political speeches over time with a priori defined topics. We apply the approach to analyze the evolution of the immigration discourse in the Swedish parliament by combining theory from political science and communication science with a probabilistic topic model.
Acknowledgments

There are many people that I need to thank for their direct and indirect contributions to this thesis: people who have given their support and personal contributions, and also some who just put up with me through these five very intensive years.

First and foremost, I want to thank my main supervisor Mattias Villani. It has been a privilege to be his student, and I really want to thank him for all the ideas, time, and effort he put into me throughout the years. He has always pushed me to go further, accepting nothing less than high-quality research from me. He also helped me focus on the right things when so many exciting research projects were possible.

My co-supervisor Marco Kuhlmann has also been important during these years, helping me through the difficulties of natural language processing and computational linguistics. Marco's advice and counseling have been invaluable to me.

I am also very grateful to David Mimno, who welcomed me to Cornell University and acted as my supervisor during the fall of 2016. Doing research at Cornell for one semester really helped me gain different perspectives on the latent semantic analysis research field. The way I try to present the different parts of latent semantic analysis in this thesis is heavily influenced by discussions with David and by David's course on advanced topic models.

The most important part of my graduate studies has been learning to be a researcher. I entered graduate school knowing very little about how to do statistical research, especially in the field of probabilistic text modeling and natural language processing. But thanks to my many collaborators, I now feel like I can actually do real research. My research collaborators on different projects have been extremely imp...