Research Summary

In this article, we examine the relationship between corporate diversification and firm performance using a machine learning technique called natural language processing (NLP). By applying a widely used NLP technique called topic modeling to unstructured text from annual reports, we create a new, multidimensional measure that captures the degree of diversification of both multisegment and single‐segment firms. Additionally, we introduce a novel method to incorporate human judgments into the interpretation of machine‐learned patterns, which allows us to measure diversification across multiple dimensions, such as products and geographies. Finally, we illustrate how these new measures can generate novel insights into the relationship between the degree and type of diversification and firm performance, furthering our understanding of the diversification–performance relationship.

Managerial Summary

At some point, most firms face dilemmas about whether to diversify their business activities across industries or geographic markets, an important decision that invariably affects firm performance. Important as it is, the direction of the relationship between diversification and firm performance is not always clear. Inconsistent results of previous studies are partially driven by inherent difficulties in reliably measuring diversification. This study introduces a novel methodology to address that problem: a machine learning‐based technique to quantify diversification from unstructured corporate annual report texts. An analysis of firm performance based on these novel diversification measures suggests that diversification, in contrast to earlier studies that find a diversification discount, is associated with higher firm value, a premium particularly pronounced for firms diversifying within a single industry.
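As a rough illustration of how a topic-model output could be turned into a diversification score, the sketch below computes the normalized Shannon entropy of a firm's topic shares. The abstract does not state the authors' formula, so the entropy measure, the function names, and the example share vectors are all assumptions made for illustration. The topic shares are taken as given (for instance, from a previously fitted topic model).

```python
import math


def topic_entropy(topic_shares):
    """Shannon entropy (natural log) of a vector of topic shares.

    `topic_shares` is a hypothetical input: the share of a firm's
    annual-report text assigned to each topic by some topic model.
    """
    total = sum(topic_shares)
    if total <= 0:
        raise ValueError("topic shares must sum to a positive value")
    # Renormalize and drop zero shares, since 0 * log(0) is treated as 0.
    probs = [s / total for s in topic_shares if s > 0]
    return sum(-p * math.log(p) for p in probs)


def diversification_score(topic_shares):
    """Entropy scaled to [0, 1] by its maximum, log(K) for K topics.

    This scaling is an illustrative assumption, not the paper's measure.
    """
    k = len(topic_shares)
    if k < 2:
        return 0.0
    return topic_entropy(topic_shares) / math.log(k)


# Made-up example firms: one concentrated in a single topic, one spread evenly.
focused = [0.94, 0.02, 0.02, 0.02]
diversified = [0.25, 0.25, 0.25, 0.25]

print(diversification_score(focused))
print(diversification_score(diversified))
```

Under this assumed measure, a firm whose text loads on one topic scores near 0 and a firm spread evenly across topics scores 1, so the score can be computed for single-segment firms as well as multisegment ones.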
This study explores how new text analysis tools can be used in strategic management research that examines unstructured textual data. We build on two established natural language processing (NLP) techniques, vector space models and topic modeling, to create text-based measures of several core constructs in strategy, namely strategic change, positioning, and focus. These techniques are applied to the entire sample of 52,392 business descriptions in 10-K annual reports from 1996 to 2016. Results show that these new methods produce innovative yet meaningful measures of firm strategy, which open up previously unexplored avenues of research to strategy scholars. The study advances emerging strategy research utilizing text analysis methods, demonstrates that NLP techniques can overcome some of the limitations of traditional text analysis methods such as keyword counts and mapping analysis, and provides a template for how other machine learning techniques could be introduced into strategy.
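One way a vector space model might yield a strategic change measure is to compare a firm's business descriptions from consecutive years. The sketch below represents each description as a bag-of-words count vector and defines change as one minus the cosine similarity. The abstract does not specify the weighting scheme or the distance used, so the raw term counts, the function names, and the sample sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: lowercase alphabetic tokens mapped to counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description shares nothing with any other
    return dot / (norm_a * norm_b)


def strategic_change(prev_text, curr_text):
    """Hypothetical change score: 1 - cosine similarity of the two years."""
    return 1.0 - cosine_similarity(term_vector(prev_text), term_vector(curr_text))


# Made-up business descriptions for two consecutive years.
year_1 = "we manufacture industrial pumps and valves"
year_2 = "we manufacture industrial pumps and provide cloud software"

print(strategic_change(year_1, year_2))
```

A score of 0 means the wording is unchanged and a score of 1 means the two descriptions share no terms. A real application would likely add stop-word removal and term weighting such as TF-IDF, which are omitted here to keep the sketch short.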