Research Summary
In this article, we examine the relationship between corporate diversification and firm performance using natural language processing (NLP), a set of machine learning techniques for analyzing text. By applying a widely used NLP technique called topic modeling to unstructured text from annual reports, we create a new, multidimensional measure that captures the degree of diversification of both multisegment and single‐segment firms. Additionally, we introduce a novel method to incorporate human judgments into the interpretation of machine‐learned patterns, which allows us to measure diversification across multiple dimensions, such as products and geographies. Finally, we illustrate how these new measures can generate novel insights into how the degree and type of diversification relate to firm performance, furthering our understanding of the diversification–performance relationship.
Managerial Summary
At some point, most firms face a dilemma about whether to diversify their business activities across industries or geographic markets, an important decision that invariably affects firm performance. Despite the importance of this decision, the direction of the relationship between diversification and firm performance is not always clear. The inconsistent results of previous studies are partly driven by inherent difficulties in reliably measuring diversification. This study introduces a novel methodology to address that problem: a machine learning‐based technique to quantify diversification from the unstructured text of corporate annual reports. An analysis of firm performance based on these novel diversification measures suggests that, in contrast to earlier studies that find a diversification discount, diversification is associated with higher firm value. This premium is particularly pronounced for firms diversifying within a single industry.