This study examines the validity of newspaper indexes, lead paragraphs, and headlines as representations of full-text media content. We analyze the effects of production decisions on content and categorization in the New York Times Index, based on interviews with its senior editor. We then compare the content of three proxies with that of full-text articles by conducting a parallel content analysis of New York Times stories covering the 1986 Libya crisis and their corresponding Index entries. The study suggests that proxy data can be used to roughly estimate the broad contours of Times coverage but do not reliably represent several key aspects of New York Times reporting.
In a recent Workshop article, Woolley (2000) examined the use of media indexes for measuring media attention and counting various types of events. The central concern of Woolley's study was the degree of correspondence between the occurrence of real-world events and media reports of those events, and he found that published indexes of media content suffer from several validity problems when used in event count research (see also White 1993). Another important concern is whether media indexes adequately represent the actual content of the news itself. In this article, we test a common practice employed by political scientists to analyze news content: the use of proxies, such as index entries, headlines, and lead paragraphs, as surrogates for the actual content of the news. Because of time, cost, and access constraints, many researchers code proxies rather than the full content of news texts. Even scholars who ultimately code full text often rely on indirect indicators of news content, such as subject headings in printed indexes and keywords in news databases, to locate that text. Thus, at some level, virtually all content analysis relies on surrogates for full-text content in one form or another.
Many important theoretical studies rely on evidence from news proxies, including work on intrastate political conflict (