Studies relating accounting and price data often use the COMPUSTAT or related PDE data base as the source for the accounting data. This practice may introduce a look-ahead bias and an ex-post-selection bias into the study. We examine this problem by comparing results from the standard COMPUSTAT data base with those from a data base that suffers from neither bias. We find that rates of return from portfolios chosen on the basis of accounting data from the two data bases differ significantly. Further, we find that these differences imply different conclusions when we test a specific hypothesis relating accounting and price data. Finally, we propose a number of remedies that may reduce the bias when the standard COMPUSTAT data base is used.
THE RELATIONS AMONG THE economic activities of the firm, the accounting measures of these activities, and the market returns on the debt and the equity of the firm are of central interest to financial economists. Recently, there has been a renewed interest in the empirical relation between market return to equity and basic characteristics of the firm, such as the size and earnings yield of the firm.

Some researchers use the so-called merged COMPUSTAT file for the PDE (price-dividend-earnings) file, which includes all firms that were on the file at any time during the sample period. This file is obviously not subject to the survivor bias.

The Journal of Finance

The look-ahead bias is due to a dating problem. Data reported for a particular point in time, say at the end of the year, typically are not actually available to the investor until sometime later in the next year. Computing earnings yields with year-end prices and earnings may imply the ability of the investor to forecast future reported earnings without error. For example, the annual COMPUSTAT file reports earnings of $1.24 per share for Zenith for year-end 1978. The 12-month earnings per share actually observed by the investor as of December 31, 1978 were $0.85 per share. At a December 31, 1978 price of $12.87, the earnings yield computed using the COMPUSTAT data file was 9.6%, whereas the earnings yield using observed data was 6.6%. As might be expected, the price of Zenith stock went from the year-end price of $12.87 to a March ending price (when the new earnings were known to investors) of $15.00.

Empirical researchers have long been aware of these potential problems. Until now, there has been no practical way of measuring the size of the biases introduced.
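As a worked check of the Zenith figures quoted above, the following minimal Python sketch recomputes the two earnings yields. The function and variable names are illustrative assumptions rather than anything taken from the COMPUSTAT files, and the earnings yield is assumed to be 12-month earnings per share divided by the share price, which is consistent with the percentages reported in the text.

```python
def earnings_yield(eps: float, price: float) -> float:
    """Earnings yield, assumed here to be 12-month EPS divided by share price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return eps / price


# Figures quoted in the text for Zenith as of December 31, 1978.
PRICE_YEAR_END = 12.87  # year-end share price
EPS_COMPUSTAT = 1.24    # EPS on the annual COMPUSTAT file (not yet public at year end)
EPS_OBSERVED = 0.85     # 12-month EPS an investor could actually observe at year end

yield_compustat = earnings_yield(EPS_COMPUSTAT, PRICE_YEAR_END)
yield_observed = earnings_yield(EPS_OBSERVED, PRICE_YEAR_END)

# Gap between the two yields, attributable to using earnings
# that had not yet been reported at the portfolio formation date.
look_ahead_gap = yield_compustat - yield_observed

print(f"COMPUSTAT-based yield: {yield_compustat:.1%}")
print(f"Observed-data yield:   {yield_observed:.1%}")
print(f"Difference:            {look_ahead_gap:.1%}")
```

Under these assumptions, the gap between the two yields is roughly three percentage points, which is large enough to move a stock between portfolios ranked on earnings yield.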
Some studies have ignored the problems, others have used various measures designed to reduce the biases, while some have claimed that the biases are of a negligible magnitude.

The purpose of this paper is to examine the effect of the described idiosyncrasies of the COMPUSTAT data base using two empirical relations, the "P/E effect" and the "small firm effect," as examples. We show that there are significant differences in returns to portfolios formed using the COMPUSTAT data base and returns to portfolios formed using a data source which does not have the look-...