Various measures have been proposed for detecting influential observations in the linear regression model, but most of them require a large amount of computing time in practical problems. This article presents an efficient computational method for the influence measure $CD_m(X'_{(D_m)}X_{(D_m)},\ pS^2_{(D_m)})$ proposed by Cook and Weisberg (1982).
Introduction

In the statistical literature on regression analysis, much attention has been given to the problem of detecting observations that, individually or jointly, exert a disproportionate influence on the outcome of a linear regression analysis, and to the problem of assessing the influence of such cases. Most approaches measure the change in some feature of the analysis when one or more observations are deleted. Various measures have been proposed that emphasize different aspects of influence on the linear regression. Gentleman and Wilk (1975), Cook (1977, 1979), Belsley, Kuh and Welsch (1980), Cook and Weisberg (1980, 1982), and Chatterjee and Hadi (1986) gave fine reviews of such measures. Among those measures, many researchers have concentrated on reducing the computation time for Cook's distance. For a review of such methods, see Cook and Weisberg (1980), Gray and Ling (1984), and Barrett and Gray (1992).

This article suggests an efficient computational method for the measure $CD_m(X'_{(D_m)}X_{(D_m)},\ pS^2_{(D_m)})$ proposed by Cook and Weisberg (1982). Barrett and Ling proposed a method that computes $CD_m(X'_{(D_m)}X_{(D_m)},\ pS^2_{(D_m)})$ by operating on matrices case by case.

To continue with the article, let us consider the model

$$Y = X\beta + \varepsilon, \qquad (1.1)$$

where $Y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ full rank matrix of known constants, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\varepsilon$ is an $n \times 1$ vector of independent random variables, each with mean zero and variance $\sigma^2$. It is well known that the least squares estimator of $\beta$ is $\hat\beta = (X'X)^{-1}X'Y$, with residual vector $e = Y - X\hat\beta$. Suppose that $D_m$ is an index set with $m$ elements in $\{1, 2, \ldots, n\}$. For the matrices associated with (1.1), let $X_{D_m}$ be the submatrix of $X$ whose $m$ rows are indexed by $D_m$, and let $X_{(D_m)}$ be its complement, the submatrix of $X$ with $X_{D_m}$ deleted. The vectors $Y_{(D_m)}$, $\varepsilon_{(D_m)}$ and $e_{D_m}$ are similarly defined from $Y$, $\varepsilon$ and $e$, respectively.
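As a rough illustration of this notation, the following Python/NumPy sketch forms $X_{D_m}$ and $X_{(D_m)}$ for a given index set and evaluates the measure by the naive case-by-case route, refitting the regression with the cases in $D_m$ deleted. It is not the efficient method this article is concerned with. The function names `split_cases`, `ols` and `cd_m_bruteforce` and the simulated data are hypothetical. The sketch also assumes that the measure has the quadratic form $(\hat\beta - \hat\beta_{(D_m)})' M (\hat\beta - \hat\beta_{(D_m)}) / a$ with $M = X'_{(D_m)}X_{(D_m)}$ and $a = pS^2_{(D_m)}$, and that $S^2_{(D_m)}$ is the residual mean square of the deleted fit on $n - m - p$ degrees of freedom.

```python
import numpy as np


def split_cases(X, y, D):
    """Split the data by the index set D (0-based row indices).

    Returns (X_D, y_D, X_del, y_del): the rows indexed by D, and the
    complement with those rows deleted.
    """
    D = np.asarray(sorted(D))
    keep = np.ones(X.shape[0], dtype=bool)
    keep[D] = False
    return X[D], y[D], X[keep], y[keep]


def ols(X, y):
    """Least squares estimate of beta, computed with a least squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def cd_m_bruteforce(X, y, D):
    """Naive evaluation of the measure for one subset D by refitting.

    Assumed form (see the lead-in): (b - b_del)' M (b - b_del) / a with
    M = X_del' X_del and a = p * S2_del.
    """
    n, p = X.shape
    m = len(D)
    _, _, X_del, y_del = split_cases(X, y, D)
    beta = ols(X, y)              # fit on all n cases
    beta_del = ols(X_del, y_del)  # fit with the cases in D deleted
    resid_del = y_del - X_del @ beta_del
    s2_del = resid_del @ resid_del / (n - m - p)  # assumed definition of S^2
    diff = beta - beta_del
    M = X_del.T @ X_del
    return (diff @ M @ diff) / (p * s2_del)


# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

D = [3, 7]  # an index set with m = 2 elements
print(cd_m_bruteforce(X, y, D))
```

Each call refits the regression from scratch, so scanning all subsets of size $m$ this way needs one full fit per subset, which is where the computing time goes.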
If an integer $i$ is substituted for $D_m$, then $x_i'$ represents the $i$th row of $X$ and $X_{(i)}$ is the submatrix of $X$ with the $i$th row deleted. For quantities calculated from the data, we use the notation $(D_m)$ to indicate that the cases indexed by $D_m$ have been deleted prior to calculation. For example, $\hat\beta_{(D_m)}$ is the least squares estimator of $\beta$ computed without the cases in $D_m$. This article will mainly focus on $CD_m(X'_{(D_m)}X_{(D_m)},\ pS^2_{(D_m)})$. Belsley et al. (1980) proposed an influence measure, MDFFIT, which is the special case of $CD_m(M, a)$ with $M = X'_{(D_m)}X_{(D_m)}$ and $a = 1$, that is,

$$MDFFIT(D_m) = (\hat\beta - \hat\beta_{(D_m)})'\, X'_{(D_m)} X_{(D_m)}\, (\hat\beta - \hat\beta_{(D_m)}). \qquad (1.11)$$

They also suggested a computational method, the triangular factorization. Jones and Ling (1988) gave explicit representations of $MDFFIT(D_m)$ and $S^2_{(D_m)}$. The above two computational methods treat each given subset $D_m$ case by case and therefore cannot use the...