Single-case deletion regression diagnostics have been used widely to discover unusual data points, but such approaches can fail when multiple unusual data points are present and mask one another. We propose a new approach to the use of single-case deletion diagnostics: apply these diagnostics to delete-2 and delete-3 jackknife replicates of the data, and treat the percentage of replicates in which a point is flagged as unusual as an indicator of its influence. By considering replicates that exclude certain collections of points, subtle masking effects can be uncovered.
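As a rough illustration of the flagging-percentage idea, the sketch below applies a single-case diagnostic to every delete-d replicate of a dataset and records how often each point is flagged among the replicates that retain it. The function names, the choice of Cook's distance as the diagnostic, and the 4/n cutoff are assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations

import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each row of an OLS fit.

    X is the design matrix and must already contain the intercept column.
    """
    n, p = X.shape
    # Leverages (diagonal of the hat matrix) from the reduced QR factor.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return (resid**2 / (p * s2)) * h / (1.0 - h) ** 2


def flag_percentages(X, y, d=2, cutoff=None):
    """Percentage of delete-d replicates in which each retained point is flagged.

    cutoff=None uses the 4/m rule of thumb, where m is the replicate size
    (an assumed threshold, not one taken from the article).
    """
    n = X.shape[0]
    flagged = np.zeros(n)
    present = np.zeros(n)
    for drop in combinations(range(n), d):
        keep = np.setdiff1d(np.arange(n), drop)
        D = cooks_distance(X[keep], y[keep])
        thr = 4.0 / len(keep) if cutoff is None else cutoff
        present[keep] += 1
        flagged[keep] += D > thr
    return 100.0 * flagged / present
```

Calling `flag_percentages(X, y, d=3)` gives the delete-3 version. The loop visits every subset of size d, so the cost grows quickly with the number of observations; this sketch is intended for small datasets only.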
Critical to any regression analysis is the identification of observations that exert a strong influence on the fitted regression model. Traditional regression influence statistics such as Cook's distance and DFFITS, each based on deleting single observations, can fail in the presence of multiple influential observations if these observations "mask" one another, or if other effects such as "swamping" occur. Masking refers to the situation where an observation reveals itself as influential only after one or more other observations are deleted. Swamping occurs when points that are not actually outliers or influential are declared to be so because of the effects that other unusual observations have on the model. One computationally expensive solution to these problems is the use of influence statistics that delete multiple rather than single observations. In this article, we build on previous work to produce a computationally feasible algorithm for detecting an unknown number of influential observations in the presence of masking. An important difference between our proposed algorithm and existing methods is that we focus on the data that remain after observations are deleted, rather than on the deleted observations themselves. Further, our approach uses a novel confirmatory step designed to provide a secondary assessment of identified observations. Supplementary materials for this article are available online.
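The definition of masking can be made concrete with a small constructed example. In the sketch below, ten points lie on the line y = x and two identical unusual points sit at (20, 0). Cook's distance for one of the unusual points is computed twice: once with its twin present and once with the twin deleted. The data and the helper name are invented for illustration and are not taken from the article.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each row of an OLS fit (X includes the intercept)."""
    n, p = X.shape
    # Diagonal of the hat matrix H = X (X'X)^{-1} X'.
    h = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    return (resid**2 / (p * s2)) * h / (1.0 - h) ** 2


# Ten well-behaved points on y = x, then two identical unusual points.
x = np.concatenate([np.arange(10.0), [20.0, 20.0]])
y = np.concatenate([np.arange(10.0), [0.0, 0.0]])
X = np.column_stack([np.ones_like(x), x])

# Influence of the first unusual point while its twin is still in the data.
d_before = cooks_distance(X, y)[10]

# Delete the twin (last row) and recompute for the same point.
d_after = cooks_distance(X[:-1], y[:-1])[10]

print(f"with twin present: {d_before:.3f}")
print(f"with twin deleted: {d_after:.3f}")
```

With the twin present, each unusual point shares its leverage with the other, so deleting either one alone changes the fit comparatively little. Once the twin is removed, the remaining point carries that leverage by itself and its Cook's distance is much larger.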