The spatial scan statistic has been widely used to detect spatial clusters, which are of common interest in many health-related problems. However, in most situations, different scan parameters, especially the maximum window size (MWS), yield different detected clusters. Although performance measures can be used to select an optimal scan parameter, most of them depend on historical prior information or knowledge of the true clusters, which is usually unavailable in practical datasets. Currently, the Gini coefficient and the maximum clustering set-proportion statistic (MCS-P) are used to select appropriate parameters without any prior information. However, the Gini coefficient may be unstable and select inappropriate parameters, especially in complex practical datasets, while the MCS-P may perform unsatisfactorily in spatial datasets with heterogeneous clusters. Based on the MCS-P, we propose a new indicator, the maximum clustering heterogeneous set-proportion (MCHS-P). A simulation study on selecting the optimal MWS confirmed that, in spatial datasets with heterogeneous clusters, the MWSs selected using the MCHS-P perform much better than those selected using the MCS-P; moreover, greater heterogeneity led to a larger advantage for the MCHS-P, with improvements of up to 538% in Youden's index and 69.5% in misclassification in specific scenarios. Meanwhile, the MCHS-P performs similarly to the MCS-P in spatial datasets with homogeneous clusters. Furthermore, the MCHS-P shows significant improvements over the Gini coefficient and the default 50% MWS, especially in datasets whose clusters are close to each other. Two practical studies yielded results similar to those of the simulation study. When no prior information about the true clusters or the heterogeneity between them is available, the MCHS-P is recommended for selecting the MWS in order to identify spatial clusters accurately.
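Youden's index and misclassification, used above to score the selected MWSs, require knowing which spatial units truly belong to a cluster, which is usually possible only in simulations. The minimal Python sketch below computes both from two 0/1 vectors (true and detected cluster membership per spatial unit) and, with the truth known, picks the candidate MWS with the highest Youden's index. The function name, the 10-unit toy data and the detections are hypothetical, and this is not the MCHS-P itself, whose formula is not given here.

```python
import numpy as np


def cluster_detection_metrics(true_in_cluster, detected_in_cluster):
    """Compare true vs detected cluster membership across spatial units.

    Both inputs are 0/1 (or boolean) vectors with one entry per spatial unit.
    Assumes at least one true cluster unit and one non-cluster unit.
    """
    t = np.asarray(true_in_cluster, dtype=bool)
    d = np.asarray(detected_in_cluster, dtype=bool)
    tp = np.sum(t & d)    # cluster units correctly detected
    tn = np.sum(~t & ~d)  # non-cluster units correctly left out
    fp = np.sum(~t & d)   # non-cluster units wrongly flagged
    fn = np.sum(t & ~d)   # cluster units missed
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": float(sensitivity),
        "specificity": float(specificity),
        "youden": float(sensitivity + specificity - 1.0),
        "misclassification": float((fp + fn) / t.size),
    }


# Made-up example: 10 spatial units, the first three form the true cluster.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical detections from scans run with different MWS values
# (MWS expressed as a fraction of the population at risk).
detections_by_mws = {
    0.1: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    0.3: [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    0.5: [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
}

# With the truth known (as in a simulation), pick the MWS with the highest Youden's index.
best_mws = max(
    detections_by_mws,
    key=lambda mws: cluster_detection_metrics(truth, detections_by_mws[mws])["youden"],
)
```

In practical datasets `truth` is unknown, which is exactly the gap that prior-free indicators such as the Gini coefficient, MCS-P and MCHS-P are meant to fill.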
Most spatial models include a spatial weights matrix (W) derived from the first law of geography to account for spatial dependence so that the independence assumption is fulfilled. In various fields, such as epidemiological and environmental studies, spatial dependence often shows clustering (or geographic discontinuity) due to natural or social factors. In such cases, adjustment using a first‐law‐of‐geography‐based W might be inappropriate and can lead to inaccurate estimates and a loss of statistical power. In this work, we propose a series of data‐driven Ws (DDWs) built according to the spatial pattern identified by the scan statistic, which can be easily constructed using existing tools such as the SaTScan software. The DDWs take into account both the clustered (or discontinuous) spatial dependence and the intuitive first‐law‐of‐geography‐based spatial dependence. For two common purposes in epidemiological studies (i.e., estimating the effect of an explanatory variable X and estimating the risk of each spatial unit in disease mapping), common spatial autoregressive models and Leroux‐prior‐based conditional autoregressive (CAR) models, respectively, were used to evaluate the performance of the DDWs. Both simulation and case studies show that our DDWs perform considerably better than the classic W in datasets with clustered (or discontinuous) spatial dependence. Furthermore, recently published density‐based spatial clustering models, which aim to handle such clustered (or discontinuous) spatial dependence in disease mapping, were also compared as references. The DDWs incorporated into the CAR models still show a considerable advantage, especially in datasets for common diseases.
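The ingredients named above can be made concrete on a toy grid. The minimal Python sketch below, assuming a regular grid of spatial units, builds a binary rook-contiguity (first-law) W, row-standardizes it, simulates data from a spatial lag model y = ρWy + Xβ + ε, and forms the Leroux CAR prior precision τ[λ(D − W) + (1 − λ)I], where D is the diagonal matrix of neighbour counts. The function `cluster_masked` is purely hypothetical: it shows one way a scan-cluster label could alter W (dropping links that cross a cluster boundary) and is not the DDW construction proposed here. Grid size, labels and parameter values are made up.

```python
import numpy as np


def rook_adjacency(nrows, ncols):
    """Binary rook-contiguity matrix for cells on an nrows x ncols grid (row-major order)."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:  # neighbour below
                A[i, i + ncols] = A[i + ncols, i] = 1.0
            if c + 1 < ncols:  # neighbour to the right
                A[i, i + 1] = A[i + 1, i] = 1.0
    return A


def row_standardize(A):
    """Divide each row by its sum; rows with no neighbours stay all-zero."""
    s = A.sum(axis=1, keepdims=True)
    return np.divide(A, s, out=np.zeros_like(A), where=s > 0)


def cluster_masked(A, labels):
    """HYPOTHETICAL: keep only links between units that share a scan-cluster label."""
    same = labels[:, None] == labels[None, :]
    return A * same


def simulate_sar(W, X, beta, rho, rng):
    """Spatial lag model y = rho*W*y + X*beta + eps, solved as (I - rho*W) y = X*beta + eps."""
    n = W.shape[0]
    eps = rng.normal(size=n)
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)


def leroux_precision(A, lam, tau=1.0):
    """Leroux CAR prior precision tau * [lam * (D - A) + (1 - lam) * I] for binary A."""
    D = np.diag(A.sum(axis=1))
    return tau * (lam * (D - A) + (1.0 - lam) * np.eye(A.shape[0]))


# Toy example: 4 x 4 grid; the top-left 2 x 2 block is a made-up detected cluster.
rng = np.random.default_rng(42)
A = rook_adjacency(4, 4)
W = row_standardize(A)
labels = np.zeros(16, dtype=int)
labels[[0, 1, 4, 5]] = 1
A_masked = cluster_masked(A, labels)
W_masked = row_standardize(A_masked)

X = np.column_stack([np.ones(16), rng.normal(size=16)])
y = simulate_sar(W, X, np.array([1.0, 2.0]), rho=0.5, rng=rng)
Q = leroux_precision(A, lam=0.5)
```

Because D − A is a graph Laplacian (positive semidefinite), any 0 ≤ λ < 1 gives a positive definite Leroux precision, which is one reason this prior is convenient for disease mapping.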
We developed a novel method, average ordinary least squares (OLS)-centered penalized regression (AOPR), to address multicollinearity in linear models. AOPR penalizes the cost function to shrink the estimators toward the weighted-average OLS estimator. The commonly used ridge regression (RR) shrinks the estimators toward zero; that is, in the Bayesian view it employs the penalization prior β ∼ N(0, 1/k), which contradicts the common real prior β ≠ 0. Therefore, RR selects small penalization coefficients to relieve this contradiction, which makes the penalization inadequate. Mathematical derivations suggest that AOPR can improve on the performance of RR and OLS regression. A simulation study shows that AOPR obtains more accurate estimators than OLS regression in most situations; compared with RR, it is more accurate when the signs of the true βs are identical and slightly less accurate when the signs differ. Additionally, a case study shows that AOPR obtains more stable estimators and stronger statistical power and predictive ability than RR and OLS regression. Based on these results, we recommend AOPR as a more efficient way to address multicollinearity than RR and OLS regression.
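The contrast between shrinking toward zero and shrinking toward an average OLS value can be written as a target-centred ridge problem, minimize ‖y − Xβ‖² + k‖β − t‖², whose closed-form solution is β̂ = (XᵀX + kI)⁻¹(Xᵀy + kt); setting t = 0 recovers RR. The Python sketch below assumes t repeats a weighted mean of the OLS estimates in every coordinate, with equal weights by default. That target, the weights and the name `aopr_sketch` are assumptions for illustration; the exact AOPR weighting scheme and the choice of k are not specified in this abstract.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares via a least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def ridge(X, y, k):
    """Classic ridge regression: every coefficient is shrunk toward zero."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def aopr_sketch(X, y, k, weights=None):
    """ASSUMED AOPR-style estimator: ridge penalty centred on a common average target.

    Minimises ||y - X b||^2 + k * ||b - t||^2, where t repeats a weighted mean of
    the OLS coefficients. Equal default weights are an assumption for illustration.
    """
    p = X.shape[1]
    b_ols = ols(X, y)
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    target = np.full(p, w @ b_ols / w.sum())
    # Closed-form solution of the target-centred penalised least-squares problem.
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y + k * target)


# Made-up collinear data: three predictors that are noisy copies of one signal.
rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n) for _ in range(3)])
beta_true = np.array([1.0, 1.5, 2.0])
y = X @ beta_true + rng.normal(size=n)

for name, est in [("OLS", ols(X, y)), ("RR", ridge(X, y, 5.0)), ("AOPR sketch", aopr_sketch(X, y, 5.0))]:
    print(name, np.round(est, 3))
```

With k = 0 the sketch reduces to OLS, and as k grows every coefficient is pulled toward the common average rather than toward zero. This is consistent with the reported advantage when the true βs share a sign: the common target then lies on the same side of zero as every coefficient.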
The Poisson ridge estimator (PRE) is a commonly used parameter estimation method for addressing multicollinearity in Poisson regression (PR). However, PRE shrinks the parameters toward zero, which contradicts the true associations. As a result, PRE can be an insufficient solution to multicollinearity. In this work, we propose a new estimator, the Poisson average maximum likelihood‐centered penalized estimator (PAMLPE), which shrinks the parameters toward the weighted average of the maximum likelihood estimators. We conducted a simulation study and a case study to compare PAMLPE with existing estimators in terms of mean squared error (MSE) and predictive mean squared error (PMSE). The results suggest that PAMLPE can obtain smaller MSE and PMSE (i.e., more accurate estimates) than the Poisson ridge estimator, Poisson Liu estimator, and Poisson K‐L estimator when the true βs have the same sign and small variation. Therefore, we recommend using PAMLPE to address multicollinearity in PR when the signs of the true βs are known in advance to be identical.
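By analogy with the linear case, the minimal Python sketch below fits the Poisson maximum likelihood estimate by Newton-Raphson, then computes a commonly used form of the PRE, β̂ = (XᵀŴX + kI)⁻¹XᵀŴX β̂_ML with Ŵ = diag(μ̂), and a hypothetical average-centred variant that adds kt to the right-hand side, where t repeats a weighted mean of the MLEs. The centred variant, its equal default weights and the name `pamlpe_sketch` are assumptions for illustration, not the published PAMLPE formula. For simplicity the design has no intercept.

```python
import numpy as np


def fisher_info(X, beta):
    """X' W X evaluated at beta, with W = diag(exp(X beta))."""
    mu = np.exp(X @ beta)
    return X.T @ (mu[:, None] * X)


def poisson_mle(X, y, tol=1e-10, max_iter=100):
    """Log-link Poisson regression fitted by Newton-Raphson (equivalent to IRLS here)."""
    # Rough starting values from a least-squares fit to log(y + 0.5).
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        # Newton step: solve (X'WX) step = X'(y - mu), the score of the log-likelihood.
        step = np.linalg.solve(fisher_info(X, beta), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def poisson_ridge(X, y, k):
    """A commonly used PRE form: (X'WX + kI)^-1 X'WX b_ML, shrinking toward zero."""
    b = poisson_mle(X, y)
    F = fisher_info(X, b)
    return np.linalg.solve(F + k * np.eye(X.shape[1]), F @ b)


def pamlpe_sketch(X, y, k, weights=None):
    """HYPOTHETICAL average-centred analogue: shrink toward a weighted mean of the MLEs."""
    b = poisson_mle(X, y)
    F = fisher_info(X, b)
    p = X.shape[1]
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    target = np.full(p, w @ b / w.sum())
    return np.linalg.solve(F + k * np.eye(p), F @ b + k * target)


# Made-up collinear count data (no intercept, for simplicity).
rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n) for _ in range(3)])
y = rng.poisson(np.exp(X @ np.array([0.3, 0.4, 0.5])))

print("MLE          ", np.round(poisson_mle(X, y), 3))
print("PRE          ", np.round(poisson_ridge(X, y, 1.0), 3))
print("PAMLPE sketch", np.round(pamlpe_sketch(X, y, 1.0), 3))
```

Both shrinkage estimators reduce to the MLE at k = 0; they differ only in where the penalty pulls the coefficients as k grows (zero for PRE, the common average for the centred sketch).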