In this paper, we propose methods for clustering groups of two-dimensional data whose mean functions are piecewise linear into clusters that share common characteristics, such as the same slopes. To fit segmented line regression models with common features for each possible cluster, we use a restricted least squares method. In implementing the restricted least squares method, we estimate the maximum number of segments in each cluster by using both the permutation test method and the Bayes Information Criterion (BIC) method, and then propose to use the BIC to determine the number of clusters. For a more effective implementation of the clustering algorithm, we propose a measure of the minimum distance worth detecting, and illustrate its use in two examples. We summarize simulation results to study properties of the proposed methods and also prove the consistency of the cluster grouping estimated with a given number of clusters. The presentation and examples in this paper focus on the segmented line regression model with ordered values of the independent variable, which has been the model of interest in cancer trend analysis, but the proposed method can be applied to a general model with design points that are either ordered or unordered.
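As a rough illustration of the clustering idea described above, the following sketch works in a deliberately simplified setting that is an assumption of this example, not the paper's algorithm: every group is a single straight line (no change-points) observed at the same design points, groups in one cluster share a slope but keep their own intercepts, and all set partitions of the groups are enumerated. The function names (`set_partitions`, `restricted_rss`, `cluster_by_slope`) and the particular BIC form are hypothetical choices made for illustration.

```python
import numpy as np

def set_partitions(items):
    """Yield every partition of `items` into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into one of the existing clusters ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new cluster of its own
        yield [[first]] + part

def restricted_rss(x, ys, partition):
    """RSS when groups in the same cluster share one slope (own intercepts)."""
    xc = x - x.mean()
    rss = 0.0
    for cluster in partition:
        # centring removes the group intercepts; the pooled slope is a ratio of sums
        yc = [ys[g] - ys[g].mean() for g in cluster]
        slope = sum(xc @ v for v in yc) / (len(cluster) * (xc @ xc))
        rss += sum(float((v - slope * xc) @ (v - slope * xc)) for v in yc)
    return rss

def cluster_by_slope(x, ys):
    """Return (bic, partition) for the partition with the smallest BIC."""
    groups = list(range(len(ys)))
    n_total = len(ys) * len(x)
    best = None
    for partition in set_partitions(groups):
        rss = restricted_rss(x, ys, partition)
        # assumed parameter count: one intercept per group, one slope per cluster
        n_params = len(groups) + len(partition)
        bic = n_total * np.log(rss / n_total) + n_params * np.log(n_total)
        if best is None or bic < best[0]:
            best = (bic, partition)
    return best
```

Exhaustive enumeration is only feasible for a handful of groups, since the number of partitions grows as the Bell numbers. It is used here purely to keep the sketch short.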
The Schwarz criterion or Bayes Information Criterion (BIC) is often used to select a model dimension, and some variations of the BIC have been proposed in the context of change-point problems. In this paper, we consider a segmented line regression model with an unknown number of change-points and study asymptotic properties of Schwarz type criteria in selecting the number of change-points. Given the tendency of the traditional BIC to overestimate, as observed in some empirical studies, and motivated by the asymptotic behavior of the modified BIC proposed by Zhang and Siegmund (2007), we consider a variation of the Schwarz type criterion that applies a harsher penalty, equivalent to that of a model with one additional unknown parameter per segment. For the segmented line regression model without the continuity constraint, we prove the consistency of the number of change-points selected by the criterion with this type of modification and summarize simulation results that support the consistency. Further simulations are conducted for the model with the continuity constraint, and we empirically observe that the asymptotic behavior of this modified version of BIC is comparable to that of the criterion proposed by Liu, Wu, and Zidek (1997).
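To make the selection step concrete, here is a minimal sketch of choosing the number of change-points with a Schwarz type criterion in the model without the continuity constraint. Several details are assumptions of this example rather than the paper's definitions: segments are fitted by ordinary least squares on contiguous blocks of at least three ordered points, the best split is found by dynamic programming, and the baseline parameter count is taken as two coefficients per segment plus one per change-point. The `extra_per_segment` argument is a hypothetical knob that mimics the harsher penalty of one additional parameter per segment.

```python
import numpy as np

def segment_rss(x, y, min_len=3):
    """cost[i, j] = RSS of a least squares line fitted to points i..j-1."""
    n = len(x)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            coef = np.polyfit(x[i:j], y[i:j], 1)
            resid = y[i:j] - np.polyval(coef, x[i:j])
            cost[i, j] = float(resid @ resid)
    return cost

def best_rss(cost, n, max_cp):
    """Entry k is the minimal total RSS with k change-points (k + 1 segments)."""
    # dp[s, j] = minimal RSS for the first j points split into s segments
    dp = np.full((max_cp + 2, n + 1), np.inf)
    dp[1, :] = cost[0, :]
    for s in range(2, max_cp + 2):
        for j in range(1, n + 1):
            dp[s, j] = np.min(dp[s - 1, :j] + cost[:j, j])
    return dp[1:, n]

def select_change_points(x, y, max_cp=3, extra_per_segment=0):
    """Return (selected number of change-points, list of criterion values)."""
    n = len(x)
    rss = best_rss(segment_rss(x, y), n, max_cp)
    scores = []
    for k, r in enumerate(rss):
        # assumed count: slope and intercept per segment, plus k change-points,
        # plus the optional extra parameter per segment (harsher penalty)
        n_params = 2 * (k + 1) + k + extra_per_segment * (k + 1)
        scores.append(n * np.log(r / n) + n_params * np.log(n))
    return int(np.argmin(scores)), scores

# Example: one change in slope at x = 30, observed with Gaussian noise.
rng = np.random.default_rng(1)
x_demo = np.arange(60.0)
y_demo = np.where(x_demo < 30, 1 + 0.5 * x_demo, 31 - 0.5 * x_demo)
y_demo = y_demo + rng.normal(0, 0.5, x_demo.size)
print("traditional penalty:", select_change_points(x_demo, y_demo)[0])
print("harsher penalty:", select_change_points(x_demo, y_demo, extra_per_segment=1)[0])
```

Because the extra penalty grows with the number of segments, the harsher criterion can never select more change-points than the baseline on the same data, which is the direction needed to counter overestimation.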
This paper studies the asymptotic behavior of the least squares estimators in segmented multiple regression. For a model with more than one partitioning variable, each of which has one or more change-points, we study the asymptotic properties of the estimated change-points and regression coefficients. Using techniques in empirical process theory, we prove the consistency of the least squares estimators and also establish the asymptotic normality of the estimated regression coefficients. For the estimated change-points, we obtain their consistency at the rate of 1/√n with continuity constraints and 1/n without them. The change-points estimated under the continuity constraints are also shown to asymptotically have a multivariate normal distribution. For the case where the regression mean functions are not assumed to be continuous at the change-points, the asymptotic distribution of the estimated change-points involves a step function process and does not follow a well-known distribution.
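For orientation, the simplest instance of such a model can be written out as follows. The notation is assumed here for illustration (a single partitioning variable $x$ with one change-point $\tau$) and is not taken from the paper, which treats several partitioning variables and several change-points.

```latex
% Assumed notation: one partitioning variable x, one change-point \tau.
y_i = (\alpha_1 + \beta_1 x_i)\,\mathbf{1}\{x_i \le \tau\}
    + (\alpha_2 + \beta_2 x_i)\,\mathbf{1}\{x_i > \tau\} + \varepsilon_i,
\qquad i = 1, \dots, n.

% Continuity constraint at the change-point:
\alpha_1 + \beta_1 \tau = \alpha_2 + \beta_2 \tau.

% Rates stated in the abstract for the least squares estimator \hat\tau:
\hat\tau - \tau = O_p(n^{-1/2}) \ \text{(with the continuity constraint)}, \qquad
\hat\tau - \tau = O_p(n^{-1}) \ \text{(without it)}.
```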