In recent years, there has been an increasing demand for efficient algorithms for large-scale change point detection problems. To this end, we propose seeded binary segmentation, an approach relying on a deterministic construction of background intervals, called seeded intervals, in which single change points are searched. The final selection of change points based on the candidates from seeded intervals can be done in various ways, adapted to the problem at hand. Thus, seeded binary segmentation is easy to adapt to a wide range of change point detection problems, be it univariate, multivariate or even high-dimensional. We consider the univariate Gaussian change in mean setup in detail. For this specific case we show that seeded binary segmentation leads to a near-linear time approach (i.e., linear up to a logarithmic factor) independent of the underlying number of change points. Furthermore, using appropriate selection methods, the methodology is shown to be asymptotically minimax optimal. While computationally more efficient, the finite sample estimation performance remains competitive compared to state-of-the-art procedures. Moreover, we illustrate the methodology for high-dimensional settings with an inverse covariance change point detection problem where our proposal leads to massive computational gains while still exhibiting good statistical performance.
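To make the idea of deterministic background intervals concrete, the following is a minimal sketch for the univariate change in mean case. The abstract does not spell out the construction, so the details here are assumptions: the layered layout (interval lengths shrinking by a factor `decay`, evenly shifted within each layer), the standard CUSUM statistic as the single change point search, the greedy selection rule, and all function names and the `threshold` parameter are illustrative choices, not necessarily the authors' exact method.

```python
import math
from itertools import accumulate


def seeded_intervals(n, decay=0.5, min_len=2):
    """Deterministic multi-scale intervals [start, end) covering 0..n.

    Assumed layout: layer k has intervals of length n * decay**(k-1),
    evenly shifted so that neighbouring intervals overlap.
    """
    intervals = set()
    length, k = float(n), 1
    while length >= min_len:
        count = 2 * math.ceil((1 / decay) ** (k - 1)) - 1
        shift = (n - length) / (count - 1) if count > 1 else 0.0
        for i in range(count):
            start = math.floor(i * shift)
            end = min(n, math.ceil(i * shift + length))
            if end - start >= min_len:
                intervals.add((start, end))
        length *= decay
        k += 1
    return sorted(intervals)


def best_split(prefix, s, e):
    """Best single split b in [s, e) by the absolute CUSUM statistic."""
    n = e - s
    best_b, best_stat = None, -1.0
    for b in range(s + 1, e):
        left = prefix[b] - prefix[s]    # sum of x[s:b]
        right = prefix[e] - prefix[b]   # sum of x[b:e]
        stat = abs(
            math.sqrt((e - b) / (n * (b - s))) * left
            - math.sqrt((b - s) / (n * (e - b))) * right
        )
        if stat > best_stat:
            best_b, best_stat = b, stat
    return best_b, best_stat


def detect(x, threshold, decay=0.5):
    """Greedy selection: accept the strongest candidates first and skip
    intervals that already contain an accepted change point."""
    prefix = [0.0, *accumulate(x)]
    candidates = []
    for s, e in seeded_intervals(len(x), decay):
        b, stat = best_split(prefix, s, e)
        if stat > threshold:
            candidates.append((stat, b, s, e))
    candidates.sort(reverse=True)
    found = []
    for stat, b, s, e in candidates:
        if all(not (s < c < e) for c in found):
            found.append(b)
    return sorted(found)


# Usage: a noise-free signal with mean shifts after index 49 and 99.
signal = [0.0] * 50 + [5.0] * 50 + [0.0] * 50
print(detect(signal, threshold=1.0))
```

Because the intervals depend only on the series length and not on the data, the total length of all intervals, and hence the cost of the CUSUM searches, grows as the series length times the number of layers in this sketch, which is in line with the near-linear running time claimed above.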
The most essential requirement for water management is efficient and informative monitoring. Operating water quality monitoring networks is a challenge from both the scientific and economic points of view, especially in the case of river sections ranging over hundreds of kilometers. Therefore, spatio-temporal optimization is vital. In the present study, the optimization of the monitoring system of the River Tisza, the second-largest river in Central Europe, is presented using a generally applicable and novel method, combined cluster and discriminant analysis (CCDA). This study area was chosen because the spatial inhomogeneity of a river's monitoring network can be studied more easily in a mostly natural watershed, as in the case of the River Tisza, since the effects of man-made obstacles (e.g. water barrage systems, hydroelectric power plants, artificial lakes) are more pronounced. Furthermore, since the temporal sampling frequency was bi-weekly, the opportunity of optimizing the monitoring system on a temporal (monthly) scale arose. In the research, 15 water quality parameters measured at 14 sampling sites in the Hungarian section of the River Tisza were assessed for the time period 1975-2005. First, four within-year sections ("hydrochemical seasons") of unequal lengths were determined, namely 2, 4, 2, and 4 months long, starting with spring. Homogeneous groups of sampling sites were determined in space for every season, with the main separating factors being the tributaries and man-made obstacles. Similarly, an overall pattern of homogeneity was determined. As an overall result, the 14 sampling sites could be grouped into 11 homogeneous groups, which makes it possible to reduce the number of sampling locations and thus make the monitoring system more cost-efficient.