The Davis-Kahan theorem is used in the analysis of many statistical procedures to bound the distance between subspaces spanned by population eigenvectors and their sample versions. It relies on an eigenvalue separation condition between certain relevant population and sample eigenvalues. We present a variant of this result that depends only on a population eigenvalue separation condition, making it more natural and convenient for direct application in statistical contexts, and improving the bounds in some cases. We also provide an extension to situations where the matrices under study may be asymmetric or even non-square, and where interest is in the distance between subspaces spanned by corresponding singular vectors.
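A small numerical check can make the population-gap idea concrete. The sketch below is illustrative only: the abstract does not state the inequality, so the bound coded here, 2 min(√d ‖Σ̂ − Σ‖_op, ‖Σ̂ − Σ‖_F) divided by the population eigengap, is an assumed form of the variant, and the matrix sizes, eigenvalues and noise level are arbitrary choices. The helper names `sin_theta_frobenius` and `leading_eigenvectors` are hypothetical.

```python
import numpy as np


def sin_theta_frobenius(v, v_hat):
    """Frobenius norm of sin Theta between the column spaces of v and v_hat."""
    # Singular values of v^T v_hat are the cosines of the principal angles.
    cosines = np.linalg.svd(v.T @ v_hat, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    return float(np.sqrt(np.sum(1.0 - cosines**2)))


def leading_eigenvectors(sym, d):
    """Orthonormal eigenvectors for the d largest eigenvalues of a symmetric matrix."""
    _, vecs = np.linalg.eigh(sym)  # eigh returns eigenvalues in ascending order
    return vecs[:, ::-1][:, :d]


rng = np.random.default_rng(0)
p, d = 20, 3

# Population matrix with a clear gap between the 3rd and 4th eigenvalues.
eigvals = np.concatenate([[10.0, 9.0, 8.0], np.linspace(3.0, 1.0, p - d)])
q, _ = np.linalg.qr(rng.standard_normal((p, p)))
sigma = q @ np.diag(eigvals) @ q.T

# "Sample" version: the population matrix plus a small symmetric perturbation.
noise = rng.standard_normal((p, p))
sigma_hat = sigma + 0.1 * (noise + noise.T) / 2

distance = sin_theta_frobenius(
    leading_eigenvectors(sigma, d), leading_eigenvectors(sigma_hat, d)
)

# Assumed form of the bound: only the population eigengap appears.
gap = eigvals[d - 1] - eigvals[d]
err = sigma_hat - sigma
bound = 2 * min(np.sqrt(d) * np.linalg.norm(err, 2), np.linalg.norm(err, "fro")) / gap

print(f"sin-theta distance: {distance:.4f}, bound: {bound:.4f}")
```

The denominator is computed from the population eigenvalues alone, so no sample eigenvalue enters the bound; that is the convenience the abstract describes.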
Summary. Change points are a very common feature of 'big data' that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called inspect for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package InspectChangepoint.
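The two stages can be sketched for a single change point as follows. This is a simplified illustration, not the InspectChangepoint implementation: entrywise soft-thresholding of the CUSUM matrix is swapped in for the convex optimization problem, the threshold is a user-supplied tuning parameter, and the function names `cusum_transform` and `estimate_single_change` are hypothetical.

```python
import numpy as np


def cusum_transform(x):
    """CUSUM transformation of a (p, n) array; returns a (p, n - 1) array."""
    _, n = x.shape
    t = np.arange(1, n)                      # candidate split points 1..n-1
    left = np.cumsum(x, axis=1)[:, :-1]      # sums over the first t columns
    right = x.sum(axis=1, keepdims=True) - left
    scale = np.sqrt(t * (n - t) / n)
    # Scaled difference between the mean after and the mean before each split.
    return scale * (right / (n - t) - left / t)


def estimate_single_change(x, threshold):
    """Return an estimated change location and the projection direction used."""
    cus = cusum_transform(x)
    # Stage 1 (simplified): soft-threshold, then take the leading left singular vector.
    shrunk = np.sign(cus) * np.maximum(np.abs(cus) - threshold, 0.0)
    if not shrunk.any():
        shrunk = cus                         # nothing survived; fall back to raw CUSUM
    u, _, _ = np.linalg.svd(shrunk, full_matrices=False)
    direction = u[:, 0]
    # Stage 2: univariate CUSUM on the projected series.
    projected = direction @ x
    proj_cusum = cusum_transform(projected[None, :])[0]
    return int(np.argmax(np.abs(proj_cusum))) + 1, direction


# Simulated example: 50 series of length 200, mean shift in 5 co-ordinates at time 100.
rng = np.random.default_rng(1)
p, n, true_change = 50, 200, 100
data = rng.standard_normal((p, n))
data[:5, true_change:] += 2.0

location, direction = estimate_single_change(data, threshold=4.0)
print("estimated change point:", location)
```

Projecting onto a sparse direction pools the signal from the co-ordinates that change, which is how the procedure borrows strength across component series.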
In recent years, sparse principal component analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators has been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a restricted covariance concentration condition, we show that there is an effective sample size regime in which no randomised polynomial time algorithm can achieve the minimax optimal rate. We also study the theoretical performance of a (polynomial time) variant of the well-known semidefinite relaxation estimator, revealing a subtle interplay between statistical and computational efficiency.
Tribute: Peter was a remarkable person: not only a prolific and highly influential researcher, but also someone with a wonderful warmth and generosity of spirit. He was a great inspiration to so many statisticians around the world. We are deeply saddened that he is no longer with us, and dedicate this paper to his memory. Further personal reflections on Peter Hall's life and work from the third author can be found in Samworth (2016).
We study the least squares regression function estimator over the class of real-valued functions on $[0, 1]^d$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),\, 1/d\}}$ in the empirical $L_2$ loss, up to poly-logarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{\min(1,\, 2/d)}$, again up to poly-logarithmic factors. Previous results are confined to the case $d \le 2$. Finally, we establish corresponding bounds (which are new even in the case $d = 2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to poly-logarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate.
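To see how the two exponents behave, it may help to evaluate them for small dimensions. The following is a worked evaluation of the stated rates only, ignoring the poly-logarithmic factors; here n is the sample size, d the dimension and k the number of hyperrectangles.

```latex
% Minimax exponent min{2/(d+2), 1/d}:
%   d = 1:  min{2/3, 1}   = 2/3
%   d = 2:  min{1/2, 1/2} = 1/2
%   d = 3:  min{2/5, 1/3} = 1/3
% Since 1/d <= 2/(d+2) exactly when d >= 2:
\[
  n^{-\min\{2/(d+2),\, 1/d\}} =
  \begin{cases}
    n^{-2/3}, & d = 1, \\
    n^{-1/d}, & d \ge 2.
  \end{cases}
\]
% Adaptive exponent min(1, 2/d):
\[
  (k/n)^{\min(1,\, 2/d)} =
  \begin{cases}
    k/n, & d \le 2, \\
    (k/n)^{2/d}, & d \ge 3.
  \end{cases}
\]
```

For d ≥ 3 and k < n, the adaptive rate (k/n)^{2/d} is larger than k/n, which is the sense in which the adaptation rate can be strictly worse than the parametric rate.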
Toxoplasmosis, caused by the protozoan parasite Toxoplasma gondii, is one of the most common parasitic infections in humans. Primary infection in pregnant women can be transmitted to the fetus, leading to miscarriage or congenital toxoplasmosis. Carefully designed nationwide seroprevalence surveys and case-control studies of risk factors, conducted primarily in Europe and America, have shaped our view of the global status of maternal and congenital infection, directing approaches to disease prevention. However, although China encompasses 1 in 5 of the world's population, information on the status of toxoplasmosis there is limited, partly due to the linguistic inaccessibility of the Chinese literature to the global scientific community. By selecting and analysing studies and data reported within the last 2 decades in China, this review summarizes and renders accessible a large body of Chinese and other literature and aims to estimate the seroprevalence in Chinese pregnant women. It also reviews the prevalence trends, risk factors, and clinical manifestations. The key findings are (1) the majority of studies show that the overall seroprevalence in Chinese pregnant women is less than 10%, considerably lower than that reported in a recently published global analysis; and (2) the few available appropriate studies on maternal acute infection suggest an incidence of 0·3%, which is broadly comparable to that found in studies from other countries.