Summary: Many contemporary large‐scale applications involve building interpretable models linking a large set of potential covariates to a response in a non‐linear fashion, such as when the response is binary. Although this modelling problem has been extensively studied, it remains unclear how to control the fraction of false discoveries effectively even in high dimensional logistic regression, not to mention general high dimensional non‐linear models. To address this practical problem, we propose a new framework of ‘model‐X’ knockoffs, which views from a different perspective the knockoff procedure that was originally designed for controlling the false discovery rate in linear models. Whereas the original knockoff procedure is constrained to homoscedastic linear models with n⩾p, the key innovation here is that model‐X knockoffs provide valid inference from finite samples in settings in which the conditional distribution of the response is arbitrary and completely unknown. Furthermore, this holds regardless of the number of covariates. Correct inference in such a broad setting is achieved by constructing knockoff variables probabilistically instead of geometrically. To do this, our approach requires that the covariates are random (independent and identically distributed rows) with a distribution that is known, although we provide preliminary experimental evidence that our procedure is robust to unknown or estimated distributions. To our knowledge, no other procedure solves the controlled variable selection problem in such generality but, in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case–control study of Crohn's disease in the UK, making twice as many discoveries as the original analysis of the same data.
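To make the idea of constructing knockoffs "probabilistically" concrete, here is a minimal sketch for one special case only: covariate rows drawn from a zero-mean Gaussian with a known correlation matrix. The function name `gaussian_knockoffs`, the equicorrelated choice of `s`, and the 0.9 shrink factor (used to keep the Cholesky factorization numerically safe) are assumptions made for illustration. They are not taken from the summary above, which does not specify a construction.

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, seed=0):
    """Sample knockoff copies for rows X_i ~ N(0, Sigma), Sigma a correlation matrix.

    Uses the standard Gaussian conditional:
        X_knock | X ~ N(X - X Sigma^{-1} D, 2D - D Sigma^{-1} D),  D = diag(s).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Equicorrelated choice of s; the 0.9 factor is an assumption that keeps
    # the conditional covariance strictly positive definite.
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    s = np.full(p, 0.9 * min(1.0, 2.0 * lam_min))
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)
    cond_mean = X - X @ Sigma_inv_D
    cond_cov = 2.0 * D - D @ Sigma_inv_D
    cond_cov = (cond_cov + cond_cov.T) / 2.0  # remove round-off asymmetry
    L = np.linalg.cholesky(cond_cov)
    X_knock = cond_mean + rng.standard_normal((n, p)) @ L.T
    return X_knock, s
```

By construction the knockoffs have the same covariance as the original covariates, while the covariance between each covariate and its own knockoff is reduced by `s`. The response is never used, which is why the construction does not depend on the conditional distribution of the response.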
In this paper we present a novel probabilistic sampling-based motion planning algorithm called the Fast Marching Tree algorithm (FMT*). The algorithm is specifically aimed at solving complex motion planning problems in high-dimensional configuration spaces. This algorithm is proven to be asymptotically optimal and is shown to converge to an optimal solution faster than its state-of-the-art counterparts, chiefly PRM* and RRT*. The FMT* algorithm performs a “lazy” dynamic programming recursion on a predetermined number of probabilistically-drawn samples to grow a tree of paths, which moves steadily outward in cost-to-arrive space. As such, this algorithm combines features of both single-query algorithms (chiefly RRT) and multiple-query algorithms (chiefly PRM), and is reminiscent of the Fast Marching Method for the solution of Eikonal equations. As a departure from previous analysis approaches that are based on the notion of almost sure convergence, the FMT* algorithm is analyzed under the notion of convergence in probability: the extra mathematical flexibility of this approach allows for convergence rate bounds, the first in the field of optimal sampling-based motion planning. Specifically, for a certain selection of tuning parameters and configuration spaces, we obtain a convergence rate bound of order $O(n^{-1/d+\rho})$, where n is the number of sampled points, d is the dimension of the configuration space, and ρ is an arbitrarily small constant. We go on to demonstrate asymptotic optimality for a number of variations on FMT*, namely when the configuration space is sampled non-uniformly, when the cost is not arc length, and when connections are made based on the number of nearest neighbors instead of a fixed connection radius.
Numerical experiments over a range of dimensions and obstacle configurations confirm our theoretical and heuristic arguments by showing that FMT*, for a given execution time, returns substantially better solutions than either PRM* or RRT*, especially in high-dimensional configuration spaces and in scenarios where collision-checking is expensive.
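The "lazy" dynamic programming recursion can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the unit-square workspace, the single disk obstacle, the brute-force distance matrix, and all parameter values (`n`, `r`, `radius`) are assumptions chosen to keep the example short. The edge to the locally best open neighbour is collision-checked only after that neighbour has been chosen, which is the lazy step.

```python
import numpy as np

def collision_free(a, b, center, radius, steps=20):
    """Check a straight segment against one disk obstacle by dense sampling."""
    t = np.linspace(0.0, 1.0, steps + 1)[:, None]
    points = a + t * (b - a)
    return bool(np.all(np.linalg.norm(points - center, axis=1) > radius))

def fmt_star(start, goal, n=400, r=0.2, center=(0.5, 0.5), radius=0.2, seed=0):
    """Toy FMT*-style planner in the unit square; returns (path, cost)."""
    rng = np.random.default_rng(seed)
    center = np.asarray(center, dtype=float)
    samples = rng.random((n, 2))
    samples = samples[np.linalg.norm(samples - center, axis=1) > radius]
    pts = np.vstack([start, samples, goal])      # index 0 = start, last = goal
    N = len(pts)
    goal_idx = N - 1
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

    cost = np.full(N, np.inf)
    cost[0] = 0.0
    parent = np.full(N, -1)
    unvisited = np.ones(N, dtype=bool)
    unvisited[0] = False
    is_open = np.zeros(N, dtype=bool)
    is_open[0] = True

    z = 0                                        # lowest-cost open node
    while z != goal_idx:
        new_open = np.zeros(N, dtype=bool)
        for x in np.flatnonzero(unvisited & (dist[z] <= r)):
            # Best open neighbour of x, chosen while ignoring obstacles.
            near_open = np.flatnonzero(is_open & (dist[x] <= r))
            y = near_open[np.argmin(cost[near_open] + dist[near_open, x])]
            # Lazy step: only this one candidate edge is collision-checked.
            if collision_free(pts[y], pts[x], center, radius):
                parent[x] = y
                cost[x] = cost[y] + dist[y, x]
                new_open[x] = True
                unvisited[x] = False
        is_open |= new_open
        is_open[z] = False                       # z is now closed
        if not is_open.any():
            return None, np.inf                  # tree cannot grow further
        candidates = np.flatnonzero(is_open)
        z = candidates[np.argmin(cost[candidates])]

    path = []
    node = goal_idx
    while node != -1:
        path.append(node)
        node = parent[node]
    return pts[path[::-1]], float(cost[goal_idx])
```

Calling `fmt_star((0.1, 0.1), (0.9, 0.9))` returns the waypoints of a path around the disk together with its arc-length cost. Always expanding the open node with the smallest cost-to-arrive is what makes the tree move steadily outward in cost-to-arrive space.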
This article presents a novel approach, named MCMP (Monte Carlo Motion Planning), to the problem of motion planning under uncertainty, i.e., to the problem of computing a low-cost path that fulfills probabilistic collision avoidance constraints. MCMP estimates the collision probability (CP) of a given path by Monte Carlo sampling of the execution of a reference tracking controller (in this paper we consider LQG). The key algorithmic contribution of this paper is the design of statistical variance-reduction techniques, namely control variates and importance sampling, to make such a sampling procedure amenable to real-time implementation. MCMP applies this CP estimation procedure to motion planning by iteratively (i) computing an (approximately) optimal path for the deterministic version of the problem (here, using the FMT* algorithm), (ii) computing the CP of this path, and (iii) inflating or deflating the obstacles by a common factor depending on whether the CP is higher or lower than a target value. The advantages of MCMP are threefold: (i) asymptotic correctness of CP estimation, as opposed to most current approximations, which, as shown in this paper, can be off by large multiples and hinder the computation of feasible plans; (ii) speed and parallelizability; and (iii) generality, i.e., the approach is applicable to virtually any planning problem provided that a path tracking controller and a notion of distance to obstacles in the configuration space are available. Numerical results illustrate the correctness (in terms of feasibility), efficiency (in terms of path cost), and computational speed of MCMP.
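The CP estimation step can be illustrated with a naive Monte Carlo sketch. Several simplifications here are assumptions made for brevity and are not part of MCMP as described: a proportional tracking law on a single-integrator model stands in for LQG, the obstacle is a single disk, collisions are checked only at waypoints, and no variance reduction (control variates or importance sampling) is applied. The function name and parameters are hypothetical.

```python
import numpy as np

def estimate_collision_probability(waypoints, center, radius,
                                   noise_std=0.02, gain=0.5,
                                   n_rollouts=2000, seed=0):
    """Naive Monte Carlo estimate of the collision probability of a path.

    Each rollout tracks the nominal waypoints with tracking error
        e_{k+1} = (1 - gain) * e_k + w_k,   w_k ~ N(0, noise_std^2 I),
    and counts as a collision if any visited state lies inside the disk.
    Returns (estimate, standard_error).
    """
    rng = np.random.default_rng(seed)
    waypoints = np.asarray(waypoints, dtype=float)
    center = np.asarray(center, dtype=float)
    # All rollouts start exactly on the first waypoint.
    state = np.tile(waypoints[0], (n_rollouts, 1))
    collided = np.zeros(n_rollouts, dtype=bool)
    for prev_ref, next_ref in zip(waypoints[:-1], waypoints[1:]):
        error = state - prev_ref
        noise = rng.normal(0.0, noise_std, size=state.shape)
        # Feedforward along the path plus proportional error correction.
        state = next_ref + (1.0 - gain) * error + noise
        collided |= np.linalg.norm(state - center, axis=1) <= radius
    p_hat = float(collided.mean())
    std_err = float(np.sqrt(p_hat * (1.0 - p_hat) / n_rollouts))
    return p_hat, std_err
```

The standard error shrinks only as the inverse square root of the number of rollouts, so small collision probabilities need many samples to estimate well. That cost is what the variance-reduction techniques named above are designed to reduce. In the outer loop, an estimate of this kind would be compared with the target CP to decide whether to inflate or deflate the obstacles before replanning.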
Consider the following three important problems in statistical inference, namely, constructing confidence intervals for (1) the error of a high-dimensional (p > n) regression estimator, (2) the linear regression noise level, and (3) the genetic signal-to-noise ratio of a continuous-valued trait (related to the heritability). All three problems turn out to be closely related to the little-studied problem of performing inference on the ℓ2-norm of the signal in high-dimensional linear regression. We derive a novel procedure for this, which is asymptotically correct when the covariates are multivariate Gaussian and produces valid confidence intervals in finite samples as well. The procedure, called EigenPrism, is computationally fast and makes no assumptions on coefficient sparsity or knowledge of the noise level. We investigate the width of the EigenPrism confidence intervals, including a comparison with a Bayesian setting in which our interval is just 5% wider than the Bayes credible interval. We are then able to unify the three aforementioned problems by showing that the EigenPrism procedure with only minor modifications is able to make important contributions to all three. We also investigate the robustness of coverage and find that the method applies in practice and in finite samples much more widely than just the case of multivariate Gaussian covariates. Finally, we apply EigenPrism to a genetic dataset to estimate the genetic signal-to-noise ratio for a number of continuous phenotypes.