This paper is concerned with inference on the cumulative distribution function (cdf) F_{X*} in the classical measurement error model X = X* + ε. We show the validity of asymptotic and bootstrap approximations for the distribution of the sup-norm deviation between the deconvolution cdf estimator of Hall and Lahiri (2008) and F_{X*}. We allow the density of ε to be ordinary smooth or supersmooth, or to be estimated from repeated measurements. Our approximation results are applicable in various contexts, such as confidence bands for F_{X*} and its quantiles, and cdf-based tests including goodness-of-fit tests for parametric models of densities, two-sample homogeneity tests, and tests for stochastic dominance. Simulation and real-data examples illustrate the satisfactory performance of the proposed methods.
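To make the estimator concrete, the sketch below illustrates a standard deconvolution cdf estimate of the kind studied by Hall and Lahiri (2008): the empirical characteristic function of the contaminated observations is divided by the (assumed known) error characteristic function, damped by a kernel, inverted numerically, and integrated. The Laplace error, the particular kernel, and the function names are assumptions made for illustration, not the paper's exact construction.

import numpy as np

def deconvolution_cdf(x_grid, W, phi_eps, h):
    # deconvolution cdf estimate from contaminated data W = X* + eps;
    # phi_eps: characteristic function of eps (assumed known), h: bandwidth
    t = np.linspace(-1.0 / h, 1.0 / h, 2001)
    dt = t[1] - t[0]
    ecf = np.exp(1j * np.outer(t, W)).mean(axis=1)        # empirical cf of W
    phi_K = np.clip(1.0 - (h * t) ** 2, 0.0, None) ** 3   # kernel cf, support [-1/h, 1/h]
    coef = ecf * phi_K / phi_eps(t)
    # Fourier inversion gives a density estimate; cumulating it gives the cdf
    dens = np.array([(np.exp(-1j * t * x) * coef).sum().real for x in x_grid]) * dt / (2 * np.pi)
    dens = np.clip(dens, 0.0, None)
    cdf = np.cumsum(dens) * (x_grid[1] - x_grid[0])
    return np.clip(cdf, 0.0, 1.0)

# example: Laplace(0, s) measurement error, whose characteristic function is 1 / (1 + (s*t)^2)
# Fhat = deconvolution_cdf(np.linspace(-3, 3, 121), W, lambda t: 1 / (1 + (0.2 * t) ** 2), h=0.3)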
This paper considers nonparametric instrumental variable regression when the endogenous variable is contaminated with classical measurement error. Existing methods are inconsistent in the presence of measurement error. We propose a wavelet deconvolution estimator of the structural function that modifies the generalized Fourier coefficients of the orthogonal series estimator to account for the measurement error. We establish the convergence rates of our estimator for mildly and severely ill-posed models and for ordinary smooth and supersmooth measurement errors, and we characterize how the presence of measurement error slows down these rates. We also study the case where the measurement error density is unknown and needs to be estimated, and show that the resulting estimation error is negligible under mild conditions as long as the measurement error density is symmetric.
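The coefficient modification can be illustrated in a simplified setting. When ε is symmetric and independent of X*, the naive cosine coefficient E[cos(πkX)] equals E[cos(πkX*)] multiplied by the real-valued characteristic function φ_ε(πk), so dividing by φ_ε(πk) undoes the attenuation. The sketch below implements this correction for a cosine basis; the basis choice, the function names, and the Laplace error in the usage line are illustrative assumptions, not the paper's wavelet construction.

import numpy as np

def corrected_cosine_coefficients(W, K, phi_eps):
    # naive coefficients E[cos(pi*k*W)] are attenuated by phi_eps(pi*k)
    # when W = X* + eps with eps symmetric and independent of X*;
    # dividing by phi_eps(pi*k) recovers E[cos(pi*k*X*)] for k = 1, ..., K
    ks = np.arange(1, K + 1)
    naive = np.array([np.cos(np.pi * k * W).mean() for k in ks])
    return naive / phi_eps(np.pi * ks)

# example with Laplace(0, s) error, whose characteristic function is 1 / (1 + (s*t)^2):
# b_hat = corrected_cosine_coefficients(W, K=10, phi_eps=lambda t: 1 / (1 + (0.15 * t) ** 2))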
Devising guidance on how to assign individuals to treatment is an important goal of empirical research. In practice, individuals often arrive sequentially, and the planner faces various constraints such as limited budget or capacity, borrowing constraints, or the need to place people in a queue. For instance, a governmental body may receive a budget outlay at the beginning of a year and need to decide how best to allocate resources within the year to individuals who arrive sequentially. In this and other examples involving inter-temporal tradeoffs, previous work on devising optimal policy rules in a static context is either not applicable or sub-optimal. Here we show how one can use offline observational data to estimate an optimal policy rule that maximizes ex-ante expected welfare in this dynamic context. We allow the class of policy rules to be restricted for computational, legal, or incentive-compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning (RL) to propose an algorithm that solves it. The algorithm is easily implementable and computationally efficient, with speedups achieved through multiple RL agents learning in parallel processes. We also characterize the statistical regret from using our estimated policy rule. To do this, we show that a Partial Differential Equation (PDE) characterizes the evolution of the value function under each policy. The data enable us to obtain a sample version of the PDE that provides estimates of these value functions, and the estimated policy rule is the one with the maximal estimated value function. Using the theory of viscosity solutions to PDEs, we show that the policy regret decays at an n^{-1/2} rate in most examples; this is the same rate as that obtained in the static case.
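As a much-simplified illustration of selecting the policy with the maximal estimated value under a budget constraint, the sketch below evaluates threshold rules by simulating sequential arrivals from the empirical covariate distribution using pre-estimated treatment effects. The threshold class, the tau_hat array, and the Monte-Carlo evaluation are assumptions made for illustration; the paper's algorithm instead works through the PDE characterization and RL.

import numpy as np

def estimated_policy_value(threshold, tau_hat, budget, horizon, n_sim=200, seed=0):
    # value of the rule "treat while budget remains and tau_hat >= threshold",
    # with arrivals drawn from the empirical distribution of the offline sample
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_sim):
        arrivals = rng.integers(0, len(tau_hat), size=horizon)
        remaining, welfare = budget, 0.0
        for i in arrivals:
            if remaining > 0 and tau_hat[i] >= threshold:
                welfare += tau_hat[i]   # estimated welfare gain from treating
                remaining -= 1
        values.append(welfare)
    return float(np.mean(values))

# pick the threshold with the largest estimated value over a one-dimensional policy class
# best = max(np.linspace(0.0, 1.0, 21),
#            key=lambda c: estimated_policy_value(c, tau_hat, budget=50, horizon=500))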
This paper provides a decision-theoretic analysis of bandit experiments. The bandit setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit settings. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit-of-experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and nonparametric distributions of the rewards. The approach further identifies the state variables to which it is asymptotically sufficient to restrict attention, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit setting with a PDE that can be efficiently solved using sparse matrix routines. We derive near-optimal policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations of the bandit problem, such as time discounting and pure exploration motives.
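To give a flavor of solving a value-function PDE with sparse matrix routines, the sketch below treats a stylized optimal-stopping (obstacle) problem by an implicit finite-difference scheme: a sparse diffusion step computes the continuation value, and a pointwise maximum with the stopping payoff encodes the decision. The specific equation, grid, and payoff are illustrative assumptions; the paper's bandit PDE has its own problem-specific form.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def solve_stopping_pde(payoff, x, T=1.0, sigma=1.0, n_steps=400):
    # implicit finite differences for V_t + 0.5*sigma^2*V_xx = 0, marched
    # backward from V(T, x) = payoff(x), with the obstacle V >= payoff
    # enforced at every step (stop whenever the payoff beats continuation);
    # the scheme uses a crude zero condition beyond the grid edges
    dx, dt, n = x[1] - x[0], T / n_steps, len(x)
    L = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / dx**2
    A = (sp.identity(n) - 0.5 * sigma**2 * dt * L).tocsc()
    step = splu(A)                      # factor the sparse system once
    g = payoff(x)
    V = g.copy()
    for _ in range(n_steps):
        V = step.solve(V)               # continuation value
        V = np.maximum(V, g)            # stopping decision
    return V                            # continue wherever V > payoff(x)

# x = np.linspace(-3.0, 3.0, 301)
# V = solve_stopping_pde(lambda z: np.maximum(z, 0.0), x)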