Missing not at random (MNAR) data pose key challenges for statistical inference because the substantive model of interest is typically not identifiable without imposing further (eg, distributional) assumptions. Selection models have been routinely used for handling MNAR by jointly modeling the outcome and selection variables and typically assuming that these follow a bivariate normal distribution. Recent studies have advocated parametric selection approaches, for example, estimated by multiple imputation and maximum likelihood, that are more robust to departures from the normality assumption compared with those assuming that nonresponse and outcome are jointly normally distributed.However, the proposed methods have been mostly restricted to a specific joint distribution (eg, bivariate t-distribution). This paper discusses a flexible copula-based selection approach (which accommodates a wide range of non-Gaussian outcome distributions and offers great flexibility in the choice of functional form specifications for both the outcome and selection equations) and proposes a flexible imputation procedure that generates plausible imputed values from the copula selection model. A simulation study characterizes the relative performance of the copula model compared with the most commonly used selection models for estimating average treatment effects with MNAR data.We illustrate the methods in the REFLUX study, which evaluates the effect of laparoscopic surgery on long-term quality of life in patients with reflux disease.We provide software code for implementing the proposed copula framework using the R package GJRM. In such cases, the data are said to be missing not at random (MNAR), and nonresponse must be modeled together with the substantive model for the observed data.Selection models have been commonly used to handle MNAR data in clinical and epidemiological research, 4-6 by jointly modeling the outcome and missingness models and typically assuming bivariate normality. Heckman 7 was one of the first to propose such model (Heckman selection model) using a simultaneous equation approach, where the error terms were assumed to follow a bivariate Gaussian. At that time, to circumvent some of the difficulties associated with direct likelihood maximization, Heckman proposed a two-stage least-squares estimation procedure. 8 This involved combining a probit model for the probability of observing the outcome (first stage) with a linear regression model for the outcome (second stage), which was a function of the estimates obtained in the first stage. Although this approach (method of moments estimator) is somehow robust to deviations from normality, it relies crucially on the availability of exclusion restrictions, ie, variables that predict missingness but are unrelated to the outcome of interest. 9 An alternative approach to estimating selection models is the single-step full-information maximum likelihood (FIML), which jointly estimates the outcome and missing data equations. For example, Diggle and Kenward 10 combined a marg...