The Youden index is a popular summary statistic for receiver operating characteristic curves. The threshold at which it is attained gives the optimal cut‐off point of a biomarker for distinguishing diseased from healthy individuals. In this article, we model the distributions of a biomarker for individuals in the healthy and diseased groups via a semiparametric density ratio model. Based on this model, we propose using the maximum empirical likelihood method to estimate the Youden index and the optimal cut‐off point. We further establish the asymptotic normality of the proposed estimators and construct valid confidence intervals for the Youden index and the corresponding optimal cut‐off point. The proposed method automatically covers both the case in which there is no lower limit of detection (LLOD) and the case in which there is a fixed and finite LLOD for the biomarker. Extensive simulation studies and a real data example are used to illustrate the effectiveness of the proposed method and its advantages over existing methods.
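For orientation, the Youden index is the maximum over thresholds c of sensitivity(c) + specificity(c) - 1, and the maximizing c is the optimal cut-off point. The sketch below computes the plain empirical (nonparametric) version of these two quantities. It is not the density ratio model or the maximum empirical likelihood estimator proposed in the abstract, and it does not handle an LLOD. The function name `empirical_youden` is hypothetical, and the code assumes that larger biomarker values indicate disease.

```python
import numpy as np


def empirical_youden(healthy, diseased):
    """Return (J, cutoff) for the empirical Youden index.

    Assumption (not from the source): larger biomarker values indicate
    disease, so a subject is classified as diseased when its value > cutoff.
    """
    healthy = np.asarray(healthy, dtype=float)
    diseased = np.asarray(diseased, dtype=float)

    # Only observed values can change the empirical sensitivity/specificity.
    candidates = np.unique(np.concatenate([healthy, diseased]))

    # Specificity: share of healthy subjects at or below the threshold.
    spec = np.array([(healthy <= c).mean() for c in candidates])
    # Sensitivity: share of diseased subjects strictly above the threshold.
    sens = np.array([(diseased > c).mean() for c in candidates])

    j = sens + spec - 1.0
    best = int(np.argmax(j))  # first maximizer if there are ties
    return float(j[best]), float(candidates[best])


if __name__ == "__main__":
    j_hat, c_hat = empirical_youden([1, 2, 3, 4], [3, 5, 6, 7])
    print(j_hat, c_hat)
```

The empirical version is a step function of the threshold, which is one reason model-based estimators such as the one described above are of interest when smooth estimates and confidence intervals are needed.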
In this article, we first provide an overview of two major developments in complex survey data analysis: empirical likelihood methods and statistical inference with nonprobability survey samples. We highlight the important research contributions of Canadian survey statisticians to the field of survey sampling in general and to these two topics in particular. We then propose new inferential procedures for analyzing nonprobability survey samples through the pseudo empirical likelihood approach. The proposed methods lead to point estimators that are asymptotically equivalent to those discussed in the recent literature but whose confidence intervals have more desirable features, such as range-respecting and data-driven orientation. Results from a simulation study demonstrate the superiority of the proposed methods in dealing with binary response variables.
Inverse probability weighting (IPW) methods are commonly used to analyze nonignorable missing data (NIMD) under the assumption of a logistic model for the missingness probability. However, solving IPW equations numerically may run into nonconvergence problems when the sample size is moderate and the missingness probability is high. Moreover, those equations often have multiple roots, and identifying the best root is challenging. Therefore, IPW methods may have low efficiency or even produce biased results. We identify the source of this pathology: these methods involve the estimation of a moment‐generating function (MGF), and such estimates are notoriously unstable in general. As a remedy, we model the outcome distribution given the covariates of the completely observed individuals semiparametrically. After forming an induced logistic regression (LR) model for the missingness status of the outcome and covariate, we develop a maximum conditional likelihood method to estimate the underlying parameters. The proposed method circumvents the estimation of an MGF and hence overcomes the instability of IPW methods. Our theoretical and simulation results show that the proposed method substantially outperforms existing competitors. Two real data examples are analyzed to illustrate the advantages of our method. We conclude that if only a parametric LR is assumed but the outcome regression model is left arbitrary, then one has to be cautious in using any of the existing statistical methods in problems involving NIMD.
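To see where an MGF-type quantity can enter, the following short derivation may help. It is a sketch under assumed notation, not the derivation given in the article: R is the response indicator (R = 1 when the outcome Y is observed), X denotes covariates, and the symbols \alpha, \beta, \gamma for the logistic missingness parameters are hypothetical.

```latex
% Assumed logistic model for the probability of observing Y
\pi(x, y) = P(R = 1 \mid X = x, Y = y)
          = \frac{1}{1 + \exp\{-(\alpha + x^{\top}\beta + \gamma y)\}}

% The inverse weight is linear in an exponential of the outcome
\frac{1}{\pi(x, y)} = 1 + \exp\{-(\alpha + x^{\top}\beta)\}\, e^{-\gamma y}

% IPW estimator of E(Y), written out with this weight
\hat{\mu}_{\mathrm{IPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \frac{R_i Y_i}{\pi(X_i, Y_i)}
  = \frac{1}{n} \sum_{i=1}^{n} R_i Y_i
  + \frac{1}{n} \sum_{i=1}^{n} R_i \exp\{-(\alpha + X_i^{\top}\beta)\}\, Y_i\, e^{-\gamma Y_i}
```

Under this assumed model, the second sum is an empirical exponential moment of the observed outcomes evaluated at -\gamma. Such averages can be dominated by a few extreme observations, which is consistent with the instability described above.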