Ultrahigh- and high-dimensional data are common in regression analysis across various fields, such as omics, finance, and biological engineering. In addition to the problem of dimension, the data might also be contaminated. There are two main types of contamination: outliers and model misspecification. We develop a unique method that takes into account both the ultrahigh- or high-dimensional setting and both types of contamination. In this article, we propose a framework for feature screening and selection based on maximum Lq-likelihood estimation (MLqE), which accounts for contamination from model misspecification and has also been shown to be robust to outliers. In numerical analysis, we explore the robustness of this framework under different outlier and model misspecification scenarios. To examine the performance of this framework, we conduct a real data analysis using skin cutaneous melanoma data. Compared with traditional screening and feature selection methods, the proposed method shows superiority in both variable identification effectiveness and parameter estimation accuracy.
Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but from which covariate information can be obtained. In such analyses, a key practical challenge is the presence of systematically missing data when some trials have collected data on one or more baseline covariates, but other trials have not, such that the covariate information is missing for all participants in the latter. In this article, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis. We propose three estimators for the average treatment effect in the target population, examine their asymptotic properties, and show that they have good finite-sample performance in simulation studies. We use the estimators to analyze data from two large lung cancer screening trials and target population data from the National Health and Nutrition Examination Survey (NHANES). To accommodate the complex survey design of the NHANES, we modify the methods to incorporate survey sampling weights and allow for clustering.