Prediction settings with multiple studies have become increasingly common. Ensembling models trained on individual studies has been shown to improve replicability in new studies. Motivated by a groundbreaking new technology in human neuroscience, we introduce two generalizations of multi-study ensemble predictions. First, while existing methods weight ensemble elements by cross-study prediction performance, we extend weighting schemes to also incorporate covariate similarity between training data and target validation studies. Second, we introduce a hierarchical resampling scheme to generate pseudo-study replicates ("study straps") and ensemble classifiers trained on these rather than on the original studies themselves. We demonstrate analytically that existing methods are special cases. Through a tuning parameter, our approach forms a continuum between merging all training data and training with existing multi-study ensembles. Leveraging this continuum helps accommodate different levels of between-study heterogeneity.

Our methods are motivated by the application of voltammetry in humans. This technique records electrical brain measurements and converts the signals into neurotransmitter concentration estimates using a prediction model. Using this model in practice presents a cross-study challenge, for which we show marked improvements after application of our methods. We verify our methods in simulations and provide the studyStrap R package.

* NSF-DMS1810829
† T32 AI 007358
‡ NIH, R01 DA048096; NIH, R01 MH121099; NIH, R01 NS092701; NIH, 5KL2TR00142
§ WFSOM, Phys/Pharm Neurosurgery
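The hierarchical "study strap" resampling described above can be sketched as follows. This is a minimal illustration, not the studyStrap package's implementation: the function name `make_study_strap`, the parameter `bag_size`, and the toy data are all assumptions made for exposition. The idea is to resample first at the study level (draw studies with replacement) and then at the observation level within each drawn study, producing a pseudo-study replicate on which one ensemble member would be trained.

```python
# Illustrative sketch of study-strap resampling: a study-level bootstrap
# followed by a within-study observation-level bootstrap. Names here
# (make_study_strap, bag_size) are hypothetical, not the studyStrap R API.
import numpy as np

rng = np.random.default_rng(0)

def make_study_strap(studies, bag_size, rng):
    """Form one pseudo-study replicate.

    studies : list of (X, y) arrays, one pair per training study
    bag_size: number of studies drawn with replacement; loosely, larger
              values push the pseudo-study toward a merge of all data,
              smaller values toward single-study training.
    """
    chosen = rng.choice(len(studies), size=bag_size, replace=True)
    parts = []
    for k in chosen:
        X, y = studies[k]
        idx = rng.integers(0, len(y), size=len(y))  # within-study bootstrap
        parts.append((X[idx], y[idx]))
    X_strap = np.vstack([p[0] for p in parts])
    y_strap = np.concatenate([p[1] for p in parts])
    return X_strap, y_strap

# Three toy "studies" with heterogeneous covariate means
studies = [(rng.normal(m, 1.0, size=(20, 3)), rng.integers(0, 2, size=20))
           for m in (0.0, 0.5, 1.0)]

Xs, ys = make_study_strap(studies, bag_size=2, rng=rng)
print(Xs.shape, ys.shape)  # pseudo-study built from 2 resampled studies
```

In the full method, many such pseudo-studies are generated, a classifier is trained on each, and the ensemble members are weighted, for example by covariate similarity between their training data and the target study.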