Metabolomic data processing pipelines have improved in recent years, allowing for greater feature extraction and identification. More recently, machine learning and robust statistical techniques for controlling false discoveries have been incorporated into metabolomic data analysis. In this paper, we apply one such recently developed technique, aggregate knockoff filtering, to untargeted metabolomic analysis. When applied to a publicly available dataset, aggregate knockoff filtering combined with typical p-value filtering increases the number of significantly changing metabolites by 25% compared with conventional untargeted metabolomic data processing. With this method, features that would not be extracted under standard processing are brought to researchers’ attention for further analysis.
We consider the problem of generating valid knockoffs for knockoff filtering, a statistical method that provides provable false discovery rate guarantees for any model selection procedure. To this end, we are motivated by recent advances in multivariate distribution-free goodness-of-fit tests, namely the rank energy (RE), which is derived using theoretical results characterizing the optimal maps in Monge's Optimal Transport (OT) problem. However, direct use of RE for learning generative models is not feasible because of its high computational and sample complexity, its saturation under large support discrepancy between distributions, and its non-differentiability in the generative parameters. To alleviate these issues, we begin by proposing a variant of the RE, dubbed the soft rank energy (sRE), and its kernel variant, called the soft rank maximum mean discrepancy (sRMMD), using entropic regularization of Monge's OT problem. We then use sRMMD to generate deep knockoffs and show via extensive evaluation that it is a novel and effective method for producing valid knockoffs, achieving comparable, or in some cases improved, tradeoffs between detection power and false discoveries.
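To make the false discovery rate guarantee concrete, the selection step of a knockoff filter can be sketched as follows. This is a minimal illustration of the standard knockoff+ thresholding rule, not the sRMMD knockoff generator described above. It assumes that a vector of feature statistics `W` has already been computed from the original features and their knockoffs, with large positive values indicating evidence for a feature. The function names `knockoff_plus_threshold` and `knockoff_select` and the target level `q` are hypothetical names chosen for this example.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the smallest t among the nonzero |W_j| whose estimated
    false discovery proportion is at most q, or inf if none qualifies."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics estimate how many false positives exceed t;
        # the "+1" is the knockoff+ correction.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Indices of the features selected at target FDR level q."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


if __name__ == "__main__":
    # Toy statistics, made up purely for illustration.
    W = [5.0, 4.0, 3.0, 2.5, 2.0, -1.0, 0.5, 1.5, 3.5, 4.5]
    print(knockoff_select(W, q=0.2))
```

The quality of the knockoffs enters only through `W`: the thresholding rule is the same regardless of how the knockoffs were generated, which is why the validity of the generator matters for the guarantee.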