Johnny Hong scite author profile

Johnny Hong

4 Publications

42 Citation Statements Received

36 Citation Statements Given


Affiliations

University of California, Berkeley

Publications


To rarefy or not to rarefy: robustness and efficiency trade-offs of rarefying microbiome data

Hong, Karaöz, Valpine, et al. 2022


Motivation: Microbiome datasets provide rich information about microbial communities. However, vast library size variations across samples present great challenges for proper statistical comparisons. To deal with these challenges, rarefaction is often used in practice as a normalization technique, although there has been debate whether rarefaction should ever be used. Conventional wisdom and previous work suggested that rarefaction should never be used in practice, arguing that rarefying microbiome data is statistically inadmissible. These discussions, however, have been confined to particular parametric models and simulation studies.

Results: We develop a semiparametric graphical model framework for grouped microbiome data and analyze in the context of differential abundance testing the statistical trade-offs of the rarefaction procedure, accounting for latent variations and measurement errors. Under the framework, it can be shown rarefaction guarantees that subsequent permutation tests properly control the Type I error. In addition, the loss in sensitivity from rarefaction is solely due to increased measurement error; if the underlying variation in microbial composition is large among samples, rarefaction might not hurt subsequent statistical inference much. We develop the rarefaction efficiency index (REI) as an indicator for efficiency loss and illustrate it with a data set on the effect of storage conditions for microbiome data. Simulation studies based on real data demonstrate that the impact of rarefaction on sensitivity is negligible when overdispersion is prominent, while low REI corresponds to scenarios in which rarefying might substantially lower the statistical power. Whether to rarefy or not ultimately depends on assumptions of the data generating process and characteristics of the data.

Availability: Source codes are publicly available at https://github.com/jcyhong/rarefaction.

Supplementary information: Supplementary materials are available at Bioinformatics online.
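For readers unfamiliar with the procedure the abstract discusses, the following is a minimal sketch of rarefying a count table: every sample is subsampled without replacement down to a common depth, here the smallest library size. It is not taken from the authors' repository, and the function names `rarefy` and `rarefy_table` and the toy table are hypothetical. It does not compute the REI.

```python
import random
from collections import Counter

def rarefy(counts, depth, rng):
    """Subsample one sample's taxon counts to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's library size")
    # Expand the counts into one entry per read, labelled by taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    kept = Counter(rng.sample(reads, depth))
    return [kept.get(taxon, 0) for taxon in range(len(counts))]

def rarefy_table(table, rng):
    """Rarefy every sample (row) to the smallest library size in the table."""
    depth = min(sum(row) for row in table)
    return [rarefy(row, depth, rng) for row in table]

# Hypothetical toy data: 3 samples (rows) by 3 taxa (columns),
# with library sizes 100, 10 and 500.
rng = random.Random(0)
table = [[50, 30, 20], [5, 3, 2], [400, 100, 0]]
rarefied = rarefy_table(table, rng)
```

After rarefying, every row sums to the same depth, so the samples are compared at equal sequencing effort. The discarded reads are the source of the added measurement error that the abstract refers to.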


Relaxed Wasserstein with Applications to GANs

Guo, Hong, Lin, et al. 2021


Wasserstein Generative Adversarial Networks (WGANs) provide a versatile class of models, which have attracted great attention in various applications. However, this framework has two main drawbacks: (i) Wasserstein-1 (or Earth-Mover) distance is restrictive such that WGANs cannot always fit data geometry well; (ii) It is difficult to achieve fast training of WGANs. In this paper, we propose a new class of Relaxed Wasserstein (RW) distances by generalizing Wasserstein-1 distance with Bregman cost functions. We show that RW distances achieve nice statistical properties while not sacrificing the computational tractability. Combined with the GANs framework, we develop Relaxed WGANs (RWGANs) which are not only statistically flexible but can be approximated efficiently using heuristic approaches. Experiments on real images demonstrate that the RWGAN with Kullback-Leibler (KL) cost function outperforms other competing approaches, e.g., WGANs, even with gradient penalty.
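As background for the Bregman cost functions mentioned in the abstract, here is a minimal sketch of the standard Bregman divergence, D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. It shows how two familiar costs arise from different choices of phi. This is only the pointwise cost, not the paper's RW distance, which uses such a cost inside an optimal transport problem. All function names and the example vectors are hypothetical.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) for a differentiable convex phi."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# phi(v) = 0.5 * ||v||^2 yields half the squared Euclidean distance.
def half_sq_norm(v):
    return 0.5 * sum(t * t for t in v)

def half_sq_norm_grad(v):
    return list(v)

# phi(v) = sum v_i log v_i (negative entropy) yields the KL divergence
# when both arguments are probability vectors with positive entries.
def neg_entropy(v):
    return sum(t * math.log(t) for t in v)

def neg_entropy_grad(v):
    return [math.log(t) + 1.0 for t in v]

p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]

d_euclid = bregman(half_sq_norm, half_sq_norm_grad, p, q)
d_kl_pq = bregman(neg_entropy, neg_entropy_grad, p, q)
d_kl_qp = bregman(neg_entropy, neg_entropy_grad, q, p)
```

Swapping the arguments changes the KL value, which illustrates that Bregman divergences are in general asymmetric and therefore are not metrics.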


Relaxed Wasserstein with Applications to GANs

Guo, Hong, Lin, et al. 2017

Preprint


Ambiguity set and learning via Bregman and Wasserstein

Guo, Hong, Yang 2017

Preprint


Construction of ambiguity set in robust optimization relies on the choice of divergences between probability distributions. In distribution learning, choosing appropriate probability distributions based on observed data is critical for approximating the true distribution. To improve the performance of machine learning models, there has recently been interest in designing objective functions based on Lp-Wasserstein distance rather than the classical Kullback-Leibler (KL) divergence. In this paper, we derive concentration and asymptotic results using Bregman divergence. We propose a novel asymmetric statistical divergence called Wasserstein-Bregman divergence as a generalization of L2-Wasserstein distance. We discuss how these results can be applied to the construction of ambiguity set in robust optimization.

