Jing Ma scite author profile

Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess mis-clustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance. Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.

A comparative study of topology-based pathway enrichment analysis methods

Shojaie

Michailidis

2019

BMC Bioinformatics

Background: Pathway enrichment is extensively used in the analysis of Omics data to gain biological insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods take as input the expression levels of the biomolecules under study together with their membership in pathways of interest. The latest generation of pathway enrichment methods also leverages information on the topology of the underlying pathways, which, as evidence from their evaluation shows, leads to improved sensitivity and specificity. Nevertheless, a systematic empirical comparison of such methods is still lacking, making selection of the most suitable method for a specific experimental setting challenging. This comparative study of nine network-based methods for pathway enrichment analysis aims to provide a systematic evaluation of their performance based on three real data sets with different numbers of features (genes/metabolites) and samples.

Results: The findings highlight both methodological and empirical differences across the nine methods. In particular, certain methods assess pathway enrichment due to differences both across expression levels and in the strength of the interconnectedness of the members of the pathway, while others only leverage differential expression levels. In the more challenging setting involving a metabolomics data set, the results show that methods that utilize both pieces of information (with NetGSA being a prototypical one) exhibit superior statistical power in detecting pathway enrichment.

Conclusion: The analysis reveals that a number of methods perform equally well when testing large pathways, which is the case with genomic data. On the other hand, NetGSA, which takes into consideration both differential expression of the biomolecules in the pathway and changes in the topology, exhibits superior performance when testing small pathways, which is usually the case for metabolomics data.

Network-based pathway enrichment analysis with incomplete network information

Shojaie

Michailidis

2016

Differential Markov random field analysis with an application to detecting differential microbial community networks

Cai

et al. 2019

Microorganisms such as bacteria form complex ecological community networks that can be greatly influenced by diet and other environmental factors. Differential analysis of microbial community structures aims to elucidate such systematic changes during an adaptive response to changes in environment. In this paper, we propose a flexible Markov random field model for microbial network structure and introduce a hypothesis testing framework for detecting differences between networks, also known as differential network analysis. Our global test for differential networks is particularly powerful against sparse alternatives. In addition, we develop a multiple testing procedure with false discovery rate control to identify the structure of the differential network. The proposed method is applied to data from a gut microbiome study on UK twins to evaluate how age affects the microbial community network.

CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality