Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters as compared with traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with a random number of components. This induces a two-level clustering of the subjects based on the zero/non-zero patterns (outer clustering) and on the sampling distribution (inner clustering). Posterior inference is performed through tailored Markov chain Monte Carlo schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp. This article is part of the theme issue ‘Bayesian inference: challenges, perspectives, and prospects’.
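As a rough illustration of the hurdle construction described above, the following minimal sketch evaluates the probability mass function of a single hurdle process with a shifted Negative Binomial on the positive counts. The function names and the (r, p) parameterization are assumptions made here for illustration; the abstract does not state the parameterization used by the authors, and the sketch covers neither the enriched mixture nor the joint modelling of several processes.

```python
import math


def neg_binomial_pmf(k, r, p):
    """Negative Binomial pmf on k = 0, 1, 2, ...

    Assumed parameterization (not taken from the abstract): k counts
    failures before the r-th success, with success probability 0 < p < 1
    and r > 0 possibly non-integer.
    """
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def hurdle_pmf(y, pi_zero, r, p):
    """Hurdle pmf: a point mass pi_zero at zero, and a Negative Binomial
    shifted by one (support 1, 2, ...) for the positive counts."""
    if y < 0:
        return 0.0
    if y == 0:
        return pi_zero
    # Shifting by one means y - 1 follows the unshifted Negative Binomial.
    return (1.0 - pi_zero) * neg_binomial_pmf(y - 1, r, p)


# Hypothetical parameter values, chosen only for demonstration.
probabilities = [hurdle_pmf(y, pi_zero=0.6, r=2.0, p=0.3) for y in range(6)]
```

Because the positive part lives on 1, 2, ..., no truncation constant is needed: the zero probability and the sampling distribution are specified separately, which is what lets the two clustering levels in the abstract act on different parameters.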
Reliable estimates of volatility and correlation are fundamental in economics and finance for understanding the impact of macroeconomic events on the market and guiding future investments and policies. Dependence across financial returns is likely to be subject to sudden structural changes, especially in correspondence with major global events, such as the COVID-19 pandemic. In this work, we are interested in capturing abrupt changes over time in the dependence across US industry stock portfolios, over a time horizon that covers the COVID-19 pandemic. The selected stocks give a comprehensive picture of the US stock market. To this end, we develop a Bayesian multivariate stochastic volatility model based on a time-varying sequence of graphs capturing the evolution of the dependence structure. The model builds on the Gaussian graphical models and the random change points literature. In particular, we treat the number and position of the change points, as well as the graphs, as objects of posterior inference, allowing for sparsity in graph recovery and change point detection. The high dimension of the parameter space poses complex computational challenges. However, the model admits a hidden Markov model formulation. This leads to the development of an efficient computational strategy, based on a combination of sequential Monte Carlo and Markov chain Monte Carlo techniques. The model and the computational developments are widely applicable, beyond the scope of the application of interest in this work.
Hypertensive disorders of pregnancy occur in about 10% of pregnant women around the world. Though there is evidence that hypertension impacts maternal cardiac functions, the relation between hypertension and cardiac dysfunctions is only partially understood. The study of this relationship can be framed as a joint inferential problem on multiple populations, each corresponding to a different hypertensive disorder diagnosis, that combines multivariate information provided by a collection of cardiac function indexes. A Bayesian nonparametric approach seems particularly suited to this setup and we demonstrate it on a dataset consisting of transthoracic echocardiography results of a cohort of Indian pregnant women. We are able to perform model selection, provide density estimates of cardiac function indexes and a latent clustering of patients: these readily interpretable inferential outputs allow us to single out modified cardiac functions in hypertensive patients compared to healthy subjects and progressively increased alterations with the severity of the disorder. The analysis is based on a Bayesian nonparametric model that relies on a novel hierarchical structure, called symmetric hierarchical Dirichlet process. This is suitably designed so that the mean parameters are identified and used for model selection across populations, a penalization for multiplicity is enforced, and the presence of unobserved relevant factors is investigated through a latent clustering of subjects. Posterior inference relies on a suitable Markov chain Monte Carlo algorithm and the model behaviour is also showcased on simulated data.