In statistics, mixture models are used to characterize datasets with multimodal distributions. A class of mixture models called Gaussian Mixture Models (GMMs) has gained immense popularity among practitioners because of its sound statistical foundation and an efficient learning algorithm, which scales well with both the dimension and the size of a dataset. However, the underlying assumption that every mixing component is normally distributed can be too rigid for many real-life datasets. In this paper, we introduce a new class of parametric mixture models based on copula functions. The goal is to relax the normality assumption on the mixing components. We formulate a class of functions called Gaussian Mixture Copula functions for the characterization of multimodal distributions. The parameters of the proposed Gaussian Mixture Copula Model (GMCM) can be estimated in a maximum-likelihood setting. For this purpose, an Expectation-Maximization (EM) algorithm and a gradient-based optimization algorithm are proposed. Because the log-likelihood function is nonconvex, only locally optimal solutions can be obtained. We also provide experimental evidence of the benefits of the GMCM over the GMM using both synthetic and real-life datasets.
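To make the copula idea concrete, the following is a minimal sketch of how one might draw samples whose dependence structure comes from a Gaussian mixture while the marginals are free to be non-Gaussian. It assumes the standard copula construction (push each coordinate of a GMM sample through that coordinate's marginal CDF); it does not implement the EM or gradient-based estimation described above. The function names and the toy parameters are illustrative assumptions, not taken from the paper.

```python
import math
import numpy as np


def normal_cdf(x, mean, std):
    """CDF of a univariate normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def gmm_marginal_cdf(x, j, weights, means, covs):
    """Marginal CDF of a GMM along dimension j (a 1-D mixture of normals)."""
    return sum(
        w * normal_cdf(x, m[j], math.sqrt(c[j, j]))
        for w, m, c in zip(weights, means, covs)
    )


def sample_gmcm(rng, weights, means, covs, n):
    """Draw n points from the copula implied by a GMM (hypothetical helper)."""
    # Step 1: sample from the latent Gaussian mixture.
    labels = rng.choice(len(weights), size=n, p=weights)
    x = np.array([rng.multivariate_normal(means[k], covs[k]) for k in labels])
    # Step 2: map each coordinate through its marginal CDF, so every
    # marginal becomes uniform on [0, 1] while the dependence is kept.
    u = np.empty_like(x)
    for i in range(n):
        for j in range(x.shape[1]):
            u[i, j] = gmm_marginal_cdf(x[i, j], j, weights, means, covs)
    return u


# Toy two-component, two-dimensional parameters (assumed for illustration).
weights = np.array([0.4, 0.6])
means = np.array([[-2.0, -2.0], [2.0, 1.0]])
covs = np.array([[[1.0, 0.6], [0.6, 1.0]], [[1.0, -0.4], [-0.4, 0.5]]])

rng = np.random.default_rng(0)
u = sample_gmcm(rng, weights, means, covs, n=2000)

# Step 3: impose non-Gaussian marginals, here unit-rate exponentials,
# by applying the inverse exponential CDF to the uniform values.
y = -np.log1p(-u)
```

The resulting `y` is still multimodal in its joint structure, but each marginal is exponential rather than a mixture of normals, which is the kind of flexibility the normality assumption rules out.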
This paper outlines a retail sales prediction and product recommendation system that was implemented for a chain of retail stores. The relative importance of consumer demographic characteristics for accurately modeling the sales of each customer type is derived and implemented in the model. The data consisted of daily sales information for 600 products at the store level, broken out over a set of non-overlapping customer types. A recommender system was built based on a fast online thin Singular Value Decomposition (SVD). It is shown that modeling data at a finer level of detail, by clustering across customer types and demographics, yields improved performance compared to a single aggregate model built for the entire dataset. Details of the system implementation are described, and practical issues that arise in such real-world applications are discussed. Preliminary results from test stores over a one-year period indicate that the system resulted in significantly increased sales and improved efficiencies. A brief overview of how the primary methods discussed here were extended to a much larger dataset is given to confirm and illustrate the scalability of this approach.
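As a rough illustration of the low-rank idea behind an SVD-based recommender, the sketch below scores unpurchased products for one customer type from a rank-limited reconstruction of a sales matrix. It uses a batch truncated SVD from NumPy as a stand-in; the online, incrementally updated thin SVD mentioned above is not implemented here. The function names, the toy matrix, and the rule of masking already-purchased products are assumptions made for the example.

```python
import numpy as np


def thin_svd(matrix, rank):
    """Batch truncated SVD: keep only the top `rank` singular triplets."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]


def recommend(matrix, rank, row, top_n):
    """Rank unpurchased products for one row by low-rank predicted sales."""
    u, s, vt = thin_svd(matrix, rank)
    # Reconstruct the chosen row from the rank-limited factors.
    scores = (u[row] * s) @ vt
    # Exclude products this customer type already buys.
    scores = np.where(matrix[row] > 0, -np.inf, scores)
    # Indices of the highest-scoring remaining products, best first.
    return np.argsort(scores)[::-1][:top_n]


# Toy sales matrix: rows are customer types, columns are products.
# Rows 0-2 share tastes for products 0-2; rows 3-4 buy products 3-4.
sales = np.array([
    [5.0, 5.0, 5.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 4.0],
    [0.0, 0.0, 0.0, 4.0, 4.0],
])

# Customer type 2 resembles types 0 and 1, so product 2 is suggested.
top = recommend(sales, rank=2, row=2, top_n=1)
```

Restricting the rank forces similar customer types to share latent factors, which is what lets the reconstruction assign a nonzero score to a product the row has never purchased.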