Many data mining applications involve predictive modeling of very large, complex datasets. Such applications call for innovative algorithms and associated implementations that are not only effective in terms of prediction accuracy but can also be run efficiently on distributed computational systems to yield results in reasonable time. This paper focuses on predictive modeling of multirelational data such as dyadic data with associated covariates or "side-information". We first give illustrative examples of applications that involve such data and then describe a general framework based on Simultaneous CO-clustering And Learning (SCOAL), which applies a divide-and-conquer approach to data analysis. We show that the main elements of the SCOAL algorithm can be effectively parallelized using the Map-Reduce framework. Experiments on Amazon's EC2 demonstrate that the proposed parallelizations result in considerable improvements in run time when using a cluster of machines.
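The sketch below is a minimal, hypothetical illustration of how the model-update step of a SCOAL-style divide-and-conquer scheme can be cast as map and reduce stages: the map stage keys each observed dyad by its co-cluster, and the reduce stage fits one local model per co-cluster. The toy data layout, block assignments, and use of plain least squares are assumptions for illustration, not the authors' implementation.

```python
# Map/reduce view of the per-co-cluster model-fitting step (illustrative sketch).
from collections import defaultdict
import numpy as np

def map_stage(records):
    """Emit (co-cluster id, (covariates, response)) pairs for every observed dyad."""
    for row_block, col_block, x, y in records:
        yield (row_block, col_block), (x, y)

def reduce_stage(grouped):
    """Fit one local least-squares model per co-cluster."""
    models = {}
    for block, pairs in grouped.items():
        X = np.array([x for x, _ in pairs])
        y = np.array([y for _, y in pairs])
        models[block], *_ = np.linalg.lstsq(X, y, rcond=None)
    return models

# Toy usage: 2x2 co-clusters, a bias term plus one covariate per dyad.
rng = np.random.default_rng(0)
records = [(r % 2, c % 2, [1.0, rng.normal()], rng.normal())
           for r in range(20) for c in range(20)]

grouped = defaultdict(list)
for key, value in map_stage(records):
    grouped[key].append(value)
local_models = reduce_stage(grouped)
print({block: w.round(2) for block, w in local_models.items()})
```

In a real Map-Reduce deployment the shuffle phase replaces the in-memory `defaultdict` grouping, and each reducer fits its co-cluster's model independently, which is what makes the step parallelizable across a cluster.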
Collaborative filtering approaches exploit information about historical affinities or ratings to predict unknown affinities between sets of "users" and "items" and make recommendations. However, a model that also incorporates heterogeneous sources of information available on the users and/or items can become a much more effective recommender, in terms of both the relevance of its predictions and the explainability of its results. In this paper, we propose a Bayesian approach that exploits not only such "side-information", but also a different kind of heterogeneity that captures variations in the mapping from user/item attributes to the affinities of interest. Such predictive heterogeneity is likely to occur in large recommender systems that involve a diverse set of users, and can be addressed by using multiple localized predictive models rather than a single global one that covers all user-item pairs. The scope or coverage of each local model is determined simultaneously with the model parameters. The proposed approach can incorporate different types of inputs to predict the preferences of diverse users and items. We compare it against well-known alternative approaches and analyze the results in terms of both accuracy and interpretability.
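As a rough, hypothetical sketch of the localized-model idea: concatenated user and item side-information feeds a separate Bayesian linear model for each segment of user-item pairs. The segment labels, feature layout, synthetic affinities, and the choice of scikit-learn's `BayesianRidge` are all assumptions for illustration; in the paper the scope of each local model is learned jointly with its parameters rather than fixed in advance.

```python
# Per-segment Bayesian ridge regressions over user/item side-information (sketch).
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(1)
n_users, n_items, d_user, d_item, n_seg = 30, 40, 3, 2, 3
user_feats = rng.normal(size=(n_users, d_user))            # user side-information
item_feats = rng.normal(size=(n_items, d_item))            # item side-information
segment = rng.integers(0, n_seg, size=(n_users, n_items))  # assumed local-model scopes
true_w = {s: rng.normal(size=d_user + d_item) for s in range(n_seg)}

# Assemble training data for each local model from the observed user-item pairs.
X_seg = {s: [] for s in range(n_seg)}
y_seg = {s: [] for s in range(n_seg)}
for u in range(n_users):
    for i in range(n_items):
        if rng.random() < 0.2:                     # ~20% of affinities observed
            s = segment[u, i]
            x = np.concatenate([user_feats[u], item_feats[i]])
            X_seg[s].append(x)
            y_seg[s].append(x @ true_w[s] + 0.1 * rng.normal())

# One Bayesian linear model per segment, fit only on that segment's pairs.
local_models = {s: BayesianRidge().fit(np.array(X_seg[s]), np.array(y_seg[s]))
                for s in range(n_seg)}

# Predict an unseen pair with the local model whose scope covers it.
u, i = 0, 1
x_new = np.concatenate([user_feats[u], item_feats[i]]).reshape(1, -1)
print(local_models[segment[u, i]].predict(x_new))
```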
We explore the emerging phenomenon of blogging about personal goals, and demonstrate how natural language processing tools can be used to uncover psychologically meaningful constructs in blogs. We describe features of a blog community (2638 blogs) devoted to weight loss. We compare several approaches to text analysis in predicting weight loss from natural language use in a subset of the blogs (258 users; over 13,000 entries). First, we use a bag-of-words approach to identify the degree to which individual words predict success or failure. Next, we compare the results to a deductive word count and categorization tool, Linguistic Inquiry and Word Count (LIWC). We discuss the theoretical significance of the words and word categories that distinguish between bloggers who succeed and those who fail in their weight loss attempts, along with the implications of automated text analysis for summarizing psychological features of blogs.
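The following is a minimal sketch of the bag-of-words step, assuming a list of blog entries and binary success/failure labels; the toy texts and labels below stand in for the 258-user corpus, and scikit-learn's `CountVectorizer` with logistic regression is used as a stand-in rather than the paper's exact feature weighting and model.

```python
# Bag-of-words prediction of success vs. failure from blog text (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "walked five miles today and logged every meal",
    "skipped the gym again, feeling discouraged and tired",
    "meal prepped for the week, down two pounds",
    "another late night snack binge, the scale went up",
]
labels = [1, 0, 1, 0]  # 1 = weight-loss success, 0 = failure (toy labels)

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Inspect which individual words lean toward success or failure, by coefficient sign.
vec = model.named_steps["countvectorizer"]
clf = model.named_steps["logisticregression"]
weights = sorted(zip(clf.coef_[0], vec.get_feature_names_out()))
print("failure-leaning:", [w for _, w in weights[:3]])
print("success-leaning:", [w for _, w in weights[-3:]])
```

The deductive alternative described in the abstract replaces the learned vocabulary with fixed LIWC categories (for example, counts of positive-emotion or ingestion-related words) as the predictors.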