Because of its richness and availability, micro-blogging has become an ideal platform for conducting psychological research. In this paper, we propose to predict active users' personality traits from their micro-blogging behaviors. A total of 547 active Chinese micro-blogging users participated in this study. Their personality traits were measured with the Big Five Inventory, and digital records of their micro-blogging behaviors were collected via web crawlers. After extracting 845 micro-blogging behavioral features, we first trained classification models using Support Vector Machines (SVM) to differentiate participants with high and low scores on each dimension of the Big Five Inventory. Classification accuracy ranged from 84% to 92%. We also built regression models using the PaceRegression method to predict participants' scores on each dimension of the Big Five Inventory. The Pearson correlation coefficients between predicted and actual scores ranged from 0.48 to 0.54. The results indicate that active users' personality traits can be predicted from micro-blogging behaviors.
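The two evaluation measures reported in this abstract, classification accuracy and the Pearson correlation between predicted and actual scores, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the function names `pearson_r` and `accuracy` and the toy inputs in the usage note are assumptions made for the example.

```python
import math


def pearson_r(predicted, actual):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of co-deviations, then the two root sums of squared deviations.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    dev_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (dev_p * dev_a)


def accuracy(predicted_labels, true_labels):
    """Share of participants whose high/low label was predicted correctly."""
    hits = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return hits / len(true_labels)
```

For instance, `accuracy(["high", "low", "high", "low"], ["high", "low", "low", "low"])` counts three matches out of four, and `pearson_r` returns a value close to 1 when predicted scores rise linearly with actual scores.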
Personality research on social media has recently become a hot topic, owing to the rapid development of social media and the central importance of personality in psychology, but most studies rely on too few labeled samples. Our research explores the use of unlabeled samples to improve prediction accuracy. Through a user study with 1792 users, we adopt a local linear semi-supervised regression algorithm to predict the personality traits of Microblog users. Given a set of Microblog users' public information (e.g., number of followers) and a few labeled users, the task is to predict the personality of the remaining unlabeled users. The experimental results demonstrate that using unlabeled data can improve prediction accuracy.
It is important to acquire web users’ psychological characteristics. Recent studies have built computational models for predicting psychological characteristics with supervised learning. However, the generalization of these models may be limited by differences in distribution between the training and test datasets. To address this problem, we propose several local regression transfer learning methods. Specifically, k-nearest-neighbour and clustering reweighting methods are developed to estimate the importance of each training instance, and a weighted risk regression model is built for prediction. An adaptive parameter-setting method is also proposed for the case where the test dataset has no labels. We performed experiments on predicting users’ personality and depression across users of different genders or different districts, and the results demonstrate that the methods can improve the generalization capability of learning models.
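The reweighting idea in this abstract can be illustrated with a small sketch. The abstract does not give the exact weighting rule, so the rule used here is an assumption: each training instance is weighted by the share of test instances among its k nearest neighbours in the pooled data. A weighted least-squares line is then fitted on a single scalar feature. The helper names `knn_importance_weights` and `weighted_linear_fit` and the toy data are hypothetical.

```python
def knn_importance_weights(train_x, test_x, k=3):
    """Assumed rule: weight = fraction of test points among the k nearest
    neighbours of each training point in the pooled (train + test) data."""
    pool = [(x, 0) for x in train_x] + [(x, 1) for x in test_x]
    weights = []
    for i, xi in enumerate(train_x):
        # Distances to every other pooled point; index i is the point itself.
        dists = [
            (abs(xi - xj), is_test)
            for j, (xj, is_test) in enumerate(pool)
            if j != i
        ]
        dists.sort(key=lambda d: d[0])
        weights.append(sum(is_test for _, is_test in dists[:k]) / k)
    return weights


def weighted_linear_fit(xs, ys, weights):
    """Minimise sum_i w_i * (y_i - a * x_i - b)^2 and return (a, b)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all importance weights are zero")
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Toy data: the training set mixes two regimes, the test set covers only one.
train_x = [0, 1, 2, 3, 10, 11, 12]
train_y = [0, 1, 2, 3, 15, 17, 19]
test_x = [9.5, 10.5, 11.5, 12.5]

weights = knn_importance_weights(train_x, test_x, k=3)
slope, intercept = weighted_linear_fit(train_x, train_y, weights)
```

In this toy setup the training points far from the test range receive weight 0, so the fitted line follows only the training instances that resemble the test data. A practical version would use multi-dimensional features and some smoothing so that no weight collapses entirely to zero.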