The automatic assessment of psychological traits from digital footprints allows researchers to study these traits at unprecedented scale and in settings of high ecological validity. In this research, we investigated whether spending records, a ubiquitous and universal form of digital footprint, can be used to infer psychological traits. We applied an ensemble machine-learning technique (random-forest modeling) to a data set combining two million spending records from bank accounts with survey responses from the account holders (N = 2,193). Predictive accuracy was modest for the Big Five personality traits (r = .15, corrected ρ = .21) but higher for specific traits, including materialism (r = .33, corrected ρ = .42). We compared the predictive accuracy of these models with the predictive accuracy of alternative digital behaviors used in past research, including those observed on social media platforms, and we found that the predictive accuracies were relatively stable across socioeconomic groups and over time.
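The abstract above reports both raw correlations (r) and corrected correlations (ρ) without saying how the correction was done. As a rough illustration only, the sketch below assumes the classical correction for attenuation, ρ = r / sqrt(rel_x · rel_y); that choice, the function names, and the reliability values are assumptions made here and are not taken from the paper.

```python
import math


def disattenuate(r_observed, reliability_x=1.0, reliability_y=1.0):
    """Classical correction for attenuation: r / sqrt(rel_x * rel_y).

    ASSUMPTION: the abstract does not state which correction it used;
    this is the textbook formula, shown for illustration only.
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


def implied_reliability(r_observed, rho_corrected):
    """Reliability product that would map r onto rho under the formula above."""
    return (r_observed / rho_corrected) ** 2


if __name__ == "__main__":
    # Figures from the abstract; the interpretation is hypothetical.
    for label, r, rho in [("Big Five", 0.15, 0.21), ("materialism", 0.33, 0.42)]:
        rel = implied_reliability(r, rho)
        print(f"{label}: implied reliability product = {rel:.2f}")
        print(f"{label}: check, corrected r = {disattenuate(r, rel):.2f}")
```

Under this assumption, the gap between r and ρ would reflect only measurement unreliability in the survey scores, not any change in the model itself.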
The authors present empirical evidence that borrowers, consciously or not, leave traces of their intentions, circumstances, and personality traits in the text they write when applying for a loan. This textual information has a substantial and significant ability to predict whether borrowers will pay back the loan above and beyond the financial and demographic variables commonly used in models predicting default. The authors use text-mining and machine learning tools to automatically process and analyze the raw text in over 120,000 loan requests from Prosper, an online crowdfunding platform. Including the textual information from the loan request in the predictive model significantly improves the prediction of loan default and can have substantial financial implications. The authors find that loan requests written by defaulting borrowers are more likely to include words related to their family, mentions of God, the borrower’s financial and general hardship, pleas to lenders for help, and short-term-focused words. The authors further observe that loan requests from defaulting borrowers are written in a manner consistent with the writing styles of extroverts and liars.
The authors propose a quantitative approach for describing entertainment products, in a way that allows for improving the predictive performance of consumer choice models for these products. Their approach is based on the media psychology literature, which suggests that people’s consumption of entertainment products is influenced by the psychological themes featured in these products. They classify psychological themes on the basis of the “character strengths” taxonomy from the positive psychology literature (Peterson and Seligman 2004). They develop a natural language processing tool, guided latent Dirichlet allocation (LDA), that automatically extracts a set of features of entertainment products from their descriptions. Guided LDA is flexible enough to allow features to be informed by psychological themes while allowing other relevant dimensions to emerge. The authors apply this tool to movies and show that guided LDA features help better predict movie-watching behavior at the individual level. They find this result with both award-winning movies and blockbuster movies. They illustrate the potential of the proposed approach in pure content-based predictive models of consumer behavior, as well as in hybrid predictive models that combine content-based models with collaborative filtering. They also show that guided LDA can improve the performance of models that predict aggregate outcomes.