A person's language use reveals much about the person's social identity, which is based on the social categories a person belongs to including age and gender. We discuss the development of TweetGenie, a computer program that predicts the age of Twitter users based on their language use. We explore age prediction in three different ways: classifying users into age categories, by life stages, and predicting their exact age. An automatic system achieves better performance than humans on these tasks. Both humans and the automatic systems tend to underpredict the age of older people. We find that most linguistic changes occur when people are young, and that after around 30 years the studied variables show little change, making it difficult to predict the ages of older Twitter users.
In this paper we focus on the connection between age and language use, exploring age prediction of Twitter users based on their tweets. We discuss the construction of a fine-grained annotation effort to assign ages and life stages to Twitter users. Using this dataset, we explore age prediction in three different ways: classifying users into age categories, by life stages, and predicting their exact age. We find that an automatic system achieves better performance than humans on these tasks and that both humans and the automatic systems have difficulties predicting the age of older people. Moreover, we present a detailed analysis of variables that change with age. We find strong patterns of change, and that most changes occur at young ages.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.