We address the early assessment of depression risk in social media users. We focus on the eRisk 2018 dataset, which represents each user as a sequence of their written online contributions. We implement four RNN-based systems to classify the users and explore several aggregation methods for combining predictions on individual posts. Our best model reads through all writings of a user in parallel but uses an attention mechanism to prioritize the most important ones at each timestep.
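The attention-based aggregation described above can be sketched as a softmax over per-post relevance scores, followed by a weighted sum of the post representations. The function name, the learned query vector, and the toy encodings below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def attend(post_vectors, query):
    """Softmax attention over per-post encodings.

    post_vectors: (n_posts, d) array of post representations (hypothetical).
    query: (d,) vector scoring each post's relevance (learned in practice).
    Returns the attention-weighted user representation, shape (d,).
    """
    scores = post_vectors @ query                    # one relevance score per post
    scores = scores - scores.max()                   # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over posts
    return weights @ post_vectors                    # convex combination of posts
```

Because the weights sum to one, the user representation stays in the convex hull of the post representations, with the query steering mass toward the most informative posts.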
Deep Averaging Networks (DANs) show strong performance on several key Natural Language Processing (NLP) tasks. However, their chief drawback is that they do not account for token position when encoding sequences. We study how existing position encodings might be integrated into the DAN architecture. In addition, we propose a novel position encoding built specifically for DANs, which generalizes better to unseen sequence lengths. This is demonstrated on decision tasks over binary sequences. Further, the resulting architecture is compared against unordered aggregation on sentiment analysis with both word- and character-level tokenization, to mixed results.
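One reason a DAN-specific encoding is needed: a purely additive position encoding washes out under averaging, since mean(E + P) = mean(E) + mean(P), which is invariant to token order. A minimal workaround, sketched below with a hypothetical linear ramp (not the encoding proposed in the paper), is to scale each token multiplicatively by a position-dependent factor before averaging:

```python
import numpy as np

def positional_weighted_average(token_embeddings):
    """Average token embeddings with position-dependent scaling.

    token_embeddings: (n_tokens, d) array. Each token is multiplied by a
    scale tied to its position, so permuting tokens changes the average,
    unlike plain (or additively position-encoded) averaging.
    """
    n, _ = token_embeddings.shape
    scales = np.linspace(1.0, 2.0, n)[:, None]  # hypothetical ramp, for illustration only
    return (token_embeddings * scales).mean(axis=0)
```

Reversing the token order now yields a different sentence representation, which an additive encoding under averaging cannot achieve.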
In applying Natural Language Processing to support mental health care, gathering annotated data is difficult. Recent work has pointed to lapses in approximative annotation schemes. While studying gaps in prediction accuracy can offer some information about these lapses, a more careful look is needed. Influence Functions make it possible to quantify the relevance of training examples according to their type of annotation. Using a corpus aimed at suicide risk assessment containing both crowdsourced and expert annotations, we examine the effects that these annotations have on model training at test time. Our results indicate that, while expert annotations are more helpful, the difference with respect to crowdsourced annotations is slight. Moreover, most globally helpful observations are crowdsourced, pointing to their potential.
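Influence functions estimate how upweighting one training example would change the loss on a test example, via the product of per-example gradients with the inverse Hessian. A minimal sketch for logistic regression is given below, assuming a fitted weight vector `w` and a small damping term for invertibility; none of this is the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def influence(X, y, w, x_test, y_test, damping=1e-3):
    """Influence of each training point on the loss at (x_test, y_test).

    Negative values mean the training point is helpful for this test point
    (upweighting it would lower the test loss); positive values mean harmful.
    """
    p = sigmoid(X @ w)
    # Hessian of the mean log-loss: X^T diag(p(1-p)) X / n, damped for stability.
    H = (X.T * (p * (1 - p))) @ X / len(y) + damping * np.eye(len(w))
    grads = (p - y)[:, None] * X                # per-example loss gradients, (n, d)
    g_test = (sigmoid(x_test @ w) - y_test) * x_test
    return -grads @ np.linalg.solve(H, g_test)  # one score per training point
```

Scoring every training example this way and grouping the scores by annotation type (crowdsourced vs. expert) yields exactly the kind of comparison the abstract describes.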
This work proposes an approach to predicting potential answers to the Beck Depression Inventory-Second Edition (BDI-II), a 21-item self-report inventory measuring the severity of depression in adolescents and adults. Predictions are based on similarity measures between the textual productions of social media users and completed BDI-IIs. Two methods of establishing similarity are compared: the first uses unsupervised topic extraction, and the second is based on authorship attribution with neural encoders. Both approaches achieve promising results, indicating that the authorship attribution task can induce a similarity measure useful for depression symptom detection. The issues that arise in predicting several aspects of depression are further discussed.
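Whatever the representation (topic distributions or neural authorship embeddings), the final step reduces to ranking candidate BDI-II answers by their similarity to a user's representation. A minimal sketch using cosine similarity, with hypothetical names and toy vectors:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two representation vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def rank_answers(user_vec, answer_vecs):
    """Rank candidate answer representations by similarity to the user.

    user_vec: (d,) representation of the user's writings.
    answer_vecs: (k, d) representations of the k candidate answers.
    Returns candidate indices, most similar first.
    """
    sims = np.array([cosine(user_vec, a) for a in answer_vecs])
    return np.argsort(sims)[::-1]
```

Repeating this ranking independently for each of the 21 BDI-II items gives a full predicted questionnaire, which is where the multi-aspect difficulties mentioned above arise.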