DLATK: Differential Language Analysis ToolKit

Schwartz, H. Andrew; Giorgi, Salvatore; Sap, Maarten; Crutchley, Patrick; Ungar, Lyle H.; Eichstaedt, Johannes C.

doi:10.18653/v1/d17-2010

Cited by 122 publications

(94 citation statements)

References 29 publications

“…Following the steps developed in previous work (Preotiuc‐Pietro et al, 2016), for the 2895 human‐annotated Facebook statuses in the calibration sample, we used the Differential Language Analysis ToolKit (DLATK; Schwartz et al, 2017; see dlatk.wwbp.org) to extract three sets of linguistic features: (i) the relative frequency of occurrences of words and phrases; (ii) 2000 latent Dirichlet allocation topics derived in previous work from 18 million Facebook status updates using the MALLET package (Schwartz et al, 2013 ); and (iii) LIWC dictionaries (LIWC 2007; Pennebaker, Chung, Ireland, Gonzales, & Booth, 2007). DLATK implements emoticon‐aware tokenization (splitting of statuses into ‘words’).…”

Section: Methods (mentioning)

confidence: 99%

Tracking Fluctuations in Psychological States using Social Media Language: A Case Study of Weekly Emotion

Eichstaedt, Weidman. 2020. Eur J Pers.

Personality psychologists are increasingly documenting dynamic, within‐person processes. Big data methodologies can augment this endeavour by allowing for the collection of naturalistic and personality‐relevant digital traces from online environments. Whereas big data methods have primarily been used to catalogue static personality dimensions, here we present a case study in how they can be used to track dynamic fluctuations in psychological states. We apply a text‐based, machine learning prediction model to Facebook status updates to compute weekly trajectories of emotional valence and arousal. We train this model on 2895 human‐annotated Facebook statuses and apply the resulting model to 303 575 Facebook statuses posted by 640 US Facebook users who had previously self‐reported their Big Five traits, yielding an average of 28 weekly estimates per user. We examine the correlations between model‐predicted emotion and self‐reported personality, providing a test of the robustness of these links when using weekly aggregated data, rather than momentary data as in prior work. We further present dynamic visualizations of weekly valence and arousal for every user, while making the final data set of 17 937 weeks openly available. We discuss the strengths and drawbacks of this method in the context of personality psychology's evolution into a dynamic science. © 2020 European Association of Personality Psychology
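The citation statement for this entry mentions two of the feature-extraction steps: emoticon-aware tokenization and the relative frequency of words. The following is a minimal sketch of those two steps only, not DLATK's actual implementation or API. The regular expression, the function names `tokenize` and `relative_frequencies`, and the small emoticon set are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately small emoticon pattern: eyes, optional nose, mouth.
# It covers forms such as :) ;-) :-D and is not DLATK's real tokenizer.
EMOTICON = r"[:;=][\-']?[\)\(\]\[DPp/\\|]"
WORD = r"\w+(?:'\w+)?"   # words, allowing one internal apostrophe
OTHER = r"[^\w\s]"       # any other single non-space symbol

# Emoticons are tried first so ':)' is not split into ':' and ')'.
TOKEN_RE = re.compile(f"{EMOTICON}|{WORD}|{OTHER}")


def tokenize(text):
    """Split text into tokens, keeping emoticons whole and lowercasing words."""
    tokens = TOKEN_RE.findall(text)
    # Lowercase only tokens that start with a letter or digit,
    # so an emoticon such as ':-D' keeps its capital letter.
    return [t.lower() if t[0].isalnum() else t for t in tokens]


def relative_frequencies(tokens):
    """Map each token to its count divided by the total number of tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {tok: n / total for tok, n in counts.items()}


if __name__ == "__main__":
    status = "Great day :) so happy :-D!"
    toks = tokenize(status)
    print(toks)
    print(relative_frequencies(toks))
```

A pattern this simple will misread some strings (for example, the `:/` in a URL) as emoticons, so it is only suitable as an illustration of the idea.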

“…In addition, we use the difference between standardized metric scores to find the features that distinguish high quality comments in one metric versus another. All methods were implemented within the package, dlatk (Schwartz et al, 2017). Figure 2 shows the n-grams most highly correlated with each of our quality metrics.…”

Section: Methods (mentioning)

confidence: 99%

Assessing Objective Recommendation Quality through Political Forecasting

Schwartz, Rouhizadeh, Bishop, et al. 2017. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.

Self Cite

Recommendations are often rated for their subjective quality, but few researchers have studied quality in terms of objective utility. We explore quality assessment with respect to both subjective (i.e. users' ratings) and objective (i.e., did it influence? did it improve decisions?) metrics in a massive online geopolitical forecasting system, ultimately comparing linguistic characteristics of each quality metric. Using a variety of features, we predict all types of quality with better accuracy than the simple yet strong baseline of recommendation length. For example, more complex sentence constructions, as evidenced by subordinate conjunctions, are characteristic of recommendations leading to objective improvements in forecasting. Our analyses also reveal rater biases; for example, forecasters are subjectively biased in favor of recommendations mentioning business deals and material things, even though such recommendations do not indeed prove any more useful objectively.

“…An open-vocabulary statistical learning and modeling approach was used to find topics that the QLC group talk about more than the control group. This was conducted using an open source language analysis toolkit (DLATK) (Schwartz et al, 2017). From each post, words were identified (using an emoticon-aware tokenizer which also looked for tokens such as ':)' , ':-D' etc.)…”

Section: Open-vocabulary Approach (mentioning)

confidence: 99%

Examining the Phenomenon of Quarter-Life Crisis Through Artificial Intelligence and the Language of Twitter

et al. 2020

Quarter-life crisis (QLC) is a popular term for developmental crisis episodes that occur during early adulthood (18-30). Our aim was to explore what linguistic themes are associated with this phenomenon as discussed on social media. We analyzed 1.5 million tweets written by over 1,400 users from the United Kingdom and United States that referred to QLC, comparing their posts to those used by a control set of users who were matched by age, gender and period of activity. Logistic regression was used to uncover significant associations between words, topics, and sentiments of users and QLC, controlling for demographics. Users who refer to a QLC were found to post more about feeling mixed emotions, feeling stuck, wanting change, career, illness, school, and family. Their language tended to be focused on the future. Of 20 terms selected according to early adult crisis theory, 16 were mentioned by the QLC group more than the control group. The insights from this study could be used by clinicians and coaches to better understand the developmental challenges faced by young adults and how these are portrayed naturalistically in the language of social media.
