Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 2017
DOI: 10.18653/v1/d17-2010

DLATK: Differential Language Analysis ToolKit

Abstract: We present Differential Language Analysis Toolkit (DLATK), an open-source python package and command-line tool developed for conducting social-scientific language analyses. While DLATK provides standard NLP pipeline steps such as tokenization or SVM-classification, its novel strengths lie in analyses useful for psychological, health, and social science: (1) incorporation of extra-linguistic structured information, (2) specified levels and units of analysis (e.g. document, user, community), (3) statistical metr…

Cited by 122 publications (94 citation statements)
References: 29 publications

“…Following the steps developed in previous work (Preotiuc‐Pietro et al, 2016), for the 2895 human‐annotated Facebook statuses in the calibration sample, we used the Differential Language Analysis ToolKit (DLATK; Schwartz et al, 2017; see dlatk.wwbp.org) to extract three sets of linguistic features: (i) the relative frequency of occurrences of words and phrases; (ii) 2000 latent Dirichlet allocation topics derived in previous work from 18 million Facebook status updates using the MALLET package (Schwartz et al, 2013); and (iii) LIWC dictionaries (LIWC 2007; Pennebaker, Chung, Ireland, Gonzales, & Booth, 2007). DLATK implements emoticon‐aware tokenization (splitting of statuses into ‘words’).…”
Section: Methods (mentioning)
confidence: 99%
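
The statement above mentions emoticon-aware tokenization and relative word frequencies. The following is a minimal sketch of those two ideas in plain Python, not DLATK's own implementation or API: the regular expression and the function names `tokenize` and `relative_frequencies` are assumptions made for illustration, and the emoticon pattern is deliberately simple (it will, for example, misread the ":/" in a URL as an emoticon).

```python
import re
from collections import Counter

# Hypothetical patterns (not taken from DLATK): emoticons are tried first
# so that ":)" or ":-D" survive as single tokens instead of being split.
EMOTICON = r"[<>]?[:;=][\-o*']?[)\](\[dDpP/\\|]"
WORD = r"\w+(?:'\w+)?"
TOKEN_RE = re.compile(rf"({EMOTICON})|({WORD})|([^\w\s])")


def tokenize(text):
    """Split text into emoticons, lowercased words, and punctuation marks."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        emoticon, word, punct = match.groups()
        # Only words are lowercased; lowercasing ":-D" would change the emoticon.
        tokens.append(emoticon or punct or word.lower())
    return tokens


def relative_frequencies(text):
    """Return each token's count divided by the total number of tokens."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}
```

Calling `tokenize("So happy today :) :-D")` keeps both emoticons as single tokens, and `relative_frequencies` normalizes counts by message length so that long and short statuses are comparable.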
“…In addition, we use the difference between standardized metric scores to find the features that distinguish high quality comments in one metric versus another. All methods were implemented within the package, dlatk (Schwartz et al, 2017). Figure 2 shows the n-grams most highly correlated with each of our quality metrics.…”
Section: Methods (mentioning)
confidence: 99%
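
One way to read the statement above is: standardize each metric, subtract one from the other, and correlate each feature with that difference. The sketch below illustrates that reading with the standard library only. It is an assumption about the procedure, not the cited authors' code or a DLATK call, and the helper names `zscores` and `pearson` as well as the toy data are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1.

    Assumes the values are not all identical, otherwise the deviation is 0.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Toy data (hypothetical): one n-gram's relative frequency in four comments,
# plus two quality metrics scored on the same comments.
ngram_freq = [0.0, 0.1, 0.2, 0.3]
metric_a = [1.0, 2.0, 3.0, 4.0]
metric_b = [4.0, 3.0, 2.0, 1.0]

# Difference between standardized metric scores, one value per comment.
score_diff = [a - b for a, b in zip(zscores(metric_a), zscores(metric_b))]

# A strongly positive value means the n-gram tracks metric A more than metric B.
r = pearson(ngram_freq, score_diff)
```

Ranking all n-grams by `r` would surface the features that distinguish one metric from the other, which is the kind of output the quoted Figure 2 describes for individual metrics.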
“…An open-vocabulary statistical learning and modeling approach was used to find topics that the QLC group talk about more than the control group. This was conducted using an open source language analysis toolkit (DLATK) (Schwartz et al, 2017). From each post, words were identified (using an emoticon-aware tokenizer which also looked for tokens such as ':)', ':-D' etc.)…”
Section: Open-vocabulary Approach (mentioning)
confidence: 99%