2017
DOI: 10.1080/19312458.2017.1387238

Text Analysis in R

Abstract: Computational text analysis has become an exciting research field with many applications in communication research. It can be a difficult method to apply, however, because it requires knowledge of various techniques, and the software required to perform most of these techniques is not readily available in common statistical software packages. In this teacher's corner, we address these barriers by providing an overview of general steps and operations in a computational text analysis project, and demonstrate how…

Citations: cited by 237 publications (148 citation statements)
References: 33 publications
“…To address the first aim of this study, we created an e-cigarette topics dictionary using the software package Quanteda (Welbers, Van Atteveldt, & Benoit, 2017). This dictionary categorized Reddit submissions based on seven different e-cigarette-related topics: (1) "advice," which refers to submissions about seeking information; (2) "build your own," which refers to submissions about e-cigarette parts or kits that can be used to build vaping devices; (3) "buying/selling," which refers to submissions about e-cigarettes as merchandise; (4) "drugs," which refers to submissions about the use of vaping devices for illicit purposes, such as vaping marijuana; (5) "e-juice," which refers to submissions about e-liquid or e-liquid flavors; (6) "health/safety," which refers to submissions about the various health effects associated with e-cigarettes; and (7) "tobacco," which refers to submissions containing tobacco-related content, including mentions of combustible cigarettes or nicotine.…”
Section: Identifying Submission Topics (mentioning)
confidence: 99%
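
The dictionary approach described in this statement can be illustrated with a minimal sketch. It is written in plain Python rather than with the Quanteda package the citing authors used, and the topic keywords, the `count_topics` helper, and the sample submission are all hypothetical, chosen only to show the idea of counting keyword matches per topic.

```python
import re
from collections import Counter

# Hypothetical topic dictionary: these keywords are illustrative assumptions,
# not the terms used in the cited study.
TOPIC_DICTIONARY = {
    "advice": ["advice", "recommend", "help"],
    "e-juice": ["juice", "liquid", "flavor"],
    "health/safety": ["health", "safe", "lungs"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_topics(text, dictionary=TOPIC_DICTIONARY):
    """Count how many tokens in `text` match each topic's keyword list."""
    token_counts = Counter(tokenize(text))
    return {
        topic: sum(token_counts[word] for word in words)
        for topic, words in dictionary.items()
    }


# Hypothetical submission text, made up for this example.
submission = "Any advice on a safe flavor? I need help picking a juice."
topic_counts = count_topics(submission)
print(topic_counts)
```

A submission could then be assigned to every topic with a nonzero count, or only to the topic with the highest count; which rule the citing study applied is not stated in the excerpt.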
“…Schubert et al [31] presented a novel methodology to model word significance and word affinity in a text and build the word cloud based on the derived dependency. Welbers et al [32] provided a summary of common steps and actions in a computational text analysis project and demonstrated how every step can be completed using the R statistical software.…”
Section: F. Text Mining (mentioning)
confidence: 99%
“…This evaluation generates the weight of a term directly [proportional] to its frequency in each document and inversely [proportional] to its frequency to the set of documents. In a term-document matrix rows correspond to terms and columns correspond to documents in corpus [32]. Weighting of TDM is term frequency-inverse document frequency (TF-IDF) [14].…”
Section: Constructing TDM (mentioning)
confidence: 99%
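
As a rough illustration of the weighting this statement describes, the sketch below builds a small term-document matrix (rows are terms, columns are documents) and applies one common TF-IDF variant: raw term count multiplied by log(N / document frequency). The three toy documents and the `tfidf` helper are assumptions made for this example; the citing paper may use a different variant, for instance one with smoothing or normalization.

```python
import math

# Toy corpus, made up for this example.
docs = [
    "text analysis in r",
    "text mining with r",
    "topic models for text",
]
tokenized = [doc.split() for doc in docs]
terms = sorted({term for doc in tokenized for term in doc})

# Term-document matrix: one row per term, one column per document.
tdm = [[doc.count(term) for doc in tokenized] for term in terms]


def tfidf(matrix):
    """Weight each count by log(N / df), where df is the number of
    documents in which the term occurs at least once."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        doc_freq = sum(1 for count in row if count > 0)
        idf = math.log(n_docs / doc_freq)
        weighted.append([count * idf for count in row])
    return weighted


weights = tfidf(tdm)
for term, row in zip(terms, weights):
    print(term, [round(w, 3) for w in row])
```

In this toy corpus "text" occurs in every document, so its weight is zero everywhere, while a term confined to a single document, such as "analysis", receives the highest weight in that document.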
“…While data mining (DM) assumes that data is stored in a structured format, TM data needs no structured format. Thus, TM data requires the application of preprocessing operations to identify and extract features representative of natural language documents (Welbers, Van Atteveldt, & Benoit, 2017). Due to the importance of natural language processing in TM, the latter draws on the advances of other computer science disciplines, like data science, to achieve its objectives.…”
Section: Automatic Literature Review (mentioning)
confidence: 99%
“…For this reason, and according to the scope of this work, it was decided to create a single dataset based on the fields described in Figure 4 by fusion of the two results. This involved a normalisation process: the conversion of all text into lowercase, thus transforming all words into a uniform form (Welbers et al, 2017). All text preprocessing was performed using the "NLP" (Hornik, 2017) and "tm" (Feinerer & Hornik, 2017) R packages.…”
Section: Data Extraction and Pre-processing (mentioning)
confidence: 99%
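
The lowercasing step mentioned in this statement can be sketched in a few lines. The example uses plain Python rather than the "NLP" and "tm" R packages named by the citing authors, and the sample string and `normalize` helper are hypothetical; it only shows how case-folding merges spelling variants of the same word.

```python
from collections import Counter


def normalize(text):
    """Convert all text to lowercase so case variants share one form."""
    return text.lower()


# Hypothetical input string, made up for this example.
raw = "Text analysis in R and text Analysis in r"

before = Counter(raw.split())
after = Counter(normalize(raw).split())

print(len(before), "distinct tokens before lowercasing")
print(len(after), "distinct tokens after lowercasing")
```

Lowercasing reduces the vocabulary, which is usually the goal, but it also discards case distinctions that sometimes carry meaning, such as in proper nouns or acronyms.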