2020
DOI: 10.1080/19312458.2020.1832976

Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages

Abstract: Many social scientists recognize that quantitative text analysis is a useful research methodology, but its application is still concentrated on documents written in European languages, especially English, and in a few subfields of political science, such as comparative politics and legislative studies. This seems to be due to the absence of flexible and cost-efficient methods that can be used to analyze documents in different domains and languages. Aiming to solve this problem, this paper proposes a semisupervised…

Cited by 66 publications (64 citation statements); references 46 publications.
“…These steps ensure that the remaining word-tokens represent the meetings’ most significant and meaningful words. Finally, this corpus is used to create a feature co-occurrence matrix (FCM) that estimates the semantic proximity of each token, which is why the relatively short meeting reports are a suitable data source (Watanabe 2020). In general, an FCM represents a network based on co-occurrences of specific features (in this study tokens) in a defined context (here: all meeting reports).…”
Section: Methods (mentioning)
confidence: 99%
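The statement above describes building a feature co-occurrence matrix (FCM) from a corpus. A minimal sketch of the idea, assuming a window-based context: two tokens co-occur when they appear within `window` positions of each other in the same document. The function name `build_fcm` and the default window of 5 are illustrative assumptions, not the cited study's settings; in R, quanteda's `fcm()` provides this kind of matrix.

```python
from collections import defaultdict


def build_fcm(docs, window=5):
    """Symmetric co-occurrence counts for tokens within `window` positions of each other.

    docs: list of token lists, one per document (e.g. one meeting report).
    Returns a dict {(token_a, token_b): count}; hypothetical helper for illustration.
    """
    fcm = defaultdict(int)
    for tokens in docs:
        for i, target in enumerate(tokens):
            # Look forward only, so each pair of positions is visited exactly once.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                context = tokens[j]
                if context == target:
                    # Repeated token: record the diagonal cell once per pair of positions.
                    fcm[(target, target)] += 1
                else:
                    # Different tokens: fill both cells so the matrix stays symmetric.
                    fcm[(target, context)] += 1
                    fcm[(context, target)] += 1
    return dict(fcm)


docs = [["inflation", "rate", "rise"], ["rate", "cut", "inflation"]]
fcm = build_fcm(docs, window=2)
print(fcm[("inflation", "rate")])  # pair appears within the window in both documents
```

Each row of the resulting matrix is a co-occurrence profile of one token, which is what later steps compare to estimate semantic proximity.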
“…To construct such a measure, I resort to what the recent text-as-data literature has denoted as latent semantic scaling (Rheault et al., 2016; Watanabe, 2020). It starts from just a few selected words that are intuitively associated with the polarity of interest.…”
Section: Emergency Emphasis in the Public Communication of European Executives (mentioning)
confidence: 99%
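Starting from a few seed words, the technique relies on measuring how close other words are to those seeds. A minimal sketch of that proximity step, assuming each word is represented by its raw co-occurrence row (a sparse `{context: count}` dict) and compared with cosine similarity. The published method works with lower-dimensional word vectors in the style of latent semantic analysis; this sketch skips dimension reduction, and `most_similar` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {feature: count} dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(seed, rows, k=5):
    """Rank all other words by cosine similarity to one seed word (hypothetical helper)."""
    seed_vec = rows.get(seed, {})
    scored = [(w, cosine(seed_vec, vec)) for w, vec in rows.items() if w != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy co-occurrence rows: "good" and "great" share contexts, "bad" does not.
rows = {
    "good": {"very": 2, "day": 1},
    "great": {"very": 2, "day": 1},
    "bad": {"not": 3},
}
print(most_similar("good", rows, k=2))
```

Words whose contexts overlap with a seed's contexts score near 1, and words with no shared contexts score 0; this proximity is what lets a handful of seed words carry polarity to the rest of the vocabulary.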
“…The level of the incivility of a word was obtained by summing the cosine similarity of the vector of the word and the seven uncivil seed words, summing the cosine similarity of the vector of the word and the seven civil seed words multiplied by -1, and dividing the sum by 14. The level of incivility of a tweet was obtained by calculating the level for all the words in the tweet using the aforementioned method and dividing the sum by the number of words (for details, see Watanabe (2020)).…”
Section: Dependent Variable (mentioning)
confidence: 99%
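The statement above spells out a scoring rule: a word's incivility is (sum of cosine similarities to the uncivil seeds minus sum of cosine similarities to the civil seeds) divided by the number of seeds (7 + 7 = 14), and a tweet's score is the mean of its word scores. A minimal sketch of that arithmetic, assuming word vectors are already available as equal-length lists in a dict. Skipping tokens that have no vector is an assumption not stated in the quote, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_incivility(word, vectors, uncivil_seeds, civil_seeds):
    """(sum of sims to uncivil seeds - sum of sims to civil seeds) / total number of seeds."""
    v = vectors[word]
    total = sum(cosine(v, vectors[s]) for s in uncivil_seeds)
    total -= sum(cosine(v, vectors[s]) for s in civil_seeds)
    # With seven seeds on each side this divides by 14, as in the quoted study.
    return total / (len(uncivil_seeds) + len(civil_seeds))


def tweet_incivility(tokens, vectors, uncivil_seeds, civil_seeds):
    """Mean word-level score; tokens without a vector are skipped (assumption)."""
    scores = [
        word_incivility(t, vectors, uncivil_seeds, civil_seeds)
        for t in tokens
        if t in vectors
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2-D vectors with one seed per side, purely for illustration.
vectors = {"idiot": [1.0, 0.0], "thanks": [0.0, 1.0]}
print(word_incivility("idiot", vectors, ["idiot"], ["thanks"]))
print(tweet_incivility(["idiot", "thanks"], vectors, ["idiot"], ["thanks"]))
```

Because civil similarities enter with a negative sign, positive scores lean uncivil and negative scores lean civil; averaging over words keeps long and short tweets on the same scale.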