2022
DOI: 10.1007/s40747-021-00625-1

Compilation and evaluation of the Spanish SatiCorpus 2021 for satire identification using linguistic features and transformers

Abstract: Satirical content on social media is hard to distinguish from real news, misinformation, hoaxes or propaganda when there are no clues about the medium in which the news was originally published. It is important, therefore, to provide Information Retrieval systems with mechanisms to identify which results are legitimate and which ones are misleading. Our contribution to satire identification is twofold. On the one hand, we release the Spanish SatiCorpus 2021, a balanced dataset that contains satirical and non-sat…
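The abstract describes the corpus as balanced between satirical and non-satirical texts, but this excerpt does not say how that balance was achieved. As a hedged illustration only, one simple way to balance a binary corpus is random undersampling of the larger class. The function name `undersample_balanced` and the 1/0 labels are hypothetical and are not part of the SatiCorpus release.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def undersample_balanced(
    satirical: List[T], non_satirical: List[T], seed: int = 0
) -> List[Tuple[T, int]]:
    """Hypothetical helper: randomly undersample the larger class so both
    classes have the same size. Label 1 = satirical, 0 = non-satirical
    (an assumed convention)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = min(len(satirical), len(non_satirical))
    sat = rng.sample(satirical, n)        # sample without replacement
    non = rng.sample(non_satirical, n)
    data = [(x, 1) for x in sat] + [(x, 0) for x in non]
    rng.shuffle(data)                     # mix the two classes
    return data
```

Undersampling discards data from the larger class; collecting more examples of the smaller class is the usual alternative when that loss matters.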

Cited by 8 publications (8 citation statements)
References 14 publications
“…The LF included low-level linguistic categories concerning phonetics and syntax, and high-level features related to semantics and pragmatics, including features specific to figurative language (del Pilar Salas-Zárate et al, 2020). Moreover, these kinds of features have proven effective for other automatic classification tasks, such as irony and satire identification (García-Díaz and Valencia-García, 2022). As some of the dictionaries of UMUTextStats have not been translated into English, we select a subset of language-independent linguistic features based on linguistic metrics, Part-of-Speech features, and the usage of social media jargon.…”
Section: Methods (mentioning)
confidence: 99%
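The quoted passage mentions a language-independent subset built on linguistic metrics, Part-of-Speech features, and social media jargon. A minimal sketch of the metric and jargon parts might look as follows. It does not use UMUTextStats; every feature name and regular expression here is an illustrative assumption, and Part-of-Speech features are omitted because they require a tagger.

```python
import re
from typing import Dict

# Illustrative patterns for social media jargon; assumptions, not the
# UMUTextStats definitions.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
WORD = re.compile(r"\w+")             # Unicode-aware by default for str
SENTENCE_END = re.compile(r"[.!?]+")


def language_independent_features(text: str) -> Dict[str, float]:
    """Compute a small, hypothetical set of language-independent features."""
    # Remove URLs first so their fragments are not counted as words or
    # sentence boundaries.
    without_urls = URL.sub(" ", text)
    words = WORD.findall(without_urls)
    n_words = len(words)
    sentences = [s for s in SENTENCE_END.split(without_urls) if s.strip()]
    return {
        # Linguistic metrics
        "n_words": float(n_words),
        "n_sentences": float(len(sentences)),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Social media jargon counts
        "n_hashtags": float(len(HASHTAG.findall(text))),
        "n_mentions": float(len(MENTION.findall(text))),
        "n_urls": float(len(URL.findall(text))),
    }
```

For example, `language_independent_features("Breaking: @gobierno bans Mondays! #satira https://example.com")` yields 5 words, 2 sentences, and one hashtag, mention, and URL each. Note that the text after `#` and `@` is also counted as a word in this sketch.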
“…To compile the dataset, we relied on the UMUCorpus-Classifier tool [47], developed by our research group. This tool crawls data from Twitter, a social network in which users can send and receive micro-blogging posts of up to 280 characters.…”
Section: Corpus (mentioning)
confidence: 99%
“…Following the spirit of our previous works [50], we evaluated several feature sets for conducting opinion mining in the financial domain. Specifically, linguistic features along with contextual and non-contextual pre-trained embeddings were evaluated.…”
Section: B. Feature Extraction (mentioning)
confidence: 99%
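The passage evaluates linguistic features together with contextual and non-contextual pre-trained embeddings. Assuming early (feature-level) fusion, which the excerpt does not confirm, combining such sets could be sketched by standardizing each block and concatenating them column-wise. The embedding matrices are assumed to be precomputed, one fixed-length vector per document (for example, averaged word vectors or a transformer sentence vector), and `combine_feature_sets` is a hypothetical helper.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Z-score each column; constant columns are left at zero."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (x - mean) / std


def combine_feature_sets(*feature_sets: np.ndarray) -> np.ndarray:
    """Hypothetical early fusion: standardize each (n_docs, dim) block and
    concatenate the blocks along the feature axis."""
    if not feature_sets:
        raise ValueError("at least one feature set is required")
    n_docs = feature_sets[0].shape[0]
    for fs in feature_sets:
        if fs.shape[0] != n_docs:
            raise ValueError("all feature sets must have one row per document")
    return np.hstack([standardize(fs) for fs in feature_sets])
```

Standardizing per block keeps a few large-valued linguistic counts from dominating the embedding dimensions. In a real evaluation the means and standard deviations should be estimated on the training split only and reused for the test split.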
“…Further works try to describe the characteristics of satirical content in more detail. In particular, the authors of [4] affirm that satire is characterized by: controversial or sensitive issues; aggressive language (negative emotions and tone) used for entertainment purposes; a shorter form than true news, even if the words used are more complex; less clear language, since satirical pieces are not written by professional journalists; and reliance on imagination and figurative language such as metaphors, similes, personifications, and idioms. Moreover, in [5], three language dimensions are considered to better understand satire: the use of the first-person singular, which proclaims one's ownership of statements; more negative words, which reflect negative emotions; and the use of exclusive words, which emphasize cognitive complexity.…”
Section: Introduction (mentioning)
confidence: 99%
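The three dimensions attributed to [5] can be turned into simple per-token rates. The sketch below assumes tiny hand-picked English word lists; they are placeholders, not the lexicons used in [5], and a real system would rely on a curated dictionary. The function name `dimension_rates` is hypothetical.

```python
import re
from typing import Dict, FrozenSet

# Placeholder word lists (assumptions, not the lexicons used in [5]).
FIRST_PERSON_SINGULAR: FrozenSet[str] = frozenset({"i", "me", "my", "mine", "myself"})
NEGATIVE_WORDS: FrozenSet[str] = frozenset({"bad", "hate", "awful", "terrible", "angry", "sad"})
EXCLUSIVE_WORDS: FrozenSet[str] = frozenset({"but", "except", "without", "exclude", "although", "unless"})

TOKEN = re.compile(r"[a-z]+")  # crude ASCII tokenizer applied to lowercased text


def dimension_rates(text: str) -> Dict[str, float]:
    """Fraction of tokens falling into each of the three dimensions."""
    tokens = TOKEN.findall(text.lower())
    n = len(tokens)
    if n == 0:
        return {"first_person_singular": 0.0, "negative": 0.0, "exclusive": 0.0}
    return {
        "first_person_singular": sum(t in FIRST_PERSON_SINGULAR for t in tokens) / n,
        "negative": sum(t in NEGATIVE_WORDS for t in tokens) / n,
        "exclusive": sum(t in EXCLUSIVE_WORDS for t in tokens) / n,
    }
```

For "I hate Mondays, but my boss loves them" (8 tokens), this gives a first-person-singular rate of 0.25 and negative and exclusive rates of 0.125 each. Normalizing by token count keeps short and long texts comparable.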