Proceedings - Natural Language Processing in a Deep Learning World 2019
DOI: 10.26615/978-954-452-056-4_006

Multilingual Sentence-Level Bias Detection in Wikipedia

Abstract: We propose a multilingual method for the extraction of biased sentences from Wikipedia, and use it to create corpora in Bulgarian, French and English. Sifting through the revision history of the articles that at some point had been considered biased and later corrected, we retrieve the last tagged and the first untagged revisions as the before/after snapshots of what was deemed a violation of Wikipedia's neutral point of view policy. We extract the sentences that were removed or rewritten in that edit. The app…
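
The extraction step described in the abstract (comparing the last tagged revision with the first untagged one and keeping the sentences that were removed or rewritten) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_sentences` and `removed_or_rewritten`, the naive regex sentence splitter, and the use of Python's `difflib` for alignment are all assumptions made for illustration, and the sample text is invented.

```python
import difflib
import re


def split_sentences(text):
    # Hypothetical, naive splitter: break after ., ! or ? followed by whitespace.
    # A real multilingual pipeline would need a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def removed_or_rewritten(before, after):
    # Align the sentences of the "before" (tagged) and "after" (untagged)
    # snapshots and collect the "before" sentences that have no exact
    # counterpart in "after", i.e. those that were deleted or replaced.
    old = split_sentences(before)
    new = split_sentences(after)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changed.extend(old[i1:i2])
    return changed


# Invented example snapshots, for illustration only.
tagged = "The city is large. It is obviously the best city in the world. It has a port."
untagged = "The city is large. It has a port."
candidates = removed_or_rewritten(tagged, untagged)
```

Here `candidates` holds the sentences from the tagged snapshot that did not survive the edit unchanged. Exact-match alignment is the simplest choice; it treats any rewording as a replacement of the whole sentence.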

Citations: cited by 6 publications (6 citation statements)
References: 43 publications
“…Our model, however, sharpens the notions of framing and coverage biases by linking them to strategies at the lexical and discursive level that can be opportunistic and evolve over time. We also see the confirmation of different levels of granularity in our corpus: category-level, message-level and media source level [1,7].…”
Section: Discussion (supporting)
confidence: 62%
“…4.3, questions lexicon and embedding-based approaches. In addition, seemingly neutral labels (such as people vs. ordinary people vs. people in plainclothes) are usually not considered as potentially biased, especially in lexicon-based approaches, such as [1].…”
Section: Discussion (mentioning)
confidence: 99%
“…Prior research on identifying problematic edits has considered weasel words, which Wikipedia defines as "words or phrases aimed at creating an impression that something specific and meaningful has been said, when in fact only a vague or ambiguous claim has been communicated" (Wikipedia, 2021c), and hedges, which Farkas et al (2010) define as phrases "indicating that authors do not or cannot back up their opinions/statements with facts." The best models for detecting weasel words in both the CoNLL-2010 ACL shared task on weasel words (Farkas et al, 2010), and a multilingual weasel word corpus (Aleksandrova et al, 2019) use bag-of-words classification approaches (Georgescul, 2010; Aleksandrova et al, 2019).…”
Section: Background and Motivation (mentioning)
confidence: 99%