ICICCT 2019 – System Reliability, Quality Control, Safety, Maintenance and Management 2019
DOI: 10.1007/978-981-13-8461-5_1

Good Morning Turning to Spam Morning

Cited by 2 publications (2 citation statements); references 11 publications.
“…We also remove messages containing just URLs (no accompanying text), boilerplate content such as 'hi' or 'good morning', and messages consisting solely of emojis, which constitute around 25% of our data. We filter such content to avoid characterising low-entropy posts, since such messages could fall under both on-topic and off-topic as junk [18]. As later discussed, the percentage of boilerplate messages among junk posters is 7% (12% for legitimate users).…”
Section: Message Filtering (mentioning)
confidence: 99%
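The 7% vs. 12% comparison in the statement above is a per-group share: boilerplate messages divided by all messages posted by users in that group. The following is a minimal Python sketch of that computation, not the cited authors' code. It assumes a hypothetical `(text, is_junk_poster)` schema and treats a message as boilerplate when, after lowercasing and stripping ASCII punctuation, it exactly matches a hypothetical phrase list (the source names only 'hi' and 'good morning'). The toy messages are illustrative, not data from the cited work.

```python
import string
from collections import defaultdict

# Hypothetical phrase list; the cited work only names 'hi' and 'good morning'.
BOILERPLATE = {"hi", "hello", "good morning", "good night"}


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


def is_boilerplate(text: str) -> bool:
    """Exact match of the normalized message against the phrase list."""
    return normalize(text) in BOILERPLATE


def boilerplate_share(messages):
    """Return {group: fraction of that group's messages that are boilerplate}.

    `messages` is an iterable of (text, is_junk_poster) pairs (hypothetical schema).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for text, is_junk_poster in messages:
        group = "junk" if is_junk_poster else "legitimate"
        totals[group] += 1
        hits[group] += is_boilerplate(text)  # bool counts as 0 or 1
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    toy = [
        ("Good morning!", False),
        ("Meeting moved to 5pm", False),
        ("hi", False),
        ("Lecture notes attached", False),
        ("Win a free phone, forward to 10 groups", True),
        ("Good Morning", True),
        ("Click here to claim your prize", True),
        ("Forward this to everyone", True),
    ]
    print(boilerplate_share(toy))  # {'legitimate': 0.5, 'junk': 0.25}
```

Exact matching after normalization is deliberately strict: "good morning everyone" is not counted, which keeps the share conservative.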
“…We also remove messages containing just URLs (no accompanying text), boilerplate content such as 'hi' or 'good morning', and messages consisting solely of emojis, which constitute around 25% of our data. We filter such content to avoid characterising low-entropy posts: although they are off-topic and widely considered spam as well [20], it is relatively easy to implement client-side filters for specific text such as 'hi' or 'good morning', whereas filtering out the other messages that we classify as spam is a harder problem (studied in §5). Filtering these out, we are left with 766K messages.…”
Section: Message Filtering (mentioning)
confidence: 99%
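The statement above describes three cheap, rule-based filters (URL-only, boilerplate, emoji-only) applied before the harder spam classification. The following is a minimal Python sketch of such a pre-filter under stated assumptions, not the cited authors' implementation. The URL regex, the phrase list, and the emoji test are all hypothetical choices. In particular, "emoji-only" is approximated as "every non-space character has Unicode category So (Symbol, other) or is a zero-width joiner, variation selector-16, or skin-tone modifier". This also accepts non-emoji symbols such as ©, and it misses keycap sequences like 1️⃣.

```python
import re
import string
import unicodedata

# Hypothetical URL pattern: http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Hypothetical phrase list; the cited work only names 'hi' and 'good morning'.
BOILERPLATE = {"hi", "hello", "good morning"}
# Code points that join or modify emoji: ZWJ, variation selector-16,
# and the five Fitzpatrick skin-tone modifiers (U+1F3FB..U+1F3FF).
EMOJI_GLUE = {"\u200d", "\ufe0f"} | {chr(c) for c in range(0x1F3FB, 0x1F400)}


def is_url_only(text: str) -> bool:
    """True if the message has at least one URL and nothing else but whitespace."""
    return bool(URL_RE.search(text)) and not URL_RE.sub("", text).strip()


def is_emoji_only(text: str) -> bool:
    """Approximate emoji-only test based on Unicode categories (see lead-in)."""
    chars = [ch for ch in text if not ch.isspace()]
    return bool(chars) and all(
        ch in EMOJI_GLUE or unicodedata.category(ch) == "So" for ch in chars
    )


def is_boilerplate(text: str) -> bool:
    """Exact phrase match after lowercasing and stripping ASCII punctuation."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split()) in BOILERPLATE


def filter_messages(messages):
    """Keep only messages that none of the three rules flags."""
    return [
        m
        for m in messages
        if not (is_url_only(m) or is_emoji_only(m) or is_boilerplate(m))
    ]


if __name__ == "__main__":
    msgs = [
        "Good morning!!",
        "https://example.com/offer",
        "\U0001F600\U0001F600 \U0001F44D\U0001F3FD",  # grinning faces + thumbs up (medium skin tone)
        "Exam moved to Friday, see https://example.com",
    ]
    print(filter_messages(msgs))  # ['Exam moved to Friday, see https://example.com']
```

These rules are easy to run client-side because each one looks at a single message in isolation. The remaining spam, which the source defers to §5, needs context beyond one message's text.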