2020
DOI: 10.1108/jeim-06-2019-0156

Exploring the impact of short-text complexity and structure on its quality in social media

Abstract: Purpose: The purpose of this paper is to explore to what extent the quality of social media short text without extensions can be investigated and which predictors, if any, of such short text lead readers to trust its content. Design/methodology/approach: The paper applies a trust model to classify data collections based on metadata into four classes: Very Trusted, Trusted, Untrusted and Very Untrusted. These data are collected from the online communities Genius and Stack Overflow. In order to evaluate short t…

Citations: cited by 18 publications (9 citation statements)
References: 41 publications
“…We have applied an under-sampling approach that reduces the number of samples of the majority class to the minority class, thus reducing the bias in the size distribution of the data subsets. For more details see [13]. We have set the ratio of the reduction to be up to 2 fold.…”
Section: Results (mentioning)
confidence: 99%
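The under-sampling step described in the statement above can be illustrated with a minimal sketch. It is not the citing authors' implementation: the function name `undersample`, the `max_ratio` parameter, and the reading of "up to 2 fold" as "the majority class is capped at twice the minority class size" are all assumptions made for illustration.

```python
import random
from collections import Counter


def undersample(samples, labels, max_ratio=2.0, seed=0):
    """Randomly drop samples so no class exceeds max_ratio * minority size.

    Hypothetical helper: the 2-fold cap is one possible reading of the
    quoted statement, not a confirmed detail of the cited method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_size = min(counts.values())
    cap = int(max_ratio * minority_size)

    # Group samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Keep small classes whole; randomly subsample classes above the cap.
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        kept = xs if len(xs) <= cap else rng.sample(xs, cap)
        out_samples.extend(kept)
        out_labels.extend([y] * len(kept))
    return out_samples, out_labels
```

With 90 "trusted" and 10 "untrusted" items, this sketch would keep all 10 minority items and a random 20 of the majority items.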
“…The number of words/features after performing the pre-processing stage is 714 for trusted vs untrusted data set, 848 words for the very-trusted vs very-untrusted data set and 1440 words for the human-aids vs mouse-cancer data set. For more detail see [13].…”
Section: Pre-processing (mentioning)
confidence: 99%
“…Readability indices are used by researchers to measure the complexity of text data mostly in text simplification tasks [30], [17], [31], [32], [33], [34]. In order to prove that the sentences chosen for building this dataset are more complex than the existing benchmark datasets, a comparative readability analysis is conducted between the two existing benchmark datasets and the proposed dataset.…”
Section: B Readability Analysis (mentioning)
confidence: 99%
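To make the idea of a readability index concrete, here is a minimal sketch of one well-known index, the Automated Readability Index (ARI). The quoted statement does not say which indices were used, so the choice of ARI is an assumption, and the regex-based sentence and word splitting is a deliberately simple approximation.

```python
import re


def automated_readability_index(text):
    """Return the ARI score: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43.

    Illustrative only: sentences are split on ., ! and ?; words are runs of
    letters, digits and apostrophes; only letters and digits count as chars.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    chars = sum(ch.isalnum() for w in words for ch in w)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
```

Higher scores indicate text that is harder to read, so a comparative analysis of the kind described would compute the index per sentence in each dataset and compare the distributions.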
“…Two of the most popular datasets are the STS benchmark dataset [15] and the SICK dataset [16] on which the BERT models have achieved near-perfect results [12]. Analyzing the readability of the sentences in these datasets, we find that the sentences in these datasets have a low readability index which is a measure of complexity of sentences [17]. However, various real world applications of semantic similarity involve more complex sentences to be analysed [18].…”
Section: Introduction (mentioning)
confidence: 99%
“…In other words, the textual data is represented as a bag of topics (Zhou et al., 2009; Yousef et al., 2020a) rather than a bag of words. However, in the short-text corpus, an advanced approach must be developed (Al Qundus et al., 2020).…”
Section: Introduction (mentioning)
confidence: 99%