Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries 2016
DOI: 10.1145/2910896.2910917
|View full text |Cite
|
Sign up to set email alerts
|

Quality Assessment of Wikipedia Articles without Feature Engineering

Abstract: As Wikipedia became the largest human knowledge repository, quality measurement of its articles received a lot of attention during the last decade. Most research efforts focused on classification of Wikipedia articles quality by using a different feature set. However, so far, no "golden feature set" was proposed. In this paper, we present a novel approach for classifying Wikipedia articles by analysing their content rather than by considering a feature set. Our approach uses recent techniques in natural langua… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
46
1
2

Year Published

2016
2016
2022
2022

Publication Types

Select...
3
2
1
1

Relationship

1
6

Authors

Journals

citations
Cited by 57 publications
(49 citation statements)
references
References 20 publications
0
46
1
2
Order By: Relevance
“…We showed that in addition to studying the structure-based features by analyzing the content of the articles in terms of their readability, a higher accuracy and information gain can be obtained compared with other approaches. As a future direction of our work we plan to combine manual feature design with automatic feature extraction from deep learning techniques [62] in order to improve classification performance.…”
Section: Discussionmentioning
confidence: 99%
“…We showed that in addition to studying the structure-based features by analyzing the content of the articles in terms of their readability, a higher accuracy and information gain can be obtained compared with other approaches. As a future direction of our work we plan to combine manual feature design with automatic feature extraction from deep learning techniques [62] in order to improve classification performance.…”
Section: Discussionmentioning
confidence: 99%
“…With the development of deep-learning models, some researchers have attempted to apply deep-learning models to the quality assessment of Wikipedia. Doc2Vec was used to represent Wikipedia articles, and a DNN was applied to classify the article quality (Dang & Ignat, 2016b). Although Doc2Vec was used to automatically extract features, ensuring that Doc2Vec acquires the important features that contribute most to quality classification is difficult.…”
Section: Deep-learning-based Quality Assessment Methodsmentioning
confidence: 99%
“…However, this feature framework only partially focuses on the information of each article. Doc2Vec was used to automatically generate features and directly represent Wikipedia articles (Dang & Ignat, 2016b). Those authors also applied Doc2Vec to the raw content of Wikipedia articles instead of applying Doc2Vec to the text after process (Dang & Ignat, 2016c).…”
Section: Feature Frameworkmentioning
confidence: 99%
“…There are also a few works that try to combine metrics from articles content and edition history [16,17].…”
Section: Related Workmentioning
confidence: 99%
“…Although existing works propose various sets of metrics for assessing quality of Wikipedia articles, there is no universal feature set for this task [17]. Additional challenge is to consider different language versions, which can have different quality models [4,5].…”
Section: Related Workmentioning
confidence: 99%