As Wikipedia has become the largest repository of human knowledge, measuring the quality of its articles has received considerable attention over the last decade. Most research efforts have focused on classifying the quality of Wikipedia articles using various feature sets. However, no "golden feature set" has been proposed so far. In this paper, we present a novel approach for classifying Wikipedia articles by analysing their content rather than relying on a feature set. Our approach uses recent techniques in natural language processing and deep learning, and achieves results comparable to the state of the art.
Wikipedia is a great example of large-scale collaboration, where people from all over the world together build the largest and perhaps the most important repository of human knowledge in history. However, a number of studies have shown that quality is not evenly distributed across Wikipedia articles: while many articles are of good quality, many others need to be improved. Assessing the quality of Wikipedia articles is important both for guiding readers towards high-quality articles and for suggesting to authors and reviewers which articles need improvement. Given the huge size of Wikipedia, an effective automatic method for assessing article quality is needed. In this paper, we present an automatic method for assessing the quality of Wikipedia articles by analysing their content in terms of format features and readability scores. Our results show improvements in both accuracy and information gain compared with other existing approaches.
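The abstract above does not list its exact features, so the following is only a minimal sketch of the two feature families it names, under stated assumptions. The readability score is the standard Flesch Reading Ease formula, 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words); the syllable counter is a rough vowel-group heuristic rather than a dictionary lookup. The format features (`headings`, `references`, `internal_links`) and the helper names are hypothetical illustrations computed by regex over raw wikitext, not the authors' feature set.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep "-le" endings such as "table" -> 2
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def format_features(wikitext: str) -> dict:
    """Hypothetical format features counted directly from wikitext markup."""
    return {
        # Lines such as "== History ==" or "=== Early life ==="
        "headings": len(re.findall(r"^=+[^=\n].*?=+[ \t]*$", wikitext, flags=re.M)),
        # Opening <ref> or <ref name=.../> tags (closing </ref> is not counted)
        "references": len(re.findall(r"<ref[\s>/]", wikitext)),
        # Internal links of the form [[Target]] or [[Target|label]]
        "internal_links": wikitext.count("[["),
    }
```

For example, `flesch_reading_ease("The cat sat. The dog ran.")` has 6 one-syllable words in 2 sentences, giving 206.835 − 3.045 − 84.6 ≈ 119.19. Scores like these, concatenated with the format counts, could then serve as inputs to any standard classifier.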
Intrusion detection systems (IDSs) have long been studied widely in the computer security community. Recent developments in machine learning have significantly boosted the performance of intrusion detection systems. However, most modern machine learning and deep learning algorithms require large amounts of labeled data, which takes considerable time and effort to collect. Furthermore, by the time enough data has been collected to train the model, it may already be too late. In this study, we first perform a comprehensive survey of existing studies on using machine learning for IDSs. We then present two approaches to detecting network attacks. First, we show that tree-based ensemble learning combined with feature engineering can outperform state-of-the-art results in the field. Second, we present a new approach to selecting training data for IDSs, showing that a small subset of the training data combined with some weak classification algorithms can improve the performance of the detector while keeping the running cost low.
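The abstract does not specify which tree-based ensemble or which engineered features were used, so the block below is only a minimal sketch of the general idea, not the authors' method. It swaps in the simplest tree ensemble, bagging of decision stumps (depth-one trees, which are also classic weak learners), written in plain Python. The flow records, the `engineer` function, and its bytes-per-packet feature are hypothetical and made up for illustration.

```python
import random
from collections import Counter


def engineer(record):
    """Hypothetical feature engineering: (bytes, packets) -> [bytes_per_packet, packets]."""
    n_bytes, n_packets = record
    return [n_bytes / max(n_packets, 1), n_packets]


def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold, polarity) split with fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (0, 1):
                # The stump predicts `polarity` when row[f] > t, otherwise 1 - polarity.
                errors = sum(
                    (polarity if row[f] > t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]


def predict_stump(stump, row):
    f, t, polarity = stump
    return polarity if row[f] > t else 1 - polarity


def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Bagging: train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Majority vote over the ensemble (0 = normal, 1 = attack)."""
    return Counter(predict_stump(s, row) for s in stumps).most_common(1)[0][0]


# Toy, made-up flow records: (total bytes, packet count).
normal = [(6000, 10), (12000, 20), (4000, 5), (9000, 12), (15000, 25)]
attack = [(6000, 120), (3000, 60), (5000, 100), (2400, 50), (7000, 140)]

X = [engineer(r) for r in normal + attack]
y = [0] * len(normal) + [1] * len(attack)
model = fit_bagged_stumps(X, y)
```

In this toy data the raw byte counts overlap between the two classes (6000 appears in both), while the engineered bytes-per-packet ratio separates them cleanly, so `predict(model, engineer((4500, 100)))` is expected to vote "attack". A production system would more likely use a library implementation such as a random forest or gradient-boosted trees; the hand-rolled stumps here only keep the example dependency-free.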