Cryptojacking, or illegal mining, is a form of malware that hides on the victim's computer and hijacks its computational resources to mine cryptocurrencies on behalf of the attacker. It causes significant computational consumption, degrading the performance of the victim's machine. This attack has increased with the rise of cryptocurrencies and their profitability, and it is difficult for users to detect. Identifying and blocking this type of malware has become a research topic related to cryptocurrencies and blockchain technology; the literature presents several machine learning and deep learning techniques, but they still leave room for improvement. In this work, we explore multiple machine learning classification models for detecting cryptojacking on websites: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, k-Nearest Neighbors, and XGBoost. To this end, we use a dataset composed of network- and host-feature samples, to which we apply various feature selection methods, including statistical filters such as the ANOVA test and wrapper methods, both to reduce the complexity of the built models and to discover the features with the greatest predictive power. Our results suggest that simple models such as Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and k-Nearest Neighbors can achieve success rates similar to or greater than those of advanced algorithms such as XGBoost, and even those of other works based on deep learning.
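The workflow described above — statistical feature selection followed by training several classifiers — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic dataset stands in for the real network/host feature samples, the choice of `k=8` selected features is arbitrary, and XGBoost is omitted to keep the example dependency-free.

```python
# Hypothetical sketch: ANOVA F-test feature selection + several classifiers.
# Synthetic data replaces the real network/host cryptojacking dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for benign vs. cryptojacking feature samples
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=42)

# ANOVA filter: keep the 8 features with the highest F-scores
selector = SelectKBest(f_classif, k=8)
X_sel = selector.fit_transform(X, y)

X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.3, random_state=42)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "kNN": KNeighborsClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")
```

A wrapper method (e.g., recursive feature elimination via `sklearn.feature_selection.RFE`) could be substituted for `SelectKBest` to compare filter- and wrapper-based selection, as the abstract suggests.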
The growing volume of sensitive content on the Internet, such as pornography and child pornography, together with the amount of time people (especially children) spend online, has led to an increase in the distribution of such content (e.g., images of children being sexually abused, real-time videos of such abuse, grooming activities, etc.). Effective IT tools that automate the detection and blocking of this type of material are therefore essential, as manual filtering of huge volumes of data is practically impossible. The goal of this study is to carry out a comprehensive review of the learning strategies for sensitive-content detection available in the literature, from the most conventional techniques to cutting-edge deep learning algorithms, highlighting the strengths and weaknesses of each, as well as the datasets used. The performance and scalability of the strategies reviewed in this work depend on the heterogeneity of the dataset, the feature extraction techniques (hashes, visual, audio, etc.), and the learning algorithms. Finally, new lines of research in sensitive-content detection are presented.