The prosperity of machine learning has been accompanied by increasing attacks on the training process. Among them, poisoning attacks have become an emerging threat during model training. Poisoning attacks have profound impacts on the target models, e.g., making them unable to converge or manipulating their prediction results. Moreover, the rapid development of recent distributed learning frameworks, especially federated learning, has further stimulated the development of poisoning attacks. Defending against poisoning attacks is challenging and urgent. However, the systematic review from a unified perspective remains blank. This survey provides an in-depth and up-to-date overview of poisoning attacks and corresponding countermeasures in both centralized and federated learning. We firstly categorize attack methods based on their goals. Secondly, we offer detailed analysis of the differences and connections among the attack techniques. Furthermore, we present countermeasures in different learning framework and highlight their advantages and disadvantages. Finally, we discuss the reasons for the feasibility of poisoning attacks and address the potential research directions from attacks and defenses perspectives, separately.
The popularity of encryption method brings a great challenge to malware traffic identification. Traditional classes defined by expert experience are usually classified based on the host behaviors of malware, such as banking malware or ransomware, which are often irrelevant to its communication traffic behaviors. It leads to the fact that the boundaries of traffic feature dataset of different malware classes are fuzzy and make these traditional classes unhelpful for classification based on traffic features. Meanwhile, traditional machine learning-based encrypted malware traffic identification methods, such as using the multi-classification supervised learning model, are inefficient both in model training and detection, and its detection accuracy cannot meet the demand. In this paper, we propose a distance-based method, which utilizes unsupervised learning algorithm Gaussian mixture model (GMM) and ordering points to identify the clustering structure (OPTICS) to calculate the Distance between malwares and make use of the Distance to define the new malware class called FClass. Then, a set of models are trained by XGBoost algorithm to build an identification framework based on the FClass. The performance of the proposed method has been evaluated by comparing it with the other four methods. The results show that the proposed distance-based method is more efficient and accurate.
The transport layer security (TLS) protocol is widely adopted by apps as well as malware. With the geometric growth of TLS traffic, accurate and efficient detection of malicious TLS flows is becoming an imperative. However, current studies focus on either detection accuracy or detection efficiency, and few studies take into account both indicators. In this paper, we propose a two-layer detection framework composed of a filtering model (FM) and a malware family classification model (MFCM). In the first layer, a new set of TLS handshake features is presented to train the FM, which is devised to filter out a majority of benign TLS flows. For identifying malware families, both TLS handshake features and statistical features are applied to construct the MFCM in the second layer. Comprehensive experiments are conducted to substantiate the high accuracy and efficiency of the proposed two-layer framework. A total of 96.32% of benign TLS flows can be filtered out by the FM with few malicious TLS flows being discarded provided the threshold of the FM is set to 0.01. Moreover, a multiclassifier is selected to construct the MFCM to provide better performance than a set of binary classifiers under the same feature set. In addition, when the ratio of benign and malicious TLS flows is set to 10:1, the detection efficiency of the two-layer framework is 188% faster than that of the single-layer framework, while the average detection accuracy reaches 99.45%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.