Due to the rapid growth in network traffic and increasing security threats, Intrusion Detection Systems (IDS) have become increasingly critical in cyber security for securing communications against cyber adversaries. However, designing a robust, efficient, and accurate IDS poses many challenges, especially when dealing with high-dimensional anomaly data and unforeseen, unpredictable attacks. In this paper, we propose a Robust Transformer-based Intrusion Detection System (RTIDS) that reconstructs feature representations to balance dimensionality reduction against feature retention in imbalanced datasets. The proposed method uses a positional embedding technique to associate sequential information between features. A variant of the stacked encoder-decoder neural network then learns low-dimensional feature representations from high-dimensional raw data. Furthermore, we apply a self-attention mechanism to facilitate the classification of network traffic types. Extensive experiments on two publicly available real-traffic intrusion detection datasets, CICIDS2017 and CIC-DDoS2019, demonstrate the effectiveness of RTIDS, with F1-scores of 99.17% and 98.48%, respectively. We conduct a comparative study to demonstrate the validity of the proposed method. It compares RTIDS against the classical machine learning algorithm support vector machine (SVM) and against deep learning algorithms, including the recurrent neural network (RNN), the fuzzy neural network (FNN), and the long short-term memory network (LSTM).
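The abstract does not specify the exact embedding or attention configuration, so the following is only a minimal sketch under stated assumptions. It treats each feature of a flow record as a "token", adds the standard sinusoidal positional encoding from the original Transformer, and applies one head of scaled dot-product self-attention across the features. The feature count, model width, and random projection matrices are hypothetical placeholders for learned weights. This is not the RTIDS architecture itself.

```python
import numpy as np


def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """Standard Transformer sinusoidal encoding; one row per feature position."""
    positions = np.arange(num_positions)[:, None]              # (n, 1)
    dims = np.arange(d_model)[None, :]                          # (1, d)
    # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                            # (n, d)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the feature axis."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
num_features, d_model = 8, 16          # hypothetical: 8 normalized flow features
record = rng.random(num_features)      # one flow record with values in [0, 1)
# Hypothetical learned embedding: project each scalar feature to d_model dims.
w_embed = rng.normal(size=(1, d_model))
tokens = record[:, None] @ w_embed + sinusoidal_positional_encoding(num_features, d_model)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Each row of `attn` shows how strongly one feature attends to every other feature. The positional encoding gives the otherwise order-agnostic attention a notion of which feature sits where.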
Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It exposes more than 180 configuration parameters, and users typically have to select their values manually based on their own experience. However, because of the large number of parameters and the inherent correlations among them, manual tuning is very tedious. To remove this dependence on personal experience, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model with deep neural networks and verified its accuracy and effectiveness from multiple perspectives. Second, to search for better configuration parameters more efficiently, we improved the Q-learning algorithm to automatically set start and end states in each training iteration. This effectively mitigates the agent's poor performance in exploring better configuration parameters. Lastly, we used the default configuration as the baseline. Experimental results show that the optimized configuration achieved average performance improvements of 47%, 43%, 31%, and 45% for four different types of Spark applications. This indicates that our optimizer can efficiently find better configuration parameters and improve the performance of various Spark applications.
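The abstract gives no details of the state/action encoding or of how the start and end states are chosen. The sketch below is therefore a hypothetical illustration of the general idea: tabular Q-learning over a small discretized configuration grid. Rewards come from a performance model rather than from real Spark runs.

The sketch rests on several assumptions. The two parameter grids (loosely modeled on `spark.executor.memory` and `spark.executor.cores`) are illustrative. `predicted_runtime` is a toy function standing in for the trained DNN. The rule "start at a random configuration, end when predicted runtime drops 20% below the start" is our own stand-in for the paper's automatic start/end state selection.

```python
import random

# Hypothetical discretized search space for two Spark parameters.
MEMORY_GB = [1, 2, 4, 8]          # e.g. spark.executor.memory levels
CORES = [1, 2, 3, 4]              # e.g. spark.executor.cores levels
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # move one parameter by one level
STATES = [(i, j) for i in range(len(MEMORY_GB)) for j in range(len(CORES))]


def predicted_runtime(state):
    """Toy stand-in for the trained DNN performance model (lower is better)."""
    mem, cores = MEMORY_GB[state[0]], CORES[state[1]]
    return 100.0 / (mem ** 0.5 * cores) + 2.0 * cores


def step(state, action):
    """Apply an action, clamping to the grid; reward is the runtime reduction."""
    i = min(max(state[0] + action[0], 0), len(MEMORY_GB) - 1)
    j = min(max(state[1] + action[1], 0), len(CORES) - 1)
    nxt = (i, j)
    return nxt, predicted_runtime(state) - predicted_runtime(nxt)


def greedy(q, state):
    """Index of the action with the highest Q-value in `state`."""
    return max(range(len(ACTIONS)), key=lambda a: q[(state, a)])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in range(len(ACTIONS))}
    for _ in range(episodes):
        # Assumed start-state rule: a random configuration per episode.
        state = rng.choice(STATES)
        # Assumed end-state rule: any configuration 20% faster than the start.
        goal = 0.8 * predicted_runtime(state)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))   # explore
            else:
                a = greedy(q, state)              # exploit
            nxt, reward = step(state, ACTIONS[a])
            done = predicted_runtime(nxt) <= goal
            # Standard Q-learning target; no bootstrapping past an end state.
            target = reward if done else reward + gamma * max(
                q[(nxt, b)] for b in range(len(ACTIONS)))
            q[(state, a)] += alpha * (target - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def tune(q, start, max_steps=10):
    """Follow the greedy policy from `start`, keeping the best configuration seen."""
    best = state = start
    for _ in range(max_steps):
        state, _ = step(state, ACTIONS[greedy(q, state)])
        if predicted_runtime(state) < predicted_runtime(best):
            best = state
    return best


q_table = train()
default = (0, 0)   # hypothetical "default configuration" baseline
best = tune(q_table, default)
print(MEMORY_GB[best[0]], "GB,", CORES[best[1]], "cores")
```

Because rewards come from the surrogate model, the agent can run many cheap episodes. Only the final recommended configuration would need to be validated on a real cluster.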
<abstract><p>Nowadays, Spark Streaming, a computing framework based on Spark, is widely used to process streaming data such as social media data, IoT sensor data, or web logs. Due to the extensive use of streaming data analysis, performance optimization for Spark Streaming has gradually become a popular research topic. Existing methods for enhancing Spark Streaming performance, such as task scheduling, resource allocation, and data skew optimization, primarily focus on manually tuning the parameter configuration. However, adjusting more than 200 parameters through continuous debugging is very challenging and inefficient. In this paper, we propose an improved dueling double deep Q-network (DQN) technique for parameter tuning, which can significantly improve the performance of Spark Streaming. This approach fuses reinforcement learning with Gaussian process regression to reduce the number of iterations and dramatically speed up convergence. The experimental results demonstrate that the dueling double DQN method with Gaussian process regression can improve performance by up to 30.24%.</p></abstract>
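The two core ingredients named in the abstract can be sketched with plain NumPy on a toy batch of Q-values:

- **Dueling aggregation:** Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
- **Double DQN target:** the online network selects the next action and the target network evaluates it.

This covers only those two standard components. The abstract does not describe how Gaussian process regression is fused in, nor how Spark Streaming states and actions are encoded, so both are omitted. All numbers below are hypothetical.

```python
import numpy as np


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.9):
    """Double DQN: the online net picks argmax_a', the target net scores it."""
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


# Hypothetical batch of 2 next-states with 3 actions each.
q_online_next = dueling_q(np.array([1.0, 0.5]),
                          np.array([[0.2, -0.2, 0.0], [1.0, 0.0, -1.0]]))
q_target_next = dueling_q(np.array([0.0, 2.0]),
                          np.array([[0.0, 1.0, -1.0], [0.5, -0.5, 0.0]]))
rewards = np.array([1.0, -1.0])
dones = np.array([0.0, 1.0])   # second transition ends the episode

targets = double_dqn_targets(rewards, dones, q_online_next, q_target_next)
print(q_online_next)
print(targets)
```

For the first transition, the online network prefers action 0, which the target network values at 0.0. The resulting target is therefore 1.0, whereas a vanilla DQN target would take the target network's own maximum (1.0) and give 1.9. Decoupling action selection from evaluation in this way is what reduces the overestimation bias of standard Q-learning.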