The emerging 5G networks promise higher throughput and faster, more reliable services, but as network complexity and dynamics increase, the systems become more difficult to troubleshoot. Vendors spend considerable time and effort on early anomaly detection in their development cycle, and the majority of that time is spent manually analysing system logs. While most research in anomaly detection uses performance metrics, anomaly detection based on functional behaviour still lacks in-depth analysis. In this paper we show how a boosted ensemble of Long Short-Term Memory (LSTM) classifiers can detect anomalies in 5G Radio Access Network (RAN) system logs. Acquiring system logs from a live 5G network is difficult due to confidentiality issues, disturbance of the live network, and the difficulty of repeating scenarios. Therefore, we perform our evaluation on logs from a 5G test bed that simulates realistic traffic in a city. Our ensemble learns the functional behaviour of an application by training on logs from normal execution. It can then detect deviations from normal behaviour and can also be retrained on false positive cases found during validation. Anomaly detection in RAN shows that our ensemble, called BoostLog, outperforms a single LSTM classifier, and further testing on HDFS logs confirms that BoostLog can also be used in other domains. Instead of relying on domain experts to manually analyse system logs, less experienced troubleshooters can use BoostLog to detect anomalies automatically, faster, and more reliably.
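The workflow described in the abstract (learn normal behaviour from logs, flag deviations, retrain on false positives) can be illustrated with a minimal Python sketch. This is not BoostLog: a simple table of observed event transitions stands in for the LSTM classifiers, no boosting is performed, and the class name, method names, and event keys are all hypothetical.

```python
from collections import defaultdict


class NormalBehaviourModel:
    """Toy stand-in for a sequence classifier (assumption: NOT an LSTM).

    It records which log event keys were seen to follow each context
    of `window` preceding keys in logs from normal execution.
    """

    def __init__(self, window=1):
        self.window = window
        self.seen = defaultdict(set)  # context tuple -> set of next keys

    def fit(self, sequences):
        """Learn (or extend) normal behaviour from event-key sequences."""
        for seq in sequences:
            for i in range(self.window, len(seq)):
                context = tuple(seq[i - self.window:i])
                self.seen[context].add(seq[i])
        return self

    def is_anomalous(self, seq):
        """Flag a sequence if any transition was never seen in training."""
        for i in range(self.window, len(seq)):
            context = tuple(seq[i - self.window:i])
            if seq[i] not in self.seen.get(context, set()):
                return True
        return False


# Hypothetical event keys; real logs would first be parsed into keys.
normal_logs = [["open", "read", "close"], ["open", "write", "close"]]
model = NormalBehaviourModel(window=1).fit(normal_logs)

candidate = ["open", "close"]
flagged = model.is_anomalous(candidate)

# If validation shows the flagged sequence is actually normal
# (a false positive), retrain on it so it is accepted afterwards.
model.fit([candidate])
flagged_after_retraining = model.is_anomalous(candidate)
```

The retraining step mirrors the feedback loop in the abstract only in spirit; how BoostLog weights or combines its classifiers is not specified here and is not reproduced.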
Large-scale computing systems are today built as distributed systems (for reasons of scale, heterogeneity, cost and energy efficiency) where components and services are distributed and accessed remotely through clients and devices. In some systems, in particular latency-sensitive or high-availability systems, components are also placed closer to end-users (e.g., in radio base stations and other systems on the edge of access networks) in order to increase reliability and reduce latency, a style of computing often referred to as edge or fog computing. However, while recent years have seen significant advances in system instrumentation as well as data centre energy efficiency and automation, computational resources and network capacity are often provisioned using best-effort provisioning models and coarse-grained quality of service (QoS) mechanisms, even in state-of-the-art data centres. These limitations are seen as a major hindrance in the face of the coming evolution of the Internet of Things (IoT) and the networked society, and have already manifested in, e.g., limited cloud adoption by systems with high reliability requirements such as telecommunications infrastructure and emergency services systems. RECAP goes beyond the current state of the art and develops the next generation of cloud/edge/fog computing capacity provisioning via targeted research advances in cloud infrastructure optimization, simulation and automation, building on advanced machine learning, optimization and simulation techniques. The overarching result of RECAP is the next generation of agile and optimized cloud computing systems. The outcomes of the project will pave the way for a radically novel concept in the provision of cloud services, where services are instantiated and provisioned close to the users that actually need them by self-configurable cloud computing systems.