ABSTRACT.
Cellular network monitoring and management tasks rely on huge amounts of data continuously collected from network elements and devices. This data is stored in log files, which may accumulate millions of lines in a single day. The standard format of these logs is GPEH (General Performance Event Handling), and they are usually recorded in binary form (bin files). Thus, an efficient and fast compression technique is one of the main requirements for meeting storage constraints. On the other hand, in our experience, experts and network engineers are not interested in every log entry. In addition, this massive burst of entries can bury important information, especially entries that indicate performance abnormalities. Thus, summarizing log files would help identify problems on specific elements, the overall performance, and the expected future state of the network. In this paper, we introduce an efficient compression algorithm based on frequent log patterns. In addition, we propose a Mixed Mode Summary-based Lossless Compression technique for mobile network log files (MMSLC), which combines online and offline compression modes based on the summary extracted from the frequent patterns. Our scheme exploits the strong correlation between consecutively recorded bin files in the online compression mode. In the offline mode, it uses the well-known Apriori algorithm to extract the frequent patterns from the current file. Our proposed scheme is shown to achieve high compression ratios at high speed while also helping to extract useful information from the recorded data.
KEYWORDS. Logs, Compression, Frequent Patterns
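As an illustration of the offline mode mentioned in the abstract, the following minimal sketch shows Apriori-style frequent pattern mining over log entries that have already been parsed into sets of `field=value` items. It is not the MMSLC implementation: the function name `apriori`, the `min_support` parameter, and the example field names are assumptions made for illustration, and GPEH binary parsing is not shown.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions (illustrative helper, not from the paper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in t)
    current = {frozenset([item]): n
               for item, n in item_counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and prune
        # any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count how many transactions contain each surviving candidate.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical parsed log entries; the field names are assumptions.
entries = [
    {"event=RRC_SETUP", "cell=101", "result=OK"},
    {"event=RRC_SETUP", "cell=101", "result=OK"},
    {"event=RRC_SETUP", "cell=102", "result=FAIL"},
    {"event=HANDOVER", "cell=101", "result=OK"},
]
patterns = apriori(entries, min_support=2)
```

In a summary-based scheme, frequent itemsets such as these could be stored once and referenced from individual entries. How MMSLC actually encodes them is described by the authors and is not reproduced here.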
1. INTRODUCTION
Cellular network monitoring and management tasks rely on huge amounts of data that are continuously collected from elements and devices all around the network [5] [11]. Log files are the most common storage format for distributed systems such as cellular networks. These files are collections of log entries, and each entry contains information about a specific event that took place within the system. These files are important because they are used to monitor the current status of the system, track its performance indicators, and identify fraud and problems, which supports decision making in all aspects of the network [2] [3]. Unfortunately, these files are recorded in a binary format (bin files) that takes considerable effort to parse before it can be evaluated.
Initially, log files were used only for troubleshooting problems [8]. Nowadays, however, they are used for many functions within most organizations and associations, such as system optimization, measuring network performance, recording users' actions, and investigating malicious activities [2] [3]. Unfortunately, these log files can be on the order of tens of gigabytes per day and must be stored for a number of days as history. Thus, efficient and fast compression of these log files becomes very imp...