In the field of network security, processing and analyzing huge amounts of Packet CAPture (PCAP) data is of utmost importance for developing and monitoring the behavior of networks, for intrusion detection and prevention systems, firewalls, etc. In recent times, Apache Spark in combination with Hadoop Yet-Another-Resource-Negotiator (YARN) has evolved into a generic Big Data processing platform. When processing raw network packets, timely inference about network security is a primary requirement. However, to the best of our knowledge, no prior work has presented a systematic study of fine-tuning the resources, scalability and performance of a distributed Apache Spark cluster while processing PCAP data. To obtain the best performance, various cluster parameters, such as the number of cluster nodes, the number of cores utilized from each node, the total number of executors run in the cluster, the amount of main memory used from each node, and the executor memory overhead allotted to each node to handle garbage-collection issues, have been fine-tuned, which is the focus of the proposed work. Through the proposed strategy, we could analyze 85 GB of data (provided by CSIR Fourth Paradigm Institute) in just 78 seconds using a 32-node (256-core) Spark cluster, a task that would otherwise take around 30 minutes in traditional processing systems.
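The abstract names the tuning knobs but not their values (beyond the 32-node, 256-core total), so the sketch below only illustrates how such parameters are commonly expressed as Spark-on-YARN settings; every concrete number and path is a hypothetical placeholder, not the authors' configuration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: the cluster parameters mentioned above mapped onto
// standard Spark-on-YARN settings. All values are placeholders; the paper
// only states the 32-node / 256-core total.
val spark = SparkSession.builder()
  .appName("PCAPAnalysis")
  .master("yarn")
  .config("spark.executor.instances", "32")       // total number of executors in the cluster
  .config("spark.executor.cores", "8")            // cores utilized from each node
  .config("spark.executor.memory", "24g")         // main memory used from each node
  .config("spark.executor.memoryOverhead", "4g")  // off-heap overhead, eases garbage-collection pressure
  .getOrCreate()

// Raw PCAP files would typically be read with a binary input format or a
// packet-parsing library; loading them as binary files is shown as a placeholder.
val packets = spark.sparkContext.binaryFiles("hdfs:///data/pcap/*.pcap")
println(s"PCAP files to analyze: ${packets.count()}")
```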
In distributed database management systems, fragmenting base relations increases concurrency and hence system throughput for query processing. Existing hybrid fragmentation methods, however, account neither for the bindings imposed on user queries nor for the query-access-rule dependence of deductive database applications. A hybrid fragmentation approach for distributed deductive database systems is therefore proposed. The method first partitions base relations horizontally according to the bindings placed on user queries, then produces vertical fragments of the horizontally partitioned relations, and finally clusters rules based on attribute affinity and the access frequency of queries and rules. The proposed fragmentation approach simplifies the design of distributed deductive database systems.
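As a rough illustration of the attribute-affinity idea this clustering step relies on, the minimal sketch below sums the access frequencies of queries and rules that reference the same pair of attributes; the types, names, and example values are illustrative assumptions, not taken from the paper.

```scala
// Hypothetical representation: each query or rule is reduced to the set of
// attributes it accesses and its access frequency.
case class Access(attributes: Set[String], frequency: Int)

// aff(a, b) = summed frequency of all queries/rules that reference both a and b.
def affinityMatrix(accesses: Seq[Access]): Map[(String, String), Int] = {
  val pairs = for {
    acc <- accesses
    a   <- acc.attributes
    b   <- acc.attributes
  } yield ((a, b), acc.frequency)
  pairs.groupMapReduce(_._1)(_._2)(_ + _) // sum frequencies per attribute pair
}

// Example: two accesses over a base relation with attributes (id, name, salary).
val accesses = Seq(
  Access(Set("id", "name"), 10),   // query touching id and name, frequency 10
  Access(Set("id", "salary"), 5)   // rule touching id and salary, frequency 5
)
println(affinityMatrix(accesses)(("id", "name"))) // 10
```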