We present a new approach that uses data mining, in particular a decision tree, to classify peer-to-peer (P2P) traffic in IP networks. We captured Internet traffic at a main gateway router, preprocessed the data, selected the most significant attributes, and prepared a training data set to which the decision-tree algorithm was applied. We built several models using combinations of attribute sets and different ratios of P2P to non-P2P traffic in the training data. The accuracy of the model increases significantly when the attributes "Src IP addr" and "Dst IP addr" are included in building the model. By detecting communities of peers, we achieved a classification accuracy higher than 98%. Consequently, we recommend that (a) classification be done within the authority of the Internet service provider (ISP) in order to detect communities of peers, and (b) the decision tree be retrained frequently to ensure the fairness and correctness of the classification algorithm. Our approach is based only on information in the IP layer, eliminating the privacy issues associated with deep-packet inspection.
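To illustrate why an attribute such as the source IP address can dominate a decision tree when peers form communities, the following minimal sketch scores attributes by information gain. The abstract does not state which selection criterion or tree algorithm was used, so information gain is an assumption here, and the flow records, field names (`src_ip`, `proto`, `label`), and addresses are hypothetical toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, label_key="label"):
    """Entropy reduction obtained by splitting the records on one attribute."""
    labels = [r[label_key] for r in records]
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[label_key])
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy flows: one host only produces P2P traffic, the other never does.
flows = [
    {"src_ip": "10.0.0.1", "proto": "TCP", "label": "p2p"},
    {"src_ip": "10.0.0.1", "proto": "UDP", "label": "p2p"},
    {"src_ip": "10.0.0.2", "proto": "TCP", "label": "non-p2p"},
    {"src_ip": "10.0.0.2", "proto": "UDP", "label": "non-p2p"},
]

gain_ip = information_gain(flows, "src_ip")
gain_proto = information_gain(flows, "proto")
print(gain_ip, gain_proto)
```

In this toy data the source address separates the classes completely while the protocol carries no information, which mirrors, in a highly simplified way, the reported effect of including address attributes.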
Abstract: Network measurement at 10+ Gbps speeds imposes many restrictions on the resource consumption of the measurement application, making any filtering of input data highly desirable. Symmetric Connection Detection (SCD) is a method of filtering TCP sessions that passes only those sessions which become fully established. By reducing processing requirements, SCD can benefit network monitoring applications that are interested only in fully established TCP connections. In many applications, incomplete connection attempts, such as port scans, simply waste resources if they are not filtered. SCD filters out unsuccessful connection attempts using a combination of Bloom filters that track the state of connection establishment for every flow passing through a network device. Unsuccessful flows can be filtered out to a very high degree of accuracy, which depends on the size of the Bloom filter and the traffic rate; 99.5% is typical. Resource consumption, both memory and CPU, is low. The core SCD algorithm is designed to work in high-speed routers, in real time, and at line speed. Using an upper bound of 32 KB of RAM, our experimental results indicate 99+% accuracy with 900,000 active flows.
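The idea of tracking connection establishment with Bloom filters can be sketched as follows. The abstract does not specify how the filters are combined, so this is one plausible arrangement under stated assumptions, not the authors' algorithm: three filters record "SYN seen", "SYN-ACK seen", and "established", keyed by the initiator's direction. The class names, filter size, and hash construction are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; membership tests may yield false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class SymmetricConnectionDetector:
    """Passes packets only for flows whose three-way handshake was observed."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.syn_seen = BloomFilter(num_bits, num_hashes)
        self.synack_seen = BloomFilter(num_bits, num_hashes)
        self.established = BloomFilter(num_bits, num_hashes)

    def observe(self, src, sport, dst, dport, flags):
        """Return True if the packet belongs to a flow considered established."""
        fwd = f"{src}:{sport}>{dst}:{dport}"
        rev = f"{dst}:{dport}>{src}:{sport}"
        if flags == {"SYN"}:
            self.syn_seen.add(fwd)
            return False
        if flags == {"SYN", "ACK"}:
            # Record the reply under the initiator's key, but only for a known SYN.
            if rev in self.syn_seen:
                self.synack_seen.add(rev)
            return False
        if fwd in self.established or rev in self.established:
            return True
        if "ACK" in flags and fwd in self.syn_seen and fwd in self.synack_seen:
            self.established.add(fwd)  # handshake completed by the initiator
            return True
        return False


scd = SymmetricConnectionDetector()
handshake = [
    scd.observe("192.0.2.1", 40000, "192.0.2.2", 80, {"SYN"}),
    scd.observe("192.0.2.2", 80, "192.0.2.1", 40000, {"SYN", "ACK"}),
    scd.observe("192.0.2.1", 40000, "192.0.2.2", 80, {"ACK"}),
]
# A scan-like attempt: SYN answered by a reset, never established.
scan = [
    scd.observe("192.0.2.9", 50000, "192.0.2.2", 22, {"SYN"}),
    scd.observe("192.0.2.2", 22, "192.0.2.9", 50000, {"RST", "ACK"}),
]
```

Because Bloom filters never delete entries, a practical deployment would need some form of aging or periodic reset; that aspect, like the memory budget, is left out of this sketch.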