The traffic classification problem has recently attracted the interest of both network operators and researchers. Several machine learning (ML) methods have been proposed in the literature as a promising solution to this problem. Surprisingly, very few works have studied traffic classification with Sampled NetFlow data, even though Sampled NetFlow is a monitoring solution widely deployed among network operators. In this paper we aim to fill this gap. First, we analyze the performance of current ML methods with NetFlow by adapting a popular ML-based technique. The results show that, although the adapted method obtains accuracy similar to that of previous packet-based methods (≈90%), its accuracy degrades drastically in the presence of sampling. To reduce this impact, we propose a solution for network operators that works with Sampled NetFlow data and achieves good accuracy in the presence of sampling.
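The effect of sampling on flow records can be illustrated with a small simulation. This is a minimal sketch, not the method of the paper: it assumes each packet is sampled independently with probability `rate` (real NetFlow deployments may instead use deterministic 1-in-N sampling), and the names `sample_flow` and `miss_probability` are hypothetical. It shows that short flows are likely to be missed entirely and that per-flow packet counts scaled back by `1 / rate` are only coarse estimates, which is one plausible reason why flow features become unreliable for a classifier.

```python
import random


def sample_flow(num_packets, rate, rng):
    """Return how many packets of a flow survive independent random sampling.

    Assumption: every packet is kept with probability `rate`, independently.
    """
    return sum(1 for _ in range(num_packets) if rng.random() < rate)


def miss_probability(num_packets, rate):
    """Probability that no packet of the flow is sampled at all."""
    return (1.0 - rate) ** num_packets


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so runs are repeatable
    rate = 0.01              # hypothetical 1-in-100 sampling rate
    for size in (5, 100, 10_000):
        seen = sample_flow(size, rate, rng)
        estimate = seen / rate  # naive inversion of the sampling rate
        print(
            f"true packets={size:>6}  sampled={seen:>4}  "
            f"estimated={estimate:>8.0f}  "
            f"P(flow missed)={miss_probability(size, rate):.3f}"
        )
```

Under these assumptions, a 5-packet flow is missed about 95% of the time at a 1% sampling rate, so any feature computed from it is usually unavailable.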
Privacy seems to be the Achilles' heel of today's web. Most web services make continuous efforts to track their users and to obtain as much personal information as they can from the things they search, the sites they visit, the people they contact, and the products they buy. This information is mostly used for commercial purposes, which go far beyond targeted advertising. Although many users are already aware of the privacy risks involved in the use of Internet services, the particular methods and technologies used for tracking them are much less known. In this survey, we review the existing literature on the methods used by web services to track users online, as well as their purposes, implications, and possible user defenses. We present 5 main groups of methods used for user tracking, which are based on sessions, client storage, client cache, fingerprinting, and other approaches. A special focus is placed on mechanisms that use web caches, operational caches, and fingerprinting, as they are usually very rich in terms of using various creative methodologies. We also show how users can be identified on the web and associated with their real names, e-mail addresses, phone numbers, or even street addresses. We show why tracking is being used and its possible implications for users. For each of the tracking methods, we present possible defenses. Some of them are specific to a particular tracking approach, while others are more universal (they block more than one threat). Finally, we present the future trends in user tracking and show that they can potentially pose significant threats to users' privacy.
Deep Packet Inspection (DPI) is the state-of-the-art technology for traffic classification. According to conventional wisdom, DPI is the most accurate classification technique. Consequently, most popular products, either commercial or open-source, rely on some sort of DPI for traffic classification. However, the actual performance of DPI is still unclear to the research community, since the lack of public datasets prevents the comparison and reproducibility of results. This paper presents a comprehensive comparison of 6 well-known DPI tools, which are commonly used in the traffic classification literature. Our study includes 2 commercial products (PACE and NBAR) and 4 open-source tools (OpenDPI, L7-filter, nDPI, and Libprotoident). We studied their performance in various scenarios (including packet and flow truncation) and at different classification levels (application protocol, application, and web service). We carefully built a labeled dataset with more than 750 K flows, which contains traffic from popular applications. We used the Volunteer-Based System (VBS), developed at Aalborg University, to guarantee the correct labeling of the dataset. We released this dataset, including full packet payloads, to the research community. We believe this dataset could become a common benchmark for the comparison and validation of network traffic classifiers. Our results present PACE, a commercial tool, as the most accurate solution. Surprisingly, we find that some open-source tools, such as nDPI and Libprotoident, also achieve very high accuracy.
Traffic classification is an important aspect of network operation and management, but it is challenging from a research perspective. During the last decade, several works have proposed different methods for traffic classification. Although most proposed methods achieve high accuracy, they present several practical limitations that hinder their actual deployment in production networks. For example, existing methods often require a costly training phase or expensive hardware, while their results have relatively low completeness. In this paper, we address these practical limitations by proposing an autonomic traffic classification system for large networks. Our system combines multiple classification techniques to leverage their advantages and minimize the limitations they present when used alone. Our system can operate with Sampled NetFlow data, making it easier to deploy in production networks to assist network operation and management tasks. The main novelty of our system is that it can automatically retrain itself in order to sustain high classification accuracy over time. We evaluate our solution using a 14-day trace from a large production network and show that our system can sustain an accuracy greater than 96%, even in the presence of sampling, over long periods of time. The proposed system has been deployed in production in the Catalan Research and Education network and is currently being used by network managers of more than 90 institutions connected to this network.
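The idea of a classifier that retrains itself to sustain accuracy can be sketched as a simple monitoring loop. The abstract does not describe the actual retraining mechanism, so everything below is an assumption made for illustration: the class name `RetrainMonitor`, the sliding window, the threshold value, and the existence of a trusted reference label (for example, from a more expensive technique applied to a small subset of flows) are all hypothetical.

```python
from collections import deque


class RetrainMonitor:
    """Track recent agreement with a reference labeler and flag retraining.

    Hypothetical sketch: compares the classifier's label with a trusted
    reference label over a sliding window of recent flows.
    """

    def __init__(self, threshold=0.96, window=1000):
        self.threshold = threshold           # minimum acceptable accuracy
        self.results = deque(maxlen=window)  # True/False per checked flow

    def record(self, predicted, reference):
        """Store whether the prediction matched the reference label."""
        self.results.append(predicted == reference)

    def accuracy(self):
        """Accuracy over the current window (1.0 if nothing recorded yet)."""
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def should_retrain(self):
        """Flag retraining only once the window is full and accuracy is low."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.threshold


if __name__ == "__main__":
    monitor = RetrainMonitor(threshold=0.75, window=4)
    checks = [("http", "http"), ("dns", "dns"), ("http", "p2p"), ("ssh", "p2p")]
    for predicted, reference in checks:
        monitor.record(predicted, reference)
    print(monitor.accuracy(), monitor.should_retrain())
```

Waiting for a full window before flagging avoids triggering a retraining on the first few mismatches; a production system would also need a source of fresh labeled flows, which is outside the scope of this sketch.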