In this paper, we report YouTube traffic measurements from the Orange IP backbone network connecting residential customers, and we exhibit the salient features of this traffic in relation to caching performance. By examining the file popularity distribution, we show that video requests are highly volatile, in that a huge number of files are viewed only a few times; these files are therefore not relevant for caching. Nevertheless, there is a subset of files that are massively viewed by end users and are worth caching. On the basis of this experimental observation, we develop a mathematical model for estimating the efficiency of file caching in the presence of noise traffic, composed of files that are rarely requested and thus "pollute" the cache. We then proceed to trace-driven simulations to check the qualitative conclusions derived from the theoretical model.
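The cache "pollution" effect described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' model. It assumes an LRU replacement policy, a catalog of popular files requested independently according to a truncated Zipf law, and noise files that are each requested exactly once. The function names and all parameter values are hypothetical.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_cum_weights(n_files, alpha):
    """Cumulative weights of a truncated Zipf law: rank r has weight r**(-alpha)."""
    return list(accumulate(r ** -alpha for r in range(1, n_files + 1)))


def lru_hit_ratio(n_requests, cache_size, n_files, alpha, noise_fraction, seed=0):
    """Hit ratio (over all requests) of an LRU cache fed a mix of
    Zipf-distributed popular requests and one-time noise requests."""
    rng = random.Random(seed)
    cum = zipf_cum_weights(n_files, alpha)
    ranks = range(n_files)
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    next_noise_id = 0
    for _ in range(n_requests):
        if rng.random() < noise_fraction:
            # Assumption: a noise file is requested once and never again.
            key = ("noise", next_noise_id)
            next_noise_id += 1
        else:
            key = ("popular", rng.choices(ranks, cum_weights=cum)[0])
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # refresh recency on a hit
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used file
    return hits / n_requests


if __name__ == "__main__":
    for noise in (0.0, 0.25, 0.5):
        ratio = lru_hit_ratio(20000, 100, 1000, 0.8, noise)
        print(f"noise fraction {noise:.2f}: hit ratio {ratio:.3f}")
```

In this toy setting, every noise request is a miss and also occupies a cache slot until it is evicted, so the overall hit ratio drops as the noise fraction grows.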
Traffic classification is essential in network management for operations ranging from capacity planning, performance monitoring, volumetry, and resource provisioning, to anomaly detection and security. Recently, it has become increasingly challenging with the widespread adoption of encryption in the Internet, e.g., as a de facto standard in the HTTP/2 and QUIC protocols. In the current state of encrypted traffic classification using Deep Learning (DL), we identify fundamental issues in the way it is typically approached. For instance, although complex DL models with millions of parameters are being used, these models implement a relatively simple logic based on certain header fields of the TLS handshake, limiting model robustness to future versions of encrypted protocols. Furthermore, encrypted traffic is often treated as any other raw input for DL, while crucial domain-specific considerations exist that are commonly ignored. In this paper, we design a novel feature engineering approach that generalizes well for encrypted web protocols, and develop a neural network architecture based on Stacked Long Short-Term Memory (LSTM) layers and Convolutional Neural Networks (CNN) that works very well with our feature design. We evaluate our approach on a real-world traffic dataset from a major ISP and Mobile Network Operator. We achieve an accuracy of 95% in service classification with less raw traffic and a smaller number of parameters, outperforming a state-of-the-art method by nearly 50% fewer false classifications. We show that our DL model generalizes for different classification objectives and encrypted web protocols. We also evaluate our approach on a public QUIC dataset with finer, application-level granularity in labeling, achieving an overall accuracy of 99%.
In this paper, we report measurements of YouTube traffic from Orange networks. We specifically analyze two weeks of measurements in early April 2012. We show that the popularity curves of YouTube files are constant in time and can be well approximated by truncated Zipf laws with a shape parameter less than one. In addition, there is a huge number of files that are viewed very rarely (only once or twice). Although this may appear to be an unfavorable situation for caching in view of theoretical results on cache systems, we show that, thanks to file request dynamics, caching is very efficient for YouTube traffic. The main reason is that files are massively requested in bursts. Since these bursts represent a significant part of the traffic, caching files even with a rather small storage capacity can achieve high gains in terms of saved bandwidth. Moreover, bursts are sufficiently intense that popular files are not pushed out of the cache memory by files viewed only once or twice. These observations are illustrated by trace-driven simulations using traffic traces captured in the Orange IP backbone network.
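To make the remark about truncated Zipf laws concrete, the following minimal sketch computes the share of requests that goes to the most popular files when popularity follows a truncated Zipf law. It assumes independent requests with fixed popularity, so it ignores the bursts that the abstract identifies as the reason caching works well in practice. The function names, catalog size, and shape values are hypothetical and are not taken from the measurements.

```python
def truncated_zipf_pmf(n_files, alpha):
    """Request probabilities for ranks 1..n_files, proportional to r**(-alpha)."""
    weights = [r ** -alpha for r in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(n_files, alpha, cache_size):
    """Fraction of requests addressed to the cache_size most popular files,
    i.e. the hit ratio of a cache that permanently stores exactly those files."""
    return sum(truncated_zipf_pmf(n_files, alpha)[:cache_size])


if __name__ == "__main__":
    for alpha in (0.6, 0.8, 1.2):
        share = top_share(10000, alpha, 100)
        print(f"alpha={alpha}: top-100 share of requests = {share:.3f}")
```

Under these assumptions, a shape parameter below one spreads requests more evenly across the catalog, so a cache holding only the top files captures a smaller share of requests than it would with a larger shape parameter.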
Traffic classification is essential in network management for a wide range of operations. Recently, it has become increasingly challenging with the widespread adoption of encryption in the Internet, for example, as a de facto standard in the HTTP/2 and QUIC protocols. In the current state of encrypted traffic classification using deep learning (DL), we identify fundamental issues in the way it is typically approached. For instance, although complex DL models with millions of parameters are being used, these models implement a relatively simple logic based on certain header fields of the TLS handshake, limiting model robustness to future versions of encrypted protocols. Furthermore, encrypted traffic is often treated as any other raw input for DL, while crucial domain-specific considerations are commonly ignored. In this paper, we design a novel feature engineering approach used for encrypted Web protocols, and develop a neural network architecture based on stacked long short-term memory layers and convolutional neural networks. We evaluate our approach on a real-world Web traffic dataset from a major Internet service provider and mobile network operator. We achieve an accuracy of 95% in service classification with less raw traffic and a smaller number of parameters, outperforming a state-of-the-art method by nearly 50% fewer false classifications. We show that our DL model generalizes for different classification objectives and encrypted Web protocols. We also evaluate our approach on a public QUIC dataset with finer application-level granularity in labeling, achieving an overall accuracy of 99%.