Flow Clustering Using Machine Learning Techniques

McGregor, Anthony; Hall, Mark; Lorier, Perry; Brunskill, James

doi:10.1007/978-3-540-24668-8_21

Cited by 390 publications

(239 citation statements)

References 2 publications

Supporting

Mentioning

235

Contrasting

Unclassified

“…Researchers and developers often embed an assumption of traffic symmetry in tools and analyses [8,9,10], an assumption only safe for stub access links, otherwise quite harmful [11].…”

Section: Introduction (mentioning)

confidence: 99%

Estimating routing symmetry on single links by passive flow measurements

John, Dusi, claffy

2010

Proceedings of the 6th International Wireless Communications and Mobile Computing Conference

The assumption of routing symmetry is often embedded into traffic analysis and classification tools. This paper uses passively captured network data to estimate the amount of traffic actually routed symmetrically on a specific link. We propose a Flow-Based Symmetry Estimator (FSE), a set of metrics to assess symmetry in terms of flows, packets and bytes, which disregards inherently asymmetrical traffic such as UDP, ICMP and TCP background radiation. This normalized metric allows fair comparison of symmetry across different links. We evaluate our method on a large heterogeneous dataset, and confirm anecdotal reports that routing symmetry typically does not hold for non-edge Internet links, and decreases as one moves toward core backbone links, due to routing policy complexity. Our proposed metric for traffic asymmetry induced by routing policies will help the community improve traffic characterization techniques and formats, and will also support quantitative formalization of routing policy effects on links in the wild.
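To make the idea of measuring symmetry "in terms of flows, packets and bytes" concrete, here is a minimal sketch. It is not the FSE definition from the paper: the `Flow` record, its field names, and the rule "a flow counts as symmetric when its reverse 5-tuple was also seen on the link" are assumptions made for illustration. UDP and ICMP are excluded by a simple protocol filter, and TCP background radiation is not modelled at all.

```python
from collections import namedtuple

# Hypothetical flow record; the field names are assumptions for illustration.
Flow = namedtuple("Flow", "src dst sport dport proto packets bytes")


def symmetry_ratios(flows, protocols=("tcp",)):
    """Share of flows, packets and bytes whose reverse direction was also
    observed on the link, considering only the listed protocols."""
    kept = [f for f in flows if f.proto in protocols]
    seen = {(f.src, f.dst, f.sport, f.dport, f.proto) for f in kept}
    totals = {"flows": 0, "packets": 0, "bytes": 0}
    symmetric = dict(totals)
    for f in kept:
        reverse = (f.dst, f.src, f.dport, f.sport, f.proto)
        amounts = (("flows", 1), ("packets", f.packets), ("bytes", f.bytes))
        for key, amount in amounts:
            totals[key] += amount
            if reverse in seen:
                symmetric[key] += amount
    # Normalise each counter so that links of different size are comparable.
    return {k: (symmetric[k] / totals[k] if totals[k] else 0.0) for k in totals}


# Made-up flow records observed on one link.
flows = [
    Flow("a", "b", 1000, 80, "tcp", 10, 1000),   # forward direction
    Flow("b", "a", 80, 1000, "tcp", 8, 5000),    # matching reverse direction
    Flow("c", "d", 2000, 443, "tcp", 2, 2000),   # reverse never seen on this link
    Flow("e", "f", 5353, 53, "udp", 1, 100),     # ignored: not TCP
]
ratios = symmetry_ratios(flows)
```

With these made-up records, two of the three TCP flows have a visible reverse direction, so the three ratios differ depending on whether one counts flows, packets or bytes.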

“…This better models a real-world situation given that it may not be accurate to associate a given data point to one exclusive cluster based on the training set. McGregor et al [57] used the Expectation Maximization (EM) algorithm to classify flows. The authors believed Internet traffic flows can be clustered by application as they were able to see distinct applications based on the packet size and inter-arrival time of packets.…”

Section: Clustering (mentioning)

confidence: 99%

Controlling False Alarm/Discovery Rates in Online Internet Traffic Flow Classification

2009

Classifying Internet traffic flows online into applications or broader classes without inspecting packet payloads or relying on port numbers has become a necessity for network operators. The operators can use this information to monitor their networks and provide per-class quality of service. There has been a great deal of research done on Internet traffic classification recently and numerous techniques have been proposed. While the current techniques can obtain a high accuracy classifying Internet traffic, providing performance guarantees for particular classes of interest has never been addressed. In this thesis, we provide two novel types of online Internet traffic classifiers that can provide performance guarantees on the false alarm and false discovery rates, respectively. These guarantees can be for an entire class (class-wise) or between two classes (pair-wise). Controlling false alarm rates is well-suited for application prioritization (e.g. prioritizing time-sensitive applications like VoIP over HTTP) whereas controlling false discovery rates is better suited for blocking or rate-limiting a targeted class of traffic (e.g. Peer-to-Peer). The classifier that provides false alarm rate guarantees is based on a Neyman-Pearson classification framework while the classifier that provides false discovery rate guarantees is based on the Learning to Satisfy (LSAT) framework. Both of these classifiers are implemented using a machine learning technique, namely, the 2-nu Support Vector Machine (SVM). Moreover, all previous work done with these two statistical methodologies focused on binary classification only; we extend these statistical methodologies to a multi-class setting. In addition to the regular application classification problem, we also present preliminary work on a binary LSAT classifier that can detect, after the reception of only a handful of packets, whether a flow will be large, as defined by a network operator. This large flow detector can act as a preprocessor for regular application classifiers. Passing only large flows to the classifier lets it focus on the more resource-intensive flows. We validated our Internet traffic classifiers by testing our approaches using data provided by an ISP.
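The idea of bounding the false alarm rate can be illustrated with a much simpler device than the 2-nu SVM the thesis uses. The sketch below is an assumption-laden stand-in, not the thesis method: it takes classifier scores for flows known not to belong to the target class (hypothetical calibration data) and picks a decision threshold so that at most a fraction `alpha` of those scores would be flagged. The function names and the sample scores are invented for illustration.

```python
def threshold_for_false_alarm(negative_scores, alpha):
    """Pick a threshold t so that the fraction of negative-class scores
    strictly above t is at most alpha (an empirical false alarm bound)."""
    if not negative_scores:
        raise ValueError("need at least one calibration score")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(negative_scores)
    n = len(ordered)
    # Number of negative scores we may allow above the threshold.
    allowed = min(int(alpha * n), n - 1)
    return ordered[n - allowed - 1]


def false_alarm_rate(negative_scores, threshold):
    """Fraction of negative-class scores that would be flagged as positive."""
    flagged = sum(1 for s in negative_scores if s > threshold)
    return flagged / len(negative_scores)


# Made-up scores for flows that are known NOT to be the target class.
calibration = [3, 9, 1, 7, 5, 10, 2, 8, 6, 4]
t = threshold_for_false_alarm(calibration, alpha=0.2)
rate = false_alarm_rate(calibration, t)
```

The bound holds only on the calibration scores themselves; a guarantee on unseen traffic needs the statistical machinery the abstract refers to.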

“…Also, it is unclear how good the discrimination of flows is because in [3] the sets of attributes are averaged over all flows of certain applications in 24-hour periods. In [4] the authors use the Expectation Maximization (EM) algorithm to cluster flows into different application types using a fixed set of attributes. From their evaluation it is not clear what influence different attributes have and how good the clustering actually is.…”

Section: Related Work (mentioning)

confidence: 99%

“…EM is an unsupervised Bayesian classifier that automatically learns the 'natural' classes (also called clustering) inherent in a training dataset with unclassified cases. The resulting classifier can then be used to classify new cases (see [4], [9]). …”

Section: ML-based Flow Classification Approach and Evaluation (mentioning)

confidence: 99%

“…Classification in a high dimensional attributes space is a big challenge for humans and rule-based methods, but stochastic ML algorithms can easily perform this task. The use of stochastic ML for traffic classification was raised in [2], [3] and [4]. However, to the best of our knowledge no systematic approach for application classification and evaluation has been proposed and an understanding of possible achievements and limitations is still lacking.…”

Section: Introduction (mentioning)

confidence: 99%

Self-Learning IP Traffic Classification Based on Statistical Flow Characteristics

Zander, Nguyễn, Armitage

2005

Lecture Notes in Computer Science

Abstract. A number of key areas in IP network engineering, management and surveillance greatly benefit from the ability to dynamically identify traffic flows according to the applications responsible for their creation. Currently such classifications rely on selected packet header fields (e.g. destination port) or application layer protocol decoding. These methods have a number of shortfalls, e.g. many applications can use unpredictable port numbers, and protocol decoding requires high resource usage or is simply infeasible when protocols are unknown or encrypted. We propose a framework for application classification using an unsupervised machine learning (ML) technique. Flows are automatically classified based on their statistical characteristics. We also propose a systematic approach to identify an optimal set of flow attributes to use and evaluate the effectiveness of our approach using captured traffic traces.
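The citation statements above describe Expectation Maximization (EM) as a way to learn "natural" classes from unlabelled flows and then classify new cases. A minimal sketch of that idea follows, assuming a Gaussian mixture with diagonal covariance over two per-flow attributes (mean packet size and mean inter-arrival time). It is not the implementation used in any of the works listed here; the synthetic data, the function names and the deterministic initialisation are all assumptions made for illustration.

```python
import math
import random


def log_density(point, mean, var):
    """Log of a diagonal-covariance Gaussian density at `point`."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(point, mean, var))


def responsibilities(point, weights, means, variances):
    """E-step for one point: posterior probability of each component."""
    logs = [math.log(w) + log_density(point, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum to avoid underflow
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def em_cluster(points, k, iterations=50, min_var=1e-6):
    """Fit a k-component mixture by EM; returns (weights, means, variances)."""
    n, d = len(points), len(points[0])
    # Deterministic start: means taken at evenly spaced positions in the data.
    means = [list(points[(i * n) // k]) for i in range(k)]
    avg = [sum(p[j] for p in points) / n for j in range(d)]
    spread = [max(sum((p[j] - avg[j]) ** 2 for p in points) / n, min_var)
              for j in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iterations):
        resp = [responsibilities(p, weights, means, variances) for p in points]
        # M-step: re-estimate each component from its soft assignments.
        for c in range(k):
            total = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = total / n
            for j in range(d):
                means[c][j] = sum(r[c] * p[j] for r, p in zip(resp, points)) / total
                variances[c][j] = max(
                    sum(r[c] * (p[j] - means[c][j]) ** 2
                        for r, p in zip(resp, points)) / total,
                    min_var)
    return weights, means, variances


def assign(point, model):
    """Classify a new flow: index of the most probable component."""
    resp = responsibilities(point, *model)
    return resp.index(max(resp))


# Synthetic flows: (mean packet size in bytes, mean inter-arrival time in s).
rng = random.Random(0)
small = [(rng.gauss(80, 10), rng.gauss(0.02, 0.005)) for _ in range(100)]
bulk = [(rng.gauss(1400, 50), rng.gauss(0.5, 0.1)) for _ in range(100)]
model = em_cluster(small + bulk, k=2)
labels = [assign(p, model) for p in small + bulk]
```

The clusters carry no application names; mapping a cluster index to an application still requires labelled examples or manual inspection, which is the evaluation gap several of the quoted statements point out.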
