Among traffic classification approaches, Deep Packet Inspection (DPI) methods are considered the most accurate. These methods, however, have two drawbacks: (i) they are inefficient, since they use complex regular expressions as protocol signatures, and (ii) they require manual intervention to generate and maintain signatures, partly because of the signature complexity. In this paper, we present CUTE, an automatic traffic classification method that relies on sets of weighted terms as protocol signatures. The key idea behind CUTE is the observation that, given appropriate weights, the occurrence of a specific term is more important than the relative location of terms in a flow. This observation is based on experimental evaluations as well as theoretical analysis, and leads to several key advantages over previous classification techniques: (i) CUTE is much faster than other classification schemes, since matching flows against weighted terms is significantly faster than matching regular expressions; (ii) CUTE can classify network traffic using only the first few bytes of a flow in most cases; and (iii) unlike most existing classification techniques, CUTE can be used to classify partial (or even slightly modified) flows. Even though CUTE replaces complex regular expressions with a set of simple terms, we show, using theoretical analysis and experimental evaluations (based on two large packet traces from tier-one ISPs), that its accuracy is as good as or better than that of existing complex classification schemes: CUTE achieves precision and recall rates of more than 90%. Additionally, CUTE can successfully classify more than half of the flows that other DPI methods fail to classify.
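The idea of matching flows against weighted terms rather than regular expressions can be illustrated with a minimal sketch. This is not the CUTE implementation: the signature table, the weights, the threshold, the prefix length, and the function names `score` and `classify` are all assumptions made for illustration, and the abstract does not say how weights are learned or how scores are combined.

```python
from typing import Dict, Optional

# Hypothetical signatures: protocol -> {term: weight}.
# Terms and weights are invented for illustration, not taken from the paper.
SIGNATURES: Dict[str, Dict[bytes, float]] = {
    "http": {b"GET ": 0.5, b"HTTP/1.1": 0.4, b"Host:": 0.3},
    "smtp": {b"EHLO": 0.6, b"MAIL FROM:": 0.5, b"250 ": 0.2},
}


def score(payload: bytes, terms: Dict[bytes, float]) -> float:
    """Sum the weights of the terms that occur anywhere in the payload.

    Only occurrence matters; the position and order of terms are ignored.
    """
    return sum(weight for term, weight in terms.items() if term in payload)


def classify(
    payload: bytes, threshold: float = 0.5, prefix_len: int = 256
) -> Optional[str]:
    """Return the best-scoring protocol, or None if no score reaches the threshold.

    Only the first `prefix_len` bytes are inspected (an assumed cutoff).
    """
    head = payload[:prefix_len]
    scores = {proto: score(head, terms) for proto, terms in SIGNATURES.items()}
    best = max(scores, key=lambda proto: scores[proto])
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    request = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(classify(request))
    # The same terms in a different order still match, since order is ignored.
    print(classify(b"Host: example.com\r\nHTTP/1.1 GET "))
```

Because each term is a plain substring test, a flow that is truncated or reordered can still accumulate enough weight to be classified, which is the property the abstract attributes to term-based signatures.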
Online forms of expression such as social networks, micro-blogging, blogs and rich content sharing platforms have proliferated in the last few years. This proliferation has contributed to the vast explosion in online data sharing we are experiencing today. One unique aspect of online data sharing is the tags manually inserted by content generators to facilitate content description and discovery (e.g., hashtags in tweets). In this paper we focus on these tags, and we study and propose algorithms that use them to automatically organize and categorize this vast collection of socially contributed and tagged information. In particular, we take a holistic approach to organizing such tags and propose algorithms to partition as well as rank this information collection. Our partitioning algorithms aim to segment the entire collection of tags (and the associated content) into a specified number of partitions under specific problem constraints. In contrast, our ranking algorithms aim to quickly identify a few partitions for suitably defined ranking functions. We present a detailed experimental study utilizing the full Twitter firehose (the set of all tweets in the Twitter service) that attests to the practical utility and effectiveness of our overall approach. We also present a detailed qualitative study of our results.