We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring functions; however, it never asks the users to provide such functions. Like skyline, it filters out a set of interesting points from a potentially large database based on the users' criteria; however, it never overwhelms the users by outputting too many tuples. In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the maximum regret ratio. This quantity captures how disappointed a user could be had she seen only the k representative tuples instead of the whole database. We focus on the class of linear utility functions, which is widely applicable. The first challenge of this approach is that it is not clear whether the maximum regret ratio would be small, or even bounded. We answer this question affirmatively: we prove that the maximum regret ratio can be bounded, and that this bound is independent of the database size. Moreover, our extensive experiments on real and synthetic datasets suggest that in practice the maximum regret ratio is reasonably small. The algorithms developed in this paper are also practical: they run in time linear in the size of the database, and the experiments show that their running time is small when they run on top of the skyline operation, which means that these algorithms could be integrated into current database systems.
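As an illustration of the quantity the k-regret query minimizes, the following Python sketch (hypothetical code, not from the paper) computes the maximum regret ratio of a chosen k-subset over a finite sample of non-negative linear utility vectors; the true maximum is taken over all linear utilities, so the sampled value only approximates it from below.

```python
import numpy as np

def max_regret_ratio(db, subset, utilities):
    """Maximum regret ratio of `subset` relative to `db`, approximated
    over a finite sample of linear utility vectors.

    db        : (n, d) array, one row per database tuple
    subset    : (k, d) array, the k representative tuples
    utilities : (m, d) array of non-negative linear utility vectors
    """
    worst = 0.0
    for u in utilities:
        best_all = np.max(db @ u)       # best utility over the whole database
        best_sub = np.max(subset @ u)   # best utility over the k representatives
        if best_all > 0:
            worst = max(worst, (best_all - best_sub) / best_all)
    return worst

# Example: 100 random 2-D tuples, 2 representatives, 1000 sampled utilities
rng = np.random.default_rng(0)
db = rng.random((100, 2))
print(max_regret_ratio(db, db[:2], rng.random((1000, 2))))
```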
Using the entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. Estimating the entropy in a streaming model to enable such fine-grained traffic analysis has been a challenging problem. We give lower bounds for this problem, showing that neither approximation nor randomization alone will let us compute the entropy efficiently. We present two algorithms for randomized approximation of the entropy in a time- and space-efficient manner, applicable for use on very high speed (greater than OC-48) links. Our first algorithm for entropy estimation, inspired by the seminal work of Alon et al. on estimating frequency moments, has strong theoretical guarantees on the error and resource usage. Our second algorithm exploits the observation that efficiency can be substantially enhanced by separating the high-frequency items (or elephants) from the low-frequency items (or mice). Evaluations on real-world traffic traces from different deployment scenarios demonstrate the utility of our approaches.
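For concreteness, here is a minimal Python sketch of the AMS-style sampling estimator that the frequency-moment approach builds on (the function name, parameters, and trial count are illustrative, and this sketch scans a stored stream rather than running in one pass): sample a random position, count how many times the sampled item recurs from that position onward, and combine such samples into an unbiased estimate of the sum of m_i log m_i, from which the empirical entropy follows.

```python
import math
import random

def estimate_entropy(stream, trials=400):
    """AMS-style estimate of the empirical entropy (in bits) of `stream`.

    Each trial samples a uniformly random position, counts how many times the
    sampled item occurs from that position to the end (r), and uses
    m * (r*log r - (r-1)*log(r-1)) as an unbiased estimate of sum_i m_i log m_i;
    the entropy is then H = log m - (1/m) * sum_i m_i log m_i.
    """
    m = len(stream)
    f = lambda c: c * math.log2(c) if c > 0 else 0.0
    estimates = []
    for _ in range(trials):
        pos = random.randrange(m)
        item = stream[pos]
        r = sum(1 for x in stream[pos:] if x == item)
        estimates.append(m * (f(r) - f(r - 1)))
    s = sum(estimates) / len(estimates)
    return math.log2(m) - s / m

print(estimate_entropy([1, 1, 2, 3, 1, 2, 1, 4] * 50))
```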
Searching for content in peer-to-peer networks is an interesting and challenging problem. Queries in Gnutella-like unstructured systems that use flooding or random walks to search must visit O(n) nodes in a network of size n, thus consuming significant amounts of bandwidth. In this paper, we propose a query routing protocol that keeps bandwidth consumption during query forwarding low by using a low-cost mechanism to create and maintain information about nearby objects. To achieve this, our protocol maintains a lightweight probabilistic routing table at each node that suggests the location of each object in the network. Following the corresponding routing table entries, a query can reach its destination in a small number of hops with high probability. However, maintaining routing tables in a large and highly dynamic network requires non-traditional mechanisms. We design a novel data structure called an Exponentially Decaying Bloom Filter (EDBF) that encodes such probabilistic routing tables in a highly compressed manner and allows for efficient aggregation and propagation. The search primitives provided by our system can be used to search for single keys or multiple keywords with equal ease. Analytical modeling of our design predicts significant improvements in search efficiency, verified through extensive simulations in which we observed an order-of-magnitude reduction in query path length over previous proposals.
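To make the EDBF idea concrete, below is a toy Python sketch (the class name, parameters, and decay rule are illustrative simplifications, not the paper's exact construction): inserting a key sets a few hash positions, propagating the filter to a neighbor keeps each set bit only with probability 1/d, and a node scores a query key by how many of its hash positions are still set, so the score fades exponentially with the number of hops from the object.

```python
import hashlib
import random

class EDBF:
    """Toy Exponentially Decaying Bloom Filter (illustrative only)."""

    def __init__(self, size=1024, hashes=8, d=2):
        self.size, self.hashes, self.d = size, hashes, d
        self.bits = [0] * size

    def _positions(self, key):
        # Hash positions derived from the key
        return [int(hashlib.sha1(f"{key}:{i}".encode()).hexdigest(), 16) % self.size
                for i in range(self.hashes)]

    def insert(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def score(self, key):
        # Number of matching bits; a higher score suggests the key is reachable
        # in fewer hops through the neighbor that advertised this filter.
        return sum(self.bits[p] for p in self._positions(key))

    def decayed_copy(self):
        # Before propagating to a neighbor, keep each set bit with probability 1/d,
        # so information about a key fades exponentially with hop distance.
        copy = EDBF(self.size, self.hashes, self.d)
        copy.bits = [b if random.random() < 1.0 / self.d else 0 for b in self.bits]
        return copy

    def merge(self, other):
        # Aggregate a neighbor's (decayed) filter by bitwise OR.
        self.bits = [a | b for a, b in zip(self.bits, other.bits)]
```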
Per-flow traffic measurement is critical for usage accounting, traffic engineering, and anomaly detection. Previous methodologies are either based on random sampling (e.g., Cisco's NetFlow), which is inaccurate, or only account for the "elephants". We introduce a novel technique for measuring per-flow traffic approximately, for all flows regardless of their sizes, at very high speed (say, OC-768). The core of this technique is a novel data structure called the Space Code Bloom Filter (SCBF). An SCBF is an approximate representation of a multiset; each element in this multiset is a traffic flow, and its multiplicity is the number of packets in the flow. The multiplicity of an element in the multiset represented by the SCBF can be estimated through either of two mechanisms: Maximum Likelihood Estimation (MLE) or Mean Value Estimation (MVE). Through parameter tuning, the SCBF allows for a graceful tradeoff between measurement accuracy and computational and storage complexity. The SCBF also contributes to the foundation of data streaming by introducing a new paradigm called blind streaming. We evaluate the performance of the SCBF through mathematical analysis and through experiments on packet traces gathered from a tier-1 ISP backbone. Our results demonstrate that the SCBF achieves reasonable measurement accuracy with very low storage and computational complexity.
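The following simplified single-group sketch in Python conveys the flavor of counting a multiset with a family of Bloom filters (the class, group size, and mean-value inversion below are illustrative assumptions; the actual SCBF/MRSCBF design uses multiple groups at different sampling resolutions and also supports MLE decoding): each packet of a flow is inserted into one of l small filters chosen at random, and the flow's packet count is estimated from how many of the l filters report containing it.

```python
import hashlib
import math
import random

class SCBFGroup:
    """Simplified single-group counting sketch in the spirit of the SCBF
    (illustrative only; the real design uses multiple groups/resolutions)."""

    def __init__(self, l=32, size=4096, hashes=3):
        self.l, self.size, self.hashes = l, size, hashes
        self.filters = [[0] * size for _ in range(l)]

    def _positions(self, flow, j):
        # Hash positions for `flow` in filter j (each filter uses its own hashes)
        return [int(hashlib.sha1(f"{flow}:{j}:{i}".encode()).hexdigest(), 16) % self.size
                for i in range(self.hashes)]

    def insert(self, flow):
        # One insertion per observed packet: pick one filter in the group at random
        j = random.randrange(self.l)
        for p in self._positions(flow, j):
            self.filters[j][p] = 1

    def estimate(self, flow):
        # Mean-value-style estimate: count filters that contain the flow (theta),
        # then invert E[theta] = l * (1 - (1 - 1/l)^c) to recover the count c.
        theta = sum(all(self.filters[j][p] for p in self._positions(flow, j))
                    for j in range(self.l))
        if theta >= self.l:
            return float("inf")   # group saturated; a coarser resolution would be needed
        return math.log(1 - theta / self.l) / math.log(1 - 1 / self.l)

# Example: record 10 packets of one flow and estimate its size
g = SCBFGroup()
for _ in range(10):
    g.insert("10.0.0.1->10.0.0.2")
print(round(g.estimate("10.0.0.1->10.0.0.2")))
```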
In this work, we study a fundamental tradeoff issue in designing distributed hash tables (DHTs): the size of the routing table maintained at each node versus the network diameter. Prior work observed that existing DHT schemes either maintain a routing table of size O(log n) with a network diameter of O(log n), or maintain a routing table of size d with a network diameter of O(n^{1/d}), and asked whether this represents the best asymptotic "state-efficiency" tradeoff. Our first major result is to show that there are straightforward routing algorithms which achieve better asymptotic tradeoffs. However, such algorithms all cause severe congestion on certain network nodes, which is undesirable in a P2P network. We then rigorously define the notion of "congestion" and conjecture that the above tradeoffs are asymptotically optimal for a congestion-free network. In studying this conjecture, we have thoroughly clarified the role that "congestion-freeness" plays in this "state-efficiency" tradeoff. Our second major result is to prove that the aforementioned tradeoffs are asymptotically optimal for uniform algorithms. Furthermore, for uniform algorithms, we find that a routing table size of Ω(log² n) is a magic threshold point that separates two different "state-efficiency" regions. Our third and final result is to study the exact (instead of asymptotic) optimal tradeoffs for uniform algorithms. We propose a new routing algorithm, based on a novel number-theoretic technique, that reduces the routing table size and the network diameter of Chord both by 21.4% without introducing any other protocol overhead.
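As a point of reference for the O(log n)-state, O(log n)-diameter corner of this tradeoff, here is a toy Chord-style example in Python (assuming, for simplicity, that every identifier in a power-of-two space is a live node; this is not the paper's proposed algorithm): each node keeps about log2(n) fingers, and greedy clockwise routing reaches any destination in at most log2(n) hops.

```python
import math

def finger_table(node, n):
    """Chord-style finger table: about log2(n) entries per node."""
    return [(node + 2 ** i) % n for i in range(int(math.log2(n)))]

def route(src, dst, n):
    """Greedy clockwise routing over finger tables; returns the hop sequence.
    Assumes every identifier in [0, n) is a live node (a simplification)."""
    path = [src]
    cur = src
    while cur != dst:
        # Pick the finger that minimizes the remaining clockwise distance to dst
        cur = min(finger_table(cur, n), key=lambda f: (dst - f) % n)
        path.append(cur)
    return path

# Identifier space of size 64: at most log2(64) = 6 hops
print(route(0, 45, 64))
```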