Mingsheng Hong scite author profile

Towards Expressive Publish/Subscribe Systems

Abstract. Traditional content-based publish/subscribe (pub/sub) systems allow users to express stateless subscriptions that are evaluated on individual events. However, many applications, such as monitoring RSS streams, stock tickers, or RFID data streams, require the ability to handle stateful subscriptions. In this paper, we introduce Cayuga, a stateful pub/sub system based on nondeterministic finite automata (NFAs). Cayuga allows users to express subscriptions that span multiple events, and it supports powerful language features such as parameterization and aggregation, which significantly extend the expressive power of standard pub/sub systems. Because it is built on a set of formally defined language operators, Cayuga's subscription language provides unambiguous subscription semantics as well as unique opportunities for optimization. We experimentally demonstrate that common optimization techniques used in NFA-based systems, such as state merging, have only limited effectiveness, and we propose novel, efficient indexing methods to speed up subscription processing. A thorough experimental evaluation shows the efficacy of our approach.
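To make "stateful subscription spanning multiple events" concrete, here is a minimal sketch of a two-state NFA with one parameter. It is not Cayuga's subscription language or engine; the names `Event`, `Run` and `match_rise` and the "price rises by at least a threshold for the same symbol" pattern are hypothetical. Each active run remembers the symbol it bound (parameterization), and every incoming event may also start a new run (the nondeterministic choice).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    symbol: str
    price: float


@dataclass(frozen=True)
class Run:
    # Hypothetical: one partial match that has left the start state and
    # bound the parameter `symbol` plus the price seen at that point.
    symbol: str
    start_price: float


def match_rise(events, threshold):
    """Report (symbol, start_price, end_price) whenever an event for a symbol is
    followed later by an event for the SAME symbol whose price is at least
    `threshold` higher. Each run reports its first match and is then retired."""
    active = []   # runs waiting in the intermediate state
    matches = []
    for ev in events:
        waiting = []
        for run in active:
            # Parameterized transition: fires only if the symbol equals the binding.
            if ev.symbol == run.symbol and ev.price >= run.start_price + threshold:
                matches.append((run.symbol, run.start_price, ev.price))
            else:
                waiting.append(run)   # self-loop: stay in the intermediate state
        # Nondeterminism: every event may also start a fresh run.
        waiting.append(Run(ev.symbol, ev.price))
        active = waiting
    return matches
```

For the stream IBM 100, MSFT 50, IBM 105, IBM 112 with a threshold of 10, only the run started at IBM 100 fires, at IBM 112. The sketch has no time window, so `active` grows with every event, and it checks every run against every event; that linear scan is the kind of per-event cost the indexing methods mentioned above are meant to reduce.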


Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining

Wang, Hong, Pei, et al. 2004


Distributed event stream processing with non-deterministic finite automata

Brenna, Gehrke, Hong, et al. 2009


Transaction time indexing with version compression

et al. 2008


Immortal DB is a transaction time database system designed to enable high performance for temporal applications. It is built into a commercial database engine, Microsoft SQL Server. This paper describes how we integrated a temporal indexing technique, the TSB-tree, into Immortal DB to serve as the core access method. The TSB-tree provides high-performance access and update for both current and historical data. A main challenge was integrating TSB-tree functionality while preserving the original B+tree functionality, including concurrency control and recovery. We discuss the overall architecture, including our unique treatment of index terms, and practical issues such as uncommitted data and log management. Performance is a primary concern. To increase performance, versions are locally delta compressed, exploiting the commonality between adjacent versions of the same record. This technique is also applied to index terms in index pages. There is a tradeoff between query performance and storage space, and we discuss how to optimize performance with respect to this tradeoff throughout the paper. The result of our efforts is a high-performance transaction time database system built into an RDBMS engine, which has not been achieved before. We include a thorough experimental study and analysis that confirms the very good performance it achieves.
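The idea of delta compressing adjacent versions, and the query/storage tradeoff it creates, can be sketched as follows. This is not Immortal DB's on-page format; it assumes versions are plain byte strings, that a delta keeps only the bytes between a prefix and a suffix shared with the next-newer version, and that the newest version is stored in full. All names (`Delta`, `make_delta`, `apply_delta`, `compress_chain`, `reconstruct`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    prefix_len: int   # leading bytes shared with the reference version
    suffix_len: int   # trailing bytes shared with the reference version
    middle: bytes     # the bytes that actually differ


def make_delta(reference: bytes, version: bytes) -> Delta:
    """Encode `version` relative to `reference` (its adjacent, newer version)."""
    limit = min(len(reference), len(version))
    p = 0
    while p < limit and reference[p] == version[p]:
        p += 1
    s = 0
    # Stop before the suffix would overlap the prefix in either string.
    while s < limit - p and reference[-1 - s] == version[-1 - s]:
        s += 1
    return Delta(p, s, version[p:len(version) - s])


def apply_delta(reference: bytes, delta: Delta) -> bytes:
    """Rebuild the encoded version from its reference and the delta."""
    suffix = reference[len(reference) - delta.suffix_len:]
    return reference[:delta.prefix_len] + delta.middle + suffix


def compress_chain(versions):
    """versions is ordered oldest..newest; keep the newest in full and each
    older version as a delta against its next-newer neighbor."""
    deltas = [make_delta(versions[i + 1], versions[i]) for i in range(len(versions) - 1)]
    return versions[-1], deltas


def reconstruct(newest, deltas, i):
    """Recover version i by walking the delta chain back from the newest one.
    Older versions need more steps: storage is saved at the cost of read time."""
    current = newest
    for j in range(len(deltas) - 1, i - 1, -1):
        current = apply_delta(current, deltas[j])
    return current
```

For example, `make_delta(b"abcXdef", b"abcYYdef")` keeps only `b"YY"` plus the two shared lengths (3 and 3). Reading the current version costs nothing extra, while each step back in history costs one more `apply_delta`.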


Rule-based multi-query optimization

Hong, Riedewald, Koch, et al. 2009


Data stream management systems usually have to process many long-running queries that are active at the same time. Multiple queries can be evaluated more efficiently together than independently, because it is often possible to share state and computation. Motivated by this observation, various Multi-Query Optimization (MQO) techniques have been proposed. However, these approaches suffer from two limitations. First, they focus on very specialized workloads. Second, integrating MQO techniques for CQL-style stream engines with those for event pattern detection engines is even harder, as the processing models of these two types of stream engines are radically different. In this paper, we propose a rule-based MQO framework. This framework incorporates a set of new abstractions that extend their counterparts in a traditional RDBMS or stream processing system: physical operators, transformation rules, and streams. Within this framework, we can integrate new and existing MQO techniques through the use of transformation rules. This allows us to build an expressive and scalable stream system. Just as relational optimizers are crucial for the success of RDBMSes, a powerful multi-query optimizer is needed for data stream processing. This work lays the foundation for such a multi-query optimizer, creating opportunities for future research. We experimentally demonstrate the efficacy of our approach.
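As a rough illustration of sharing computation across queries through a transformation rule, the sketch below applies a single rewrite, merging queries whose operator pipelines start with identical operators into one shared plan. It is not the paper's framework or rule set; the dict-shaped events, the `("filter", attr, value)` and `("project", attr)` operator tuples, and every function name here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    op: Optional[tuple]                            # operator at this node; None at the root
    children: dict = field(default_factory=dict)   # operator tuple -> child Node
    outputs: list = field(default_factory=list)    # ids of queries whose pipeline ends here


def apply_op(op, event):
    """Evaluate one hypothetical operator on a dict event; None means 'dropped'."""
    kind, attr, *args = op
    if kind == "filter":
        return event if event.get(attr) == args[0] else None
    if kind == "project":
        return {attr: event[attr]}
    raise ValueError(f"unknown operator kind: {kind}")


def share_prefixes(queries):
    """Rewrite-rule sketch: merge queries whose pipelines begin with identical
    operators, so each shared prefix is evaluated once per event."""
    root = Node(None)
    for qid, pipeline in queries.items():
        node = root
        for op in pipeline:
            node = node.children.setdefault(op, Node(op))
        node.outputs.append(qid)
    return root


def count_ops(node):
    """Number of operator instances in the shared plan."""
    return sum(1 + count_ops(child) for child in node.children.values())


def run(node, event, results):
    """Push one event through the shared plan, collecting per-query outputs."""
    for qid in node.outputs:
        results.setdefault(qid, []).append(event)
    for op, child in node.children.items():
        out = apply_op(op, event)
        if out is not None:
            run(child, out, results)
```

With three queries where two begin with the same `("filter", "sym", "IBM")` operator, evaluating them independently needs five operator instances, while the shared plan needs four. A rule-based optimizer would, as the abstract describes, let many such rewrites coexist and be chosen among; this sketch hard-codes just one.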

