Publish/subscribe systems have demonstrated the ability to scale to large numbers of users and high data rates when providing content-based data dissemination services on the Internet. However, their services are limited by the data semantics and query expressiveness that they support. On the other hand, recent work on selective dissemination of XML data has made significant progress in moving from XML filtering to the richer functionality of transformation for result customization, but has in general ignored the challenges of deploying such XML-based services at Internet scale. In this paper, we address these challenges in the context of incorporating the rich functionality of XML data dissemination into a highly scalable system. We present the architectural design of ONYX, a system based on an overlay network. We identify the salient technical challenges in supporting XML filtering and transformation in this environment and propose techniques for solving them.
Introduction

A large number of emerging applications, such as mobile services, stock tickers, sports tickers, personalized newspaper generation, network monitoring, traffic monitoring, and electronic auctions, have fuelled an increasing interest in Content-Based Data Dissemination (CBDD). CBDD is a service that delivers information to users (equivalently, applications or organizations) based on the correspondence between the content of the information and the users' data interests. Figure 1 shows the context in which a data dissemination system providing this service operates. Users subscribe to the service by providing profiles expressing their data interests. Data sources publish their data by pushing messages to the system. The system delivers to each user the messages that match her data interests; these messages are presented in the format required by the user.

Over the past few years, XML has rapidly gained popularity as the standard for data exchange in enterprise intranets and on the Internet. The ability to augment data with semantic and structural information using XML-based encoding raises the potential for more accurate and useful delivery of data. In the context of XML-based data dissemination, user profiles can involve constraints over both the structure and value of XML fragments, resulting in potentially more precise filtering of XML messages. In many emerging applications, the relevant XML messages also need to be transformed for data and application integration, personalization, and adaptation to wireless devices.

While [...]. Integrating XML processing into such distributed environments appears to be a natural approach to supporting large-scale XML dissemination.
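To make the notion of a profile with structural and value constraints concrete, the following is a minimal sketch in Python, not the mechanism used by ONYX. It assumes that profiles are written in the small XPath subset supported by the standard library module xml.etree.ElementTree; the user names, profile expressions, and the helper matching_users are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical profiles: user name -> path expression in ElementTree's XPath subset.
PROFILES = {
    "alice": ".//quote[symbol='ACME']",            # structure and value constraint
    "bob": ".//quote/price",                       # structure-only constraint
    "carol": ".//auction/item[category='books']",  # structure and value constraint
}

def matching_users(message_xml, profiles):
    """Return the sorted names of users whose profile matches the XML message."""
    root = ET.fromstring(message_xml)
    # Naive approach: evaluate every profile against every message.
    return sorted(user for user, path in profiles.items()
                  if root.find(path) is not None)

message = "<feed><quote><symbol>ACME</symbol><price>12.5</price></quote></feed>"
print(matching_users(message, PROFILES))  # prints ['alice', 'bob']
```

The sketch evaluates each profile separately against each message, so its cost grows linearly with the number of profiles. This per-profile cost is one reason a single-node, one-profile-at-a-time design is a poor fit for large profile populations.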
Challenges

Distributed pub/sub systems partition the profile population across multiple nodes and direct the message flow to the nodes hosting profiles based on the content of messages (referred to as content-driven routing). Integrating XML into content-driven routing, however, brings the following key challenges. As XML mixes structural and value-based information, content-drive...