Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams

Koch, Christoph; Scherzinger, Stefanie; Schweikardt, Nicole; Stegmaier, Bernhard

doi:10.1016/b978-012088469-8/50023-1

Cited by 15 publications

(19 citation statements)

References 17 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Our work also has a number of similarities to the area of query processing over XML streams (e.g., see [36][37][38][39][40][41][42]). Most of these approaches consider optimizations for specific XML query languages or language fragments, sometimes taking into account additional aspects of streaming data.…”

Section: Related Workmentioning

confidence: 99%

Parallelizing XML data-streaming workflows via MapReduce

Zinn¹,

Bowers²,

Köhler³

et al. 2010

Journal of Computer and System Sciences

View full text Add to dashboard Cite

In prior work it has been shown that the design of scientific workflows can benefit from a collection-oriented modeling paradigm which views scientific workflows as pipelines of XML stream processors. In this paper, we present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist of sequences of processing steps that receive XML-structured data and produce, often through calls to "black-box" (scientific) functions, modified (i.e., updated) XML structures. Our main contributions are (i) the development of a set of strategies for compiling scientific workflows, modeled as XML processing pipelines, into parallel MapReduce networks, and (ii) a discussion of their advantages and tradeoffs, based on a thorough experimental evaluation of the various translation strategies. Our evaluation uses the Hadoop MapReduce system as an implementation platform. Our results show that execution times of XML workflow pipelines can be significantly reduced using our compilation strategies. These efficiency gains, together with the benefits of MapReduce (e.g., fault tolerance) make our approach ideal for executing large-scale, compute-intensive XML-based scientific workflows.

show abstract

Section: Related Workmentioning

confidence: 99%

Parallelizing XML data-streaming workflows via MapReduce

Zinn¹,

Bowers²,

Köhler³

et al. 2010

Journal of Computer and System Sciences

View full text Add to dashboard Cite

show abstract

“…Examples for those systems are QStream [31], Demaq [32] and Borealis [33]. Typically, those systems use a one-thread-per-operator model (which is comparable to full vectorization) or a single-threaded operator scheduling with central control strategies (assuming high costs for switching the process context) [34][35][36]. There are interesting approaches [17,[37][38][39], where operators are distributed across a number of threads in a query-aware manner.…”

Section: Related Workmentioning

confidence: 99%

Cost-based vectorization of instance-based integration processes

Böehm

Habich

Preissler

et al. 2011

Information Systems

View full text Add to dashboard Cite

“…In [73] queries are specified in terms of attributed grammars and evaluated by deterministic depth-synchronous pushdown document automata. A stream-oriented language close to XQuery, FluX, is defined in [74]. It is shown how evaluation can be optimized in the presence of schema knowledge.…”

Section: Queries On Streaming Xmlmentioning

confidence: 99%