Online Top-k-Position Monitoring of Distributed Data Streams

Mäcker, Alexander; Malatyali, Manuel; Meyer auf der Heide, Friedhelm

doi:10.1109/ipdps.2015.40

Cited by 7 publications

(19 citation statements)

References 16 publications


“…A message will usually, besides a constant number of control bits, consist of a data item, a node ID and an identifier to distinguish between messages of different instances of an algorithm applied in parallel (as done when using standard probability amplification techniques). A broadcast channel is an extension to [2], which was originally proposed in [13] and afterwards applied in [4,5,17]. Between any two time steps we allow a communication protocol to take place, which may use a polylogarithmic number of rounds.…”

Section: Model (mentioning)

confidence: 99%

A Communication-Efficient Distributed Data Structure for Top-k and k-Select Queries

Biermeier, Feldkord, Malatyali, et al. 2018

Approximation and Online Algorithms

Self Cite


We consider the scenario of n sensor nodes observing streams of data. The nodes are connected to a central server whose task is to compute some function over all data items observed by the nodes. In our case, there exists a total order on the data items observed by the nodes. Our goal is to compute the k currently lowest observed values or a value with rank in [(1 − ε)k, (1 + ε)k] with probability (1 − δ). We propose solutions for these problems in an extension of the distributed monitoring model where the server can send broadcast messages to all nodes for unit cost. We want to minimize communication over multiple time steps where there are m updates to a node's value in between queries. The result is composed of two main parts, each of which may be of independent interest:

* This work was partially supported by the German Research Foundation (DFG) within the Priority Program "Algorithms for Big Data" (SPP 1736).

Consider a distributed sensor network, a system consisting of a huge number of nodes. Each node continuously observes its environment and measures information (e.g. temperature, pollution or similar parameters). We are interested in aggregations describing the current observations at a central server.

To keep the server's information up to date, the server and the nodes can communicate with each other. In sensor networks, however, the amount of such communication is particularly crucial, as communication has the largest impact on energy consumption, which is limited due to battery capacities [11]. Therefore, algorithms aim at minimizing the (total) communication required for computing the respective aggregation function at the server.

We consider several ideas to potentially lower the communication used: each single computation of an aggregate should use as little communication as possible; computations of the same aggregate should reuse parts of previous computations; and aggregates should be computed only when necessary. Recall that the continuous monitoring model creates a new output as often as possible.
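The approximate k-select guarantee stated in this abstract can be made concrete with a small check. The sketch below is not the paper's algorithm: it is a centralised helper that only tests whether a candidate value has rank in [(1 − ε)k, (1 + ε)k]. The names `rank_of` and `is_valid_approx_k_select` are hypothetical, and the rank is assumed to be the 1-indexed position in ascending order over distinct values.

```python
from typing import Sequence


def rank_of(values: Sequence[float], candidate: float) -> int:
    """Rank of `candidate` in ascending order (1-indexed), assuming distinct values."""
    return sum(1 for v in values if v <= candidate)


def is_valid_approx_k_select(
    values: Sequence[float], candidate: float, k: int, eps: float
) -> bool:
    """Check whether the candidate's rank lies in [(1 - eps) * k, (1 + eps) * k]."""
    r = rank_of(values, candidate)
    return (1 - eps) * k <= r <= (1 + eps) * k


if __name__ == "__main__":
    observed = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    # With k = 4 and eps = 0.5, any value of rank 2..6 is an acceptable answer.
    print(is_valid_approx_k_select(observed, 30, k=4, eps=0.5))
    print(is_valid_approx_k_select(observed, 70, k=4, eps=0.5))
```

A distributed protocol would have to produce such a candidate without collecting all values at the server; the helper only describes what counts as a correct output.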


“…, ∆}, a node ID and an identifier to distinguish between messages of different instances of an algorithm applied in parallel (as done when using standard probability amplification techniques). Having a broadcast channel is an extension to [1], which was originally proposed in [2] and afterwards applied in [7,8]. For ease of presentation, we assume that not only the server can send broadcast messages, but also the nodes.…”

Section: Model and Problems (mentioning)

confidence: 99%

Monitoring of Domain-Related Problems in Distributed Data Streams

Bemmann, Biermeier, Bürmann, et al. 2017

Structural Information and Communication Complexity

Self Cite


Consider a network in which n distributed nodes are connected to a single server. Each node continuously observes a data stream consisting of one value per discrete time step. The server has to continuously monitor a given parameter defined over all information available at the distributed nodes. That is, in any time step t, it has to compute an output based on all values currently observed across all streams. To do so, nodes can send messages to the server and the server can broadcast messages to the nodes. The objective is the minimisation of communication while allowing the server to compute the desired output.

We consider monitoring problems related to the domain D_t, defined to be the set of values observed by at least one node at time t. We provide randomised algorithms for monitoring D_t, (approximations of) the size |D_t| and the frequencies of all members of D_t. Besides worst-case bounds, we also obtain improved results when inputs are parameterised according to the similarity of observations between consecutive time steps. This parameterisation allows us to exclude inputs with rapid and heavy changes, which usually lead to the worst-case bounds but might be rather artificial in certain scenarios.
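For orientation, the three quantities named in this abstract (the domain, its size, and the frequencies of its members) can be computed centrally as follows. This is only a reference definition under the assumption that the server sees every node's value; it is not the randomised, communication-efficient algorithm of the paper, and the function name `domain_statistics` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple


def domain_statistics(
    observations: Dict[int, Hashable],
) -> Tuple[Set[Hashable], int, Dict[Hashable, int]]:
    """Given node_id -> value observed at one time step, return
    (domain, domain size, frequency of each domain member)."""
    frequencies = Counter(observations.values())
    domain = set(frequencies)
    return domain, len(domain), dict(frequencies)


if __name__ == "__main__":
    # Four nodes; two of them observe the same value at this time step.
    step = {1: "a", 2: "b", 3: "a", 4: "c"}
    print(domain_statistics(step))
```

Sending every value to the server in every time step costs n messages per step, which is the baseline that communication-efficient monitoring aims to improve on.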


“…Due to huge volumes and velocity, data can neither be completely stored, nor sent to a central server via a network, nor fully processed in real time. Initial results concern, among others, the communication complexity of so-called distributed aggregation problems; Mäcker et al. [20] considered the expected message complexity for the top-k Position Monitoring problem. Here, the task is to compute the IDs of the devices that observe the k largest items at every time step.…”
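The output required by the top-k Position Monitoring problem, as described in this statement, can be illustrated with a naive centralised baseline. The sketch assumes the server receives every node's value in every time step, so it says nothing about the message complexity studied in the cited paper. The names `top_k_positions` and `monitor` are hypothetical, and breaking ties by smaller node ID is an assumption.

```python
from typing import Dict, List


def top_k_positions(values: Dict[int, float], k: int) -> List[int]:
    """IDs of the k nodes holding the largest current values.
    Ties are broken by smaller node ID (an assumption of this sketch)."""
    ranked = sorted(values, key=lambda node_id: (-values[node_id], node_id))
    return ranked[:k]


def monitor(stream: List[Dict[int, float]], k: int) -> List[List[int]]:
    """Recompute the top-k positions from scratch at every time step."""
    return [top_k_positions(step, k) for step in stream]


if __name__ == "__main__":
    stream = [
        {1: 5.0, 2: 9.0, 3: 7.0},  # time step 0
        {1: 8.0, 2: 9.0, 3: 7.0},  # time step 1: node 1 overtakes node 3
    ]
    print(monitor(stream, k=2))
```

Note that the output changes only when the set of top-k node IDs changes, not whenever a value changes, which is what leaves room for saving communication.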

Section: P3 Distributed Data Streams In Dynamic Environments F Meyer (mentioning)

confidence: 99%

DFG Priority Programme SPP 1736: Algorithms for Big Data

Behdju, Meyer 2017

Künstl Intell


Volume, Velocity, and Variety are the three Vs commonly used to define the term big data. Simply put, those refer to the increasing amount of new data created, the increasing rate at which it is created, and the increasing number of different formats it has. At the same time, the three Vs describe challenges that require new algorithmic approaches. In order to tackle those challenges, the German Research Foundation established in 2013 the priority programme SPP 1736: Algorithms for Big Data. In this article we give a short overview on the research topics represented within this priority programme.

