Ever-increasing amounts of data and the requirement to process them in real time lead to more and more analytics platforms and software systems being designed according to the concept of stream processing. A common area of application is processing continuous data streams from sensors, for example, IoT devices or performance monitoring tools. In addition to analyzing the data of individual sensors, analyses of data for entire groups of sensors often need to be performed. Therefore, the data streams of the individual sensors have to be continuously aggregated into a data stream for the group. Motivated by a real-world application scenario of analyzing power consumption in Industry 4.0 environments, we argue that such a stream aggregation approach has to allow for aggregating sensors in hierarchical groups, support multiple such hierarchies in parallel, provide reconfiguration at runtime, and preserve the scalability and reliability qualities of stream processing techniques. We propose a stream processing architecture fulfilling these requirements, which can be integrated into existing big data architectures. As all state-of-the-art stream processing frameworks have to handle a trade-off between latency, resource efficiency, and correctness, our proposed architecture can be configured either for low-latency and resource-efficient computation or for always ensuring correct results. To assist adopters in choosing appropriate configuration options, we provide an experimental comparison. We present a pilot implementation of our proposed architecture and show how it is used in industry. Furthermore, in experimental evaluations we show that our solution scales linearly with the number of sensors and provides adequate reliability in the presence of faults.