Distributed Stream Processing (DSP) systems rely heavily on parallelism to deliver high performance in terms of latency and throughput. Yet developing such parallel systems comes with numerous challenges. In this paper, we focus on how to select appropriate resources for parallel stream processing in the presence of highly dynamic and unseen workloads. We present PANDA, a novel learned approach for highly efficient parallel DSP systems. The main idea is to use zero-shot cost models to provide accurate resource estimates, and hence the optimal degree of parallelism, so that performance demands are met.
CCS CONCEPTS
• Computer systems organization → Real-time systems.