In this paper, we propose a generalized clustering approach for static data flow subgraphs mapped onto individual processors in Multi-Processor System on Chips (MPSoCs). The goal of clustering is to replace the static data flow subgraph by a single dynamic data flow actor such that the global performance in terms of latency and throughput is optimized. Through our proposed clustering approach, the scheduling of connected static data flow subgraphs can be coordinated with enclosing system representations in a way that systematically exploits the predictability and efficiency of the static data flow model. Thus, the advantages of static data flow subsystems can be exploited in the context of overall system representations that are based on more general models of computation. At the same time, our approach goes significantly beyond previous approaches to synchronous data flow clustering by providing a quasi-static -as opposed to purely-static -scheduling interface between clustered subgraphs and the enclosing systems. This greatly enhances the power of our techniques in terms of avoiding deadlock, increasing the design space for clustering, and providing for integration with more general models of computation. We show benefits of up to 95% performance improvement for real world examples.