Optimizing real-time stream-processing applications for chip-level multiprocessors poses several challenges. Serial dependencies within or between the algorithms may cause poor scalability and poor internal data pressure. Load imbalances introduced by the parallel-processing hardware and execution environment may also limit performance. To maximize the throughput and minimize the latency of parallel stream-processing applications, we propose an approach that complements run-time dynamic load balancing with static pre-compile partitioning. In our solution, the dynamic features are based on event-driven scheduling, while the static features benefit from profile-guided automatic optimizations. In this paper, we present recent enhancements to DSPE, an open-source development environment featuring model and source code generators for prototyping, refining, and customizing real-time stream-processing applications. Using our approach on micro-benchmarks and sample applications, we also show that it is possible to reduce the impact of the different factors that constrain speed-up.