Dataflow models are efficient programming paradigms for expressing the parallelism of an application. Dataflow-based resource allocation methods for multicore architectures usually rely on complex graph transformations to expose the application parallelism, which can result in complex graphs for embarrassingly parallel applications. This paper presents an automated method that efficiently manages pre-scheduling graph complexity, pipelines sequential parts, and optimally adapts the dataflow model to the target architecture, striking a better balance between application complexity and performance than existing methods. In experiments, our method surpasses state-of-the-art techniques, achieving up to 1.8 times higher throughput. It also reduces analysis time to seconds, whereas the original PREESM method could take several days for fine-grained applications.