Exploiting the rapidly growing field of on-chip multiprocessors requires efficient design methodologies and tools. Besides the extraction of task-level parallelism, architecture synthesis is of growing importance, in particular for programmable hardware devices, which allow the hardware to be reused across applications. When programmable devices are used to implement on-chip multiprocessor systems, the architecture of the computing and communication infrastructure can be tailored to match the inherent parallelism of the application. In this paper, we present a design flow that generates the computing and communication architecture for a given application. We first present the automatic extraction of parallelism, followed by the application-driven synthesis of the computing and communication architecture. Finally, we describe a lightweight on-chip communication library and a tool-chain for designing on-chip multiprocessors on programmable devices. Each step of the design process is evaluated on different examples.