Artificial intelligence has demonstrated its ability to solve many critical tasks, but at the cost of high computational requirements. Different hardware platforms have been proposed to provide this computational power, each with its own benefits and drawbacks. However, exploring these alternatives in an easy and integrated way remains a complex task. To address this, this paper proposes a UML-based design flow in which neural networks are first specified and then automatically generated and trained using TensorFlow. The approach also enables automatic mapping of models to CPUs, GPUs, and FPGAs, the latter using Xilinx's Deep Learning Processor Units (DPUs). In addition, the framework generates the communication code required to connect the other system components with the selected implementation. The approach thus supports design-space exploration and system architecture definition, and reduces the time and effort required for implementation and training.