Factory layout planning aims at finding an optimized layout configuration while accounting for varying influences such as the material flow characteristics. Manual layout planning is a complex decision-making process because of the large number of possible placement options. Automated planning approaches aim at reducing the manual planning effort by generating optimized layout variants in the early stages of layout planning. Recent developments have introduced deep Reinforcement Learning (RL) based planning approaches that optimize a layout with respect to a single optimization criterion. However, layout planning requires the consideration of multiple, partially conflicting planning objectives, which existing RL-based approaches do not take into account. This paper addresses this research gap by presenting a novel deep RL-based layout planning approach that considers multiple objectives for optimization. Furthermore, existing RL-based planning approaches only consider analytically formulated objectives such as the transportation distance. Consequently, dynamic influences in the material flow are neglected, which can result in higher operational costs of the future factory. To address this issue, a discrete event simulation module is developed that simulates manufacturing and material flow processes simultaneously for any layout configuration generated by the RL approach. The presented approach thus incorporates material flow simulation results into the multi-objective optimization. To investigate the capabilities of RL-based factory layout planning, different RL architectures are compared based on a simplified application scenario. Throughput time, media supply, and material flow clarity are considered as optimization objectives.
The best-performing architecture is then applied to an exemplary application scenario and compared with the results obtained by a combination of a genetic algorithm and tabu search, the non-dominated sorting genetic algorithm, and the optimal solution. Finally, an industrial planning scenario with 43 functional units is considered. The results show that the performance of RL relative to meta-heuristics depends on the available computation time. Meta-heuristics lead to superior results in the early computation phase. With increasing computation time, however, RL achieves comparable results for throughput time and better results for material flow clarity. In addition, the potential of transfer learning is investigated for three different application scenarios. It is observed that RL can learn generalized patterns for factory layout planning, which can significantly reduce the required training time and lead to improved solution quality.
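To illustrate what partially conflicting objectives mean when layout variants are compared, the following minimal sketch filters a set of candidate layouts down to the non-dominated ones. It is not the approach presented in this paper. The function names, the objective values, and the convention that all three objectives (throughput time, media supply, material flow clarity) are expressed as costs to be minimized are assumptions made purely for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(candidates: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, c in enumerate(candidates)
        if not any(dominates(o, c) for j, o in enumerate(candidates) if j != i)
    ]


# Hypothetical scores per layout, all treated as costs (assumption):
# (throughput time, media supply cost, material flow clarity cost)
layouts = [
    (120.0, 0.30, 0.40),
    (110.0, 0.35, 0.45),
    (130.0, 0.40, 0.50),  # worse than the first layout in every objective
    (125.0, 0.25, 0.60),
]

front = non_dominated(layouts)
print(front)
```

In this hypothetical example, the third layout is dominated by the first and drops out, while the remaining three each represent a different trade-off that a planner would still have to weigh.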