Photoacoustic imaging potentially allows for the real-time visualization of functional human tissue parameters such as oxygenation but is subject to a challenging underlying quantification problem. While in silico studies have revealed the great potential of deep learning (DL) methodology in solving this problem, the inherent lack of an efficient gold standard method for model training and validation remains a grand challenge. This work investigates whether DL can be leveraged to accurately and efficiently simulate photon propagation in biological tissue, enabling photoacoustic image synthesis. Our approach is based on estimating the initial pressure distribution of the photoacoustic waves from the underlying optical properties using a back-propagatable neural network trained on synthetic data. In proof-of-concept studies, we validated the performance of two complementary neural network architectures, namely a conventional U-Net-like model and a Fourier Neural Operator (FNO) network. Our in silico validation on multispectral human forearm images shows that DL methods can speed up image generation by a factor of 100 when compared to Monte Carlo simulations with 5×108 photons. While the FNO is slightly more accurate than the U-Net, when compared to Monte Carlo simulations performed with a reduced number of photons (5×106), both neural network architectures achieve equivalent accuracy. In contrast to Monte Carlo simulations, the proposed DL models can be used as inherently differentiable surrogate models in the photoacoustic image synthesis pipeline, allowing for back-propagation of the synthesis error and gradient-based optimization over the entire pipeline. Due to their efficiency, they have the potential to enable large-scale training data generation that can expedite the clinical application of photoacoustic imaging.