Ultrasound b-mode imaging is a qualitative approach and diagnostic quality strongly depends on operators’ training and experience. Quantitative approaches can provide information about tissue properties; therefore, can be used for identifying various tissue types, e.g., speed-of-sound in the tissue can be used as a biomarker for tissue malignancy, especially in breast imaging. Recent studies showed the possibility of speed-of-sound reconstruction using deep neural networks that are fully trained on simulated data. However, because of the ever-present domain shift between simulated and measured data, the stability and performance of these models in real setups are still under debate. In prior works, for training data generation, tissue structures were modeled as simplified geometrical structures which does not reflect the complexity of the real tissues. In this study, we proposed a new simulation setup for training data generation based on Tomosynthesis images. We combined our approach with the simplified geometrical model and investigated the impacts of training data diversity on the stability and robustness of an existing network architecture. We studied the sensitivity of the trained network to different simulation parameters, e.g., echogenicity, number of scatterers, noise, and geometry. We showed that the network trained with the joint set of data is more stable on out-of-domain simulated data as well as measured phantom data.