A synthetic digital mammogram (SDM) is a 2D image generated from digital breast tomosynthesis (DBT) and used as a substitute for a full-field digital mammogram (FFDM), reducing the radiation dose in breast cancer screening. A previous deep learning-based method used FFDM images as the ground truth and trained a single neural network to directly generate SDM images whose appearance (e.g., intensity distribution and textures) resembles that of FFDM images. However, FFDM images have texture patterns that differ from those of DBT. This difference may destabilize network training and cause high intensity distortion, making it hard to decrease intensity distortion and increase perceptual similarity (e.g., generate similar textures) at the same time. Clinically, radiologists want a 2D synthesized image that looks like an FFDM image, because they have been trained to read FFDM images for a long time, and that also preserves the local structures in DBT, such as masses and microcalcifications (MCs), because these structures are important for diagnosis. In this study, we proposed a deep convolutional neural network that learns the transformation from DBT to SDM. Method: To decrease intensity distortion and increase perceptual similarity, a multi-scale cascaded network (MSCN) is proposed to generate low-frequency structures (e.g., intensity distribution) and high-frequency structures (e.g., textures) separately. The MSCN consists of two cascaded sub-networks: the first sub-network predicts the low-frequency part of the FFDM image, and the second sub-network generates a full SDM image with FFDM-like textures based on the prediction of the first sub-network. The first sub-network, termed the low-frequency network, is trained with a mean-squared error (MSE) objective function to generate a low-frequency SDM image.
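The low/high-frequency split that the MSCN relies on can be illustrated with a small NumPy sketch. The abstract does not state how the low-frequency part of the FFDM image is defined, so the Gaussian low-pass filter, the `sigma=4.0` default, and the helper names `low_pass`, `split_frequencies`, and `mse` are assumptions for illustration only. In the described method, the low-frequency network learns to predict this component from DBT; the sketch only shows one plausible way to form such a target and score a prediction against it with MSE.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def low_pass(image: np.ndarray, sigma: float = 4.0) -> np.ndarray:
    """Separable Gaussian blur (an ASSUMED definition of the low-frequency part)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    # Reflect-pad so the blurred output keeps the input's shape.
    padded = np.pad(np.asarray(image, dtype=np.float64), r, mode="reflect")

    def conv(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur horizontally (each row)
    return np.apply_along_axis(conv, 0, rows)    # blur vertically (each column)


def split_frequencies(image: np.ndarray, sigma: float = 4.0):
    """Return (low, high) such that low + high == image by construction."""
    low = low_pass(image, sigma)
    return low, np.asarray(image, dtype=np.float64) - low


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean-squared error, the objective named for the low-frequency network."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.mean(diff ** 2))


# Hypothetical usage: build a low-frequency target and score a stand-in prediction.
rng = np.random.default_rng(0)
ffdm = rng.random((64, 64))                    # stand-in for an FFDM image in [0, 1]
low_target, high_detail = split_frequencies(ffdm)
prediction = np.full_like(ffdm, ffdm.mean())   # stand-in for the network output
print(mse(prediction, low_target))
```

Because the high-frequency residual is defined as the image minus its low-pass version, the two parts always sum back to the original, which mirrors the idea of letting one sub-network handle intensity and the next add texture on top.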
The second sub-network, termed the high-frequency network, is trained with a gradient-guided generative adversarial network (GAN) objective function to generate a full SDM image with textures similar to those of the FFDM image. Results: A total of 1646 cases with FFDM and DBT were retrospectively collected from the Hologic Selenia system for the training and validation datasets, and 145 cases with masses or MC clusters were independently collected from the same system for the testing dataset. For comparison, the baseline network has the same architecture as the high-frequency network and directly generates a full SDM image. Compared with the baseline method, the proposed MSCN improves the peak signal-to-noise ratio from 25.3 to 27.9 dB and improves the

Gongfa Jiang, Zilong He, and Yuanpin Zhou contributed equally to this work.
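The comparison above is reported in PSNR, and the high-frequency network is trained with a gradient-guided adversarial objective. The following sketch pairs the standard PSNR formula with a simple gradient-matching term. The abstract gives neither the exact gradient-guided loss nor the intensity range used for PSNR, so `gradient_l1_loss` and `data_range=1.0` are assumptions, and the adversarial (discriminator) part of the objective is not shown.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    err = np.mean(diff ** 2)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))


def gradient_l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """HYPOTHETICAL gradient-matching term: L1 distance between finite differences."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    loss_y = np.mean(np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)))
    loss_x = np.mean(np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)))
    return float(loss_y + loss_x)


target = np.zeros((32, 32))
offset = target + 0.1                      # pure intensity shift, identical texture
print(psnr(offset, target))                # about 20 dB
print(gradient_l1_loss(offset, target))    # 0.0: gradients are unchanged
```

A uniform intensity shift lowers PSNR but leaves the gradient term at zero, which illustrates why an intensity-focused (MSE) objective and a gradient/texture-focused objective can play complementary roles. At a fixed data range, the reported 2.6 dB gain (25.3 to 27.9 dB) corresponds to roughly a 1.8-fold reduction in MSE, since 10^(2.6/10) ≈ 1.82.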