Embryo assessment and selection are usually based on visual morphological analysis by expert embryologists. Although this assessment is used routinely in clinical practice, it depends heavily on the embryologist's experience and is very time‐consuming. Objective and efficient methods for automated embryo evaluation are therefore in high demand. We proposed a framework of cascaded networks that hierarchically extracts and integrates microscopic image features for embryo classification. The cascaded networks consisted of a coarse network and a refined network. The coarse network produced a class activation map (CAM) for the class with the highest predicted probability, which indicated the regions most discriminative for embryo classification. The refined network then extracted and integrated image features a second time, using both the CAMs and the corresponding original images. In addition, a residual external‐attention (ResEA) block was used in the refined network to better capture long‐range dependencies. The cascaded networks were trained on a dataset of 7728 microscopic images of day 3 embryos from 1800 couples and evaluated on an independent testing dataset of 734 microscopic images. Accuracy, sensitivity, specificity, precision, and F1‐score were used to evaluate performance. Even without the ResEA block, the cascaded networks achieved better classification results than either the coarse network or the refined network alone. Adding the ResEA block further improved all five metrics. The proposed cascaded networks also achieved better classification results than a junior embryologist. By using image features hierarchically, the cascaded networks learn more effectively, and the ResEA block further improves embryo classification performance.
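
To make the evaluation step concrete, the five reported metrics can all be derived from the counts of a confusion matrix. The following is a minimal sketch, not the study's evaluation code. It assumes a binary labeling (1 for the positive embryo class, 0 for the negative class), which the text above does not state explicitly, and the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, precision, and F1-score.

    Assumption: labels are binary, with 1 = positive class and 0 = negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)     # positive predictive value
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    toy_true = [1, 1, 1, 1, 0, 0]
    toy_pred = [1, 1, 1, 0, 1, 0]
    for name, value in binary_metrics(toy_true, toy_pred).items():
        print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are imbalanced, because accuracy alone can look high while one class is classified poorly.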