Sleep staging is an essential step in sleep quality assessment and the diagnosis of sleep disorders. However, most current automatic sleep staging approaches rely on recurrent neural networks (RNNs), which incur a relatively heavy training burden. Moreover, these methods only extract information from whole epochs or adjacent epochs, ignoring local signal variations within each epoch. To address these issues, a novel deep learning architecture named segmented attention network (SAN) is proposed in this paper. The architecture consists of a feature extraction (FE) module and a time sequence encoder (TSE). The FE module comprises a multiple multiscale CNN (MMCNN) and a residual squeeze-and-excitation block (SE block): the former extracts features from multiple equal-length EEG segments, and the latter reinforces these features. The TSE module, based on a multi-head attention mechanism, captures the temporal information in the features extracted by the FE module. Notably, SAN replaces the RNN module with the TSE module for temporal learning, which makes the network faster. The model was evaluated on two widely used public datasets, the Montreal Archive of Sleep Studies (MASS) and Sleep-EDFX, and on one clinical dataset from Huashan Hospital of Fudan University, Shanghai, China (HSFU). The proposed model achieved accuracies of 85.5%, 86.4%, and 82.5% on Sleep-EDFX, MASS, and HSFU, respectively. The experimental results show favorable performance and consistent improvements of SAN across different datasets in comparison with state-of-the-art studies. They also demonstrate the necessity of integrating local characteristics within epochs with informative features from adjacent epochs for sleep staging.
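To make the described pipeline concrete, the following is a minimal sketch, assuming PyTorch, of how an SAN-style model could be organized: a per-segment multiscale CNN feature extractor, a residual squeeze-and-excitation block, and a multi-head attention encoder over the segment sequence. All layer widths, kernel scales, segment counts, and the five-class output are illustrative assumptions not specified in this abstract, and the sketch is not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiscaleCNN(nn.Module):
    """Parallel 1-D conv branches with different kernel sizes (hypothetical scales)."""
    def __init__(self, in_ch=1, out_ch=32, kernel_sizes=(8, 32, 128)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(in_ch, out_ch, k, padding=k // 2),
                nn.BatchNorm1d(out_ch),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # pool each segment to a single feature vector
            )
            for k in kernel_sizes
        )

    def forward(self, x):                      # x: (batch, 1, segment_len)
        feats = [b(x).squeeze(-1) for b in self.branches]
        return torch.cat(feats, dim=-1)        # (batch, out_ch * n_branches)


class SEBlock(nn.Module):
    """Residual squeeze-and-excitation recalibration over the feature dimension."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, dim)
        return x + x * self.fc(x)              # reweight features, keep residual path


class SAN(nn.Module):
    """Per-segment feature extraction (MMCNN + SE block), then a multi-head
    attention time-sequence encoder over the sequence of segment features."""
    def __init__(self, n_classes=5, feat_dim=96, n_heads=4, n_layers=2):
        super().__init__()
        self.fe = MultiscaleCNN(out_ch=feat_dim // 3)   # 3 branches -> feat_dim total
        self.se = SEBlock(feat_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.tse = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                      # x: (batch, n_segments, segment_len)
        b, s, t = x.shape
        seg = x.reshape(b * s, 1, t)           # process each segment independently
        feats = self.se(self.fe(seg))          # (b * s, feat_dim)
        feats = feats.reshape(b, s, -1)        # restore the segment sequence
        encoded = self.tse(feats)              # attention across segments of the epoch
        return self.classifier(encoded.mean(dim=1))   # epoch-level stage logits


# Example: a 30-s epoch sampled at 100 Hz split into ten 3-s segments (assumed split).
x = torch.randn(8, 10, 300)
logits = SAN()(x)                              # (8, 5) sleep-stage logits
```

Replacing the recurrent temporal module with a self-attention encoder, as above, lets all segments be processed in parallel rather than sequentially, which is the source of the training-speed advantage claimed for SAN.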