Semantic change detection (SCD) is a newly important topic in the field of remote sensing (RS) image interpretation since it provides semantic comprehension for bi-temporal RS images via predicting change regions and change types and has great significance for urban planning and ecological monitoring. With the availability of large scale bi-temporal RS datasets, various models based on deep learning (DL) have been widely applied in SCD. Since convolution operators in DL extracts two-dimensional feature matrices in the spatial dimension of images and stack feature matrices in the dimension termed the channel, feature maps of images are tri-dimensional. However, recent SCD models usually overlook the stereoscopic property of feature maps. Firstly, recent SCD models are usually limited in capturing spatial global features in the process of bi-temporal global feature extraction and overlook the global channel features. Meanwhile, recent SCD models only focus on spatial cross-temporal interaction in the process of change feature perception and ignore the channel interaction. Thus, to address above two challenges, a novel fine-grained feature perception network (FFPNet) is proposed in this paper, which employs the Omni Transformer (OiT) module to capture bi-temporal channel–spatial global features before utilizing the Omni Cross-Perception (OCP) module to achieve channel–spatial interaction between cross-temporal features. According to the experiments on the SECOND dataset and the LandsatSCD dataset, our FFPNet reaches competitive performance on both countryside and urban scenes compared with recent typical SCD models.