Deep-learning-based fusion of spectral-spatial features has become a focus of research in hyperspectral image (HSI) classification. However, previous deep frameworks based on spectral-spatial fusion usually perform feature aggregation only at the branch ends. Furthermore, only first-order statistical features are considered in the fusion process, which limits the discriminability of the fused spectral-spatial features. This article proposes a global-local hierarchical weighted fusion end-to-end classification architecture. The architecture includes two subnetworks, one for spectral classification and one for spatial classification. For the spectral subnetwork, two band-grouping strategies are designed, and bidirectional long short-term memory is used to capture spectral context information from global to local perspectives. For the spatial subnetwork, a pooling strategy based on local attention is incorporated to construct a global-local pooling fusion module, which enhances the discriminability of the spatial features learned by a convolutional neural network. For the fusion stage, a hierarchical weighting fusion mechanism is developed to capture the nonlinear relationship between spectral and spatial features. Experimental results on four real HSI datasets and a GF-5 satellite dataset demonstrate that the proposed method is more competitive in terms of accuracy and generalization.
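As a rough illustration of two ideas named above, band grouping and weighted fusion, the following minimal sketch may help. It is not the architecture proposed in this article: the contiguous equal-size grouping, the softmax-normalized weights, and the helper names `group_bands`, `softmax`, and `weighted_fusion` are all assumptions made for illustration, and the fixed-weight sum shown here does not model the hierarchical, nonlinear fusion mechanism itself.

```python
import math


def group_bands(spectrum, num_groups):
    """Split a 1-D spectrum into contiguous, near-equal band groups.

    Hypothetical grouping rule: the article does not specify its two
    band-grouping strategies, so contiguous splitting is an assumption.
    """
    base, extra = divmod(len(spectrum), num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        # The first `extra` groups receive one additional band.
        size = base + (1 if g < extra else 0)
        groups.append(spectrum[start:start + size])
        start += size
    return groups


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(spectral_feat, spatial_feat, scores):
    """Fuse two equal-length feature vectors with softmax-normalized weights.

    `scores` holds two raw weights (spectral, spatial); in a trained
    network these would be learned, here they are supplied by hand.
    """
    w_spec, w_spat = softmax(scores)
    return [w_spec * a + w_spat * b
            for a, b in zip(spectral_feat, spatial_feat)]


if __name__ == "__main__":
    # "Global" view: the whole spectrum as one sequence.
    # "Local" view: the same spectrum split into three groups.
    spectrum = list(range(10))
    print(group_bands(spectrum, 1))
    print(group_bands(spectrum, 3))
    print(weighted_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))
```

In this sketch, each band group would be fed to a sequence model as one time step, and equal raw scores give both branches the same weight in the fused vector.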