Clinically, retinal vessel segmentation is a significant step in the diagnosis of fundus diseases. However, recent methods generally neglect the difference in semantic information between deep and shallow features and thus fail to capture global and local characterizations of fundus images simultaneously, which limits segmentation performance on fine vessels. In this paper, a global transformer and dual local attention network via deep-shallow hierarchical feature fusion (GT-DLA-dsHFF) is investigated to address these limitations. First, the global transformer (GT) is developed to integrate global information in the retinal image; it effectively captures long-range dependencies between pixels, alleviating the discontinuity of blood vessels in the segmentation results. Second, dual local attention (DLA), constructed from dilated convolutions with varied dilation rates, unsupervised edge detection, and a squeeze-and-excitation block, is proposed to extract local vessel information and consolidate edge details in the segmentation result. Finally, a novel deep-shallow hierarchical feature fusion (dsHFF) algorithm is studied to fuse features at different scales within the deep learning framework, mitigating the attenuation of valid information during feature fusion. We evaluated GT-DLA-dsHFF on four typical fundus image datasets. The experimental results demonstrate that our GT-DLA-dsHFF achieves superior performance against current methods, and detailed discussions verify the efficacy of the three proposed modules. Segmentation results on diseased images further show the robustness of the proposed GT-DLA-dsHFF. Our code will be available at https://github.com/YangLibuaa/GT-DLA-dsHFF.
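To make the DLA description above concrete, the following is a minimal PyTorch sketch of a local attention block built from parallel dilated convolutions reweighted by a squeeze-and-excitation block. It is an illustration only, not the authors' released code: the class names, dilation rates, and reduction ratio are assumptions, and the unsupervised edge-detection branch is omitted because the abstract does not specify it (see the GitHub URL above for the official implementation).

```python
import torch
import torch.nn as nn


class SqueezeExcitation(nn.Module):
    """Channel attention: global average pooling + two FC layers (Hu et al.)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # per-channel reweighting


class DilatedLocalAttentionSketch(nn.Module):
    """Parallel 3x3 convolutions with varied dilation rates (multi-scale local
    context), fused by a 1x1 convolution and reweighted channel-wise by SE.
    The rates (1, 2, 4) are illustrative assumptions."""

    def __init__(self, channels: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)
        self.se = SqueezeExcitation(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.se(self.fuse(y))


# Usage example on a dummy fundus feature map (batch, channels, H, W).
if __name__ == "__main__":
    block = DilatedLocalAttentionSketch(channels=32)
    features = torch.randn(1, 32, 64, 64)
    print(block(features).shape)  # torch.Size([1, 32, 64, 64])
```

Using identical kernel sizes with different dilation rates keeps the parameter count fixed while enlarging the receptive field per branch, which is why dilated convolutions are a common choice for capturing vessels of varying widths without downsampling.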