Detecting rail surface defects is essential to the safety of railway transportation. Two factors make the task difficult: defects located at the edges on both sides of the rail, and the multi-scale variation between different types of defects. To address these problems, this paper proposes a novel rail surface defect detection network, YOLOv5s-VF. First, we design a sharpening functional attention mechanism (V-CBAM) with two key components: adaptive channel attention (F-CAM) and sharpened spatial attention (SSA). In F-CAM, a one-dimensional convolution with an adaptive kernel size models cross-channel interaction, which reduces the number of parameters of the attention mechanism without degrading its performance. In SSA, we design a sharpening filter suited to spatial attention, which strengthens the attention paid to defects at the rail edges and improves the network's detection of edge defects. Second, we construct a microscale adaptive spatial feature fusion module (M-ASFF), which adds a high-resolution feature extraction layer to enrich the low-level detail features of tiny defects. At the same time, the low-resolution feature layer is removed to prevent the loss of detailed information and an excessive increase in model parameters. Combined with adaptive spatial feature fusion, this design avoids the semantic conflicts caused by fusing features at different scales. Finally, given the lack of labeled public rail surface defect datasets, we collect real rail images, manually label their defects to build a dataset for training object detection networks, and release it as open source. Experimental results show that YOLOv5s-VF outperforms existing rail surface defect detection methods, with a detection accuracy of 93.5% and a detection speed of 114.9 fps.
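The parameter saving attributed to F-CAM can be illustrated with a minimal sketch. The abstract does not give the kernel-size rule, so the code below assumes the ECA-Net style mapping, in which the kernel size grows with the base-2 logarithm of the channel count. The function names, the constants gamma=2 and b=1, and the reduction ratio of 16 for the CBAM-style MLP baseline are all illustrative assumptions, not the authors' implementation.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd 1D kernel size derived from the channel count (assumed ECA-style rule)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def channel_attention(descriptor, weights):
    """Apply one shared 1D kernel across per-channel descriptors, then a sigmoid.

    descriptor: per-channel pooled values (e.g. from global average pooling).
    weights: k kernel weights with k odd; zero padding keeps the output length
             equal to the number of channels.
    """
    k = len(weights)
    pad = k // 2
    padded = [0.0] * pad + list(descriptor) + [0.0] * pad
    attention = []
    for i in range(len(descriptor)):
        s = sum(w * padded[i + j] for j, w in enumerate(weights))
        attention.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate in (0, 1)
    return attention


def mlp_attention_params(channels, reduction=16):
    """Weight count of a two-layer shared MLP (CBAM-style baseline, no biases)."""
    return 2 * channels * (channels // reduction)


channels = 256
k = adaptive_kernel_size(channels)
# The 1D convolution needs only k weights, independent of the channel count.
print("1D conv weights:", k)
print("MLP weights:", mlp_attention_params(channels))
```

Under these assumptions, a 256-channel feature map needs 5 weights for the 1D convolution, against 8192 for the MLP baseline, which is the kind of reduction the abstract refers to.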