As an indispensable task for traffic management department, road maintenance has attracted much attention during the last decade due to the rapid development of traffic network. As is known, crack is the early form of many road damages, and repair it in time can significantly save the maintenance cost. In this case, how to detect crack regions quickly and accurately becomes a huge demand. Actually, many image processing technique based methods have been proposed for crack detection, but their performances can not meet our expectations. The reason is that, most of these methods use bottom features such as color and texture to detect the cracks, which are easily influenced by the varied conditions such as light and shadow. Inspired by the great successes of machine learning and artificial intelligence, this paper presents a sample and structure guided network for detecting road cracks. Specifically, the proposed network is based on U-Net architecture, which remains the details from input to output by using skip connection strategy. Then, because the scale of crack samples is much smaller than that of non-crack ones, directly using the conventional cross entropy loss can not optimize the network effectively. In this case, the Focal loss is utilized to address the model optimization problem. Additionally, we incorporate the self-attention strategy into the proposed network, which enhances its stability by encoding the 2-order information among different local regions into the final features. Finally, we test the proposed method on four datasets, three public ones with labels and a photographed one without labels, to validate its effectiveness. It is noteworthy that, for the photographed dataset, we design a series of image processing strategies such as contrast enhancement to improve the generalization capability of the proposed method.