In genomics, a wide range of machine learning methods is used to annotate biological sequences with respect to positions of interest, such as transcription start sites, translation initiation sites, methylation sites, splice sites, and promoter start sites. In recent years, this area has been dominated by convolutional neural networks, which typically outperform older methods owing to their automated scanning for influential sequence motifs. As an alternative, in this paper we introduce transformer architectures for whole-genome sequence labeling tasks. We show that these architectures, recently introduced for natural language processing, allow for fast processing of long DNA sequences. We optimize existing networks and define a new way to calculate attention, resulting in state-of-the-art performance. To demonstrate this, we evaluate our transformer model architecture on several sequence labeling tasks and find that it outperforms specialized models for the annotation of transcription start sites, translation initiation sites, and 4mC methylation in E. coli. In addition, using the full genome for model training and evaluation results in unbiased performance metrics, facilitating future benchmarking.