Oil drilling is a vital part of resource exploitation, and overflow is the most common and most difficult threat encountered during drilling, because it can lead to a blowout, a catastrophic accident. To limit the damage, overflow must be detected as early as possible. However, labeled data are scarce and their class distribution is severely imbalanced, which makes it difficult to design a suitable solution. To address this issue, this paper proposes an improved Transformer framework based on self-supervised learning that can accurately detect overflow 20 min in advance when labeled data are limited and severely imbalanced. The framework includes a self-supervised pre-training scheme that focuses on long-term temporal dependence; it offers performance benefits over fully supervised learning on downstream tasks and makes unlabeled data useful in the training process. To better extract temporal features and adapt to the multi-task training process, we also propose a Transformer-based auto-encoder with a temporal convolution layer. In the experiments, we use 20 min of data to detect overflow in the following 20 min. The results show that the proposed framework reaches 98.23% accuracy and an F1 score of 0.84, substantially better than the other methods compared. In the ablation experiments, we compare several modifications of our framework and different pre-training tasks to demonstrate the advantage of our method. Finally, we discuss the influence of important hyperparameters on efficiency and accuracy.
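To make the detection setup concrete, the following minimal Python sketch shows one way the "20 min of data, overflow in the next 20 min" task could be framed as windowed binary classification. It is an illustration under stated assumptions, not the pipeline used in this work: the function name `make_windows`, the per-step boolean overflow flag, the stride, and the sampling rate are all hypothetical, and the window lengths are given in time steps rather than minutes.

```python
import numpy as np

def make_windows(features, overflow, input_len, horizon, stride=1):
    """Slice a multivariate series into (input window, future label) pairs.

    features : array of shape (n_steps, n_features), e.g. drilling sensor logs
    overflow : boolean array of shape (n_steps,), True where overflow occurs
    input_len: number of time steps in each input window (assumed: 20 min of data)
    horizon  : number of future time steps to look ahead (assumed: next 20 min)

    A window is labeled 1 if any overflow occurs within the `horizon` steps
    immediately after the window ends, otherwise 0.
    """
    features = np.asarray(features, dtype=float)
    overflow = np.asarray(overflow, dtype=bool)
    if features.ndim != 2 or features.shape[0] != overflow.shape[0]:
        raise ValueError("features must be (n_steps, n_features) and match overflow")

    n_steps = features.shape[0]
    windows, labels = [], []
    for start in range(0, n_steps - input_len - horizon + 1, stride):
        end = start + input_len
        windows.append(features[start:end])
        labels.append(int(overflow[end:end + horizon].any()))

    if not windows:  # series too short for even one window
        return np.empty((0, input_len, features.shape[1])), np.empty((0,), dtype=int)
    return np.stack(windows), np.array(labels, dtype=int)


# Hypothetical usage: with one sample per minute, 20-step windows and a
# 20-step horizon would correspond to the 20 min / 20 min setting.
# X, y = make_windows(sensor_array, overflow_flags, input_len=20, horizon=20)
```

Because overflow events are rare, most windows built this way carry the label 0, which is why accuracy alone can look high and the F1 score is reported alongside it.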