Accurate identification of high-frequency oscillations (HFOs) is an important prerequisite for precise localization of epileptic foci and for a favorable prognosis in drug-refractory epilepsy. A high-performance automatic HFO detection method can help clinicians reduce both error rates and manual workload. Existing methods, however, rely on a limited analysis perspective and simple model designs, and therefore struggle to meet the requirements of clinical application. To address this, an end-to-end bi-branch fusion model is proposed to detect HFOs automatically. The model takes the band-pass filtered signal (signal branch) and the time-frequency image (TFpic branch) as inputs, and a separate backbone network is built for deep feature extraction in each branch. Specifically, a hybrid model based on ResNet1d and long short-term memory (LSTM) is designed for the signal branch, so that it captures features in both the temporal and spatial dimensions, while a ResNet2d with a Convolutional Block Attention Module (CBAM) is constructed for the TFpic branch, so that more attention is paid to the useful information in the TF images. The outputs of the two branches are then fused to achieve end-to-end automatic identification of HFOs. The method is validated on data from 5 patients with intractable epilepsy. In intra-validation, the proposed method obtained a sensitivity of 94.62%, a specificity of 92.7%, and an F1-score of 93.33%; in cross-validation, it achieved an average sensitivity of 92.00%, specificity of 88.26%, and F1-score of 89.11%. These results show that the proposed method outperforms existing detection paradigms that use either the signal alone or the time-frequency image alone. In addition, the average kappa coefficient between visual analysis and automatic detection is 0.795. The method thus shows both strong generalization ability and a high degree of consistency with the gold standard, and it has great potential as a clinical assistant tool.
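For readers less familiar with the reported figures, the following is a minimal sketch of how sensitivity, specificity, F1-score, and Cohen's kappa can be computed from a binary confusion matrix (HFO versus non-HFO). It is not the authors' evaluation code: the function name `detection_metrics`, the `DetectionMetrics` container, and the counts in the usage example are hypothetical and chosen only for illustration. The sketch also assumes that the kappa coefficient refers to Cohen's kappa between the visual (expert) labels and the automatic labels.

```python
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    sensitivity: float
    specificity: float
    f1: float
    kappa: float


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> DetectionMetrics:
    """Compute binary detection metrics from confusion-matrix counts.

    tp: events labelled HFO by both the expert and the detector
    fp: events labelled HFO by the detector only
    tn: events labelled non-HFO by both
    fn: events labelled HFO by the expert only
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)  # harmonic mean of precision and recall

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    # Kappa is undefined when chance agreement is 1; 0.0 is a convention
    # chosen for this sketch, not a standard.
    kappa = _ratio(observed - expected, 1.0 - expected)

    return DetectionMetrics(sensitivity, specificity, f1, kappa)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    metrics = detection_metrics(tp=90, fp=10, tn=90, fn=10)
    print(metrics)
```

Kappa is the stricter of these measures because it discounts the agreement that two raters would reach by chance, which is why it is commonly reported alongside sensitivity and specificity when comparing automatic detection against visual marking.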