Solar flares are sudden, violent eruptions in the solar atmosphere that release energy in the form of radiation. They can disrupt technological systems on Earth and in its orbit, causing financial losses and endangering human life, so predicting their occurrence is necessary to mitigate these effects. Specialized instruments continuously monitor solar activity, and the resulting data can be used to build prediction models with machine learning. A survey of the literature shows the prevalence of algorithms such as Multi-layer Perceptrons (MLP), Support Vector Machines (SVM), and Long Short-Term Memory (LSTM) networks, which have achieved good results, particularly on the True Skill Statistic (TSS) metric. In parallel, a new deep learning architecture called the Transformer emerged in 2017. Although originally created for natural language processing, Transformers have been successfully applied in other domains, such as time series forecasting. Because solar activity data is captured continuously over time, it constitutes a time series, so Transformers can be employed to develop solar flare forecast models. Given the significant lack of work using Transformers for solar flare forecasting, we ran experiments to assess their viability and performance for this task. We built models with MLP, SVM, LSTM, and Transformer architectures and compared them using the accuracy, TSS, and Area Under the ROC Curve (AUC) metrics. The Transformer models outperformed the others: their average TSS was 0.9, versus 0.4 for the remaining models. The gap was slightly smaller in AUC, where Transformers reached 0.9 while the others reached no more than 0.7. Transformers can therefore classify solar flare data and obtain superior results compared to the other models. We also conducted experiments with different forms of data balancing: unbalanced data, undersampling, oversampling, and SMOTE. The MLP, SVM, and LSTM models improved significantly with balancing, their average TSS rising from 0.1 to 0.4, whereas the Transformers were insensitive to data balancing, presenting the most stable TSS in all cases.
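To make the Transformer setup concrete, below is a minimal sketch of an encoder-only classifier for windowed solar activity time series, written in PyTorch. The window length, feature count, pooling strategy, and layer sizes are illustrative assumptions rather than the configuration used in our experiments, and positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class FlareTransformer(nn.Module):
    """Illustrative Transformer encoder for binary flare classification
    over multivariate time-series windows of shape (batch, seq_len, n_features).
    Hyperparameters are placeholders, not the values from the experiments."""

    def __init__(self, n_features=25, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)  # embed each time step
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)  # single flare / no-flare logit

    def forward(self, x):
        z = self.encoder(self.input_proj(x))   # (batch, seq_len, d_model)
        return self.head(z.mean(dim=1)).squeeze(-1)  # mean-pool over time steps

# Shape check with random data: 8 windows of 60 time steps, 25 features each
model = FlareTransformer()
logits = model(torch.randn(8, 60, 25))
print(logits.shape)  # torch.Size([8])
```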
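Since TSS drives most of the comparison, the following sketch shows how TSS and AUC can be computed for a binary flare/no-flare classifier with scikit-learn. TSS is defined as sensitivity + specificity - 1, i.e., TP/(TP+FN) - FP/(FP+TN); the labels and scores below are toy values for illustration only, not results from the study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def true_skill_statistic(y_true, y_pred):
    """TSS = sensitivity + specificity - 1 = TP/(TP+FN) - FP/(FP+TN)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # fraction of flaring samples caught
    specificity = tn / (tn + fp)  # fraction of non-flaring samples rejected
    return sensitivity + specificity - 1.0

# Toy example: 1 = flare, 0 = no flare
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.2, 0.9, 0.7, 0.3, 0.8, 0.6])  # model probabilities
y_pred = (y_score >= 0.5).astype(int)  # threshold at 0.5

print("TSS:", true_skill_statistic(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))
```

Unlike accuracy, TSS is unaffected by the ratio of flaring to non-flaring samples, which is why it is the preferred metric for this heavily imbalanced problem.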
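The three balancing strategies can be applied with the imbalanced-learn package, as in the sketch below. The synthetic dataset with a roughly 5% positive rate is a stand-in for real flare data, chosen purely for illustration.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Synthetic stand-in for a flare dataset: ~5% positive (flaring) class
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
print("original:", Counter(y))

for name, sampler in [
    ("undersampling", RandomUnderSampler(random_state=0)),
    ("oversampling", RandomOverSampler(random_state=0)),
    ("SMOTE", SMOTE(random_state=0)),
]:
    # Each sampler rebalances the classes: undersampling discards majority
    # samples, oversampling duplicates minority samples, and SMOTE
    # synthesizes new minority samples by interpolation.
    X_res, y_res = sampler.fit_resample(X, y)
    print(name + ":", Counter(y_res))
```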