In this study, we assess the capacity of the BERT (Bidirectional Encoder Representations from Transformers) framework to predict the 12-month risk of major diabetic complications (retinopathy, nephropathy, neuropathy, and major adverse cardiovascular events [MACE]) from a single-center electronic health record (EHR) dataset. We introduce a task-oriented predictive BERT (Top-BERT) architecture: an end-to-end training and evaluation framework built on the sequential input structure, embedding layer, and encoder stacks inherent to BERT. Top-BERT trains and evaluates the model on multiple learning tasks simultaneously, which improves its ability to learn from a limited amount of data. Our findings show that this approach can outperform both traditional pretraining-finetuning BERT models and conventional machine learning methods, making it a promising tool for the early identification of patients at risk of diabetes-related complications. We also investigate how different temporal embedding strategies affect predictive performance and find that simpler designs perform better. Integrated Gradients (IG) improves the explainability of our predictive models, yielding feature attributions that support the clinical relevance of our findings. Finally, this study highlights the essential role of proactive symptom assessment and the management of comorbid conditions in preventing the progression of complications in patients with diabetes.
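Integrated Gradients attributes a model's prediction to its input features by accumulating gradients along a straight-line path from a baseline input to the actual input, then scaling by the input-minus-baseline difference. The following is a minimal sketch in Python with NumPy, assuming a hypothetical logistic risk model over three numeric features and an all-zero baseline; the model, weights, and feature values are illustrative only and are not the study's Top-BERT implementation or data. The final check illustrates IG's completeness property: the attributions sum (up to numerical integration error) to the difference between the model output at the input and at the baseline.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Hypothetical stand-in for a risk model: a logistic score over features."""
    return sigmoid(x @ w + b)


def model_grad(x, w, b):
    """Analytic gradient of the logistic score with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum along the baseline-to-input path."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of `steps` equal intervals in [0, 1]
    grads = np.array(
        [model_grad(baseline + a * (x - baseline), w, b) for a in alphas]
    )
    # Average path gradient, scaled elementwise by the input-baseline difference.
    return (x - baseline) * grads.mean(axis=0)


# Illustrative (made-up) weights, input, and all-zero baseline.
w = np.array([0.8, -0.5, 1.2])
b = -0.3
x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros_like(x)

attr = integrated_gradients(x, baseline, w, b)
gap = model(x, w, b) - model(baseline, w, b)
print("attributions:", attr)
print("sum of attributions:", attr.sum(), "output gap:", gap)
```

The analytic gradient keeps the sketch dependency-light; with a neural model such as a BERT encoder, the same path integral would instead use gradients computed by the training framework's automatic differentiation, typically with respect to the input embeddings.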