Through wearable sensors and deep learning techniques, biomechanical analysis can reach beyond the lab for clinical and sporting applications. Transformers, a class of recent deep learning models, have become widely used in state-of-the-art artificial intelligence research due to their superior performance in various natural language processing and computer vision tasks. The performance of transformer models has not yet been investigated in biomechanics applications. In this study, we introduce a Biomechanical Multi-activity Transformer-based model, BioMAT, for the estimation of joint kinematics from streaming signals of multiple inertia measurement units (IMUs) using a publicly available dataset. This dataset includes IMU signals and the corresponding sagittal plane kinematics of the hip, knee, and ankle joints during multiple activities of daily living. We evaluated the model’s performance and generalizability and compared it against a convolutional neural network long short-term model, a bidirectional long short-term model, and multi-linear regression across different ambulation tasks including level ground walking (LW), ramp ascent (RA), ramp descent (RD), stair ascent (SA), and stair descent (SD). To investigate the effect of different activity datasets on prediction accuracy, we compared the performance of a universal model trained on all activities against task-specific models trained on individual tasks. When the models were tested on three unseen subjects’ data, BioMAT outperformed the benchmark models with an average root mean square error (RMSE) of 5.5 ± 0.5°, and normalized RMSE of 6.8 ± 0.3° across all three joints and all activities. A unified BioMAT model demonstrated superior performance compared to individual task-specific models across four of five activities. The RMSE values from the universal model for LW, RA, RD, SA, and SD activities were 5.0 ± 1.5°, 6.2 ± 1.1°, 5.8 ± 1.1°, 5.3 ± 1.6°, and 5.2 ± 0.7° while these values for task-specific models were, 5.3 ± 2.1°, 6.7 ± 2.0°, 6.9 ± 2.2°, 4.9 ± 1.4°, and 5.6 ± 1.3°, respectively. Overall, BioMAT accurately estimated joint kinematics relative to previous machine learning algorithms across different activities directly from the sequence of IMUs signals instead of time-normalized gait cycle data.