Individual cattle identification is a prerequisite and foundation for precision livestock farming. Existing methods for cattle identification require radio frequency or visual ear tags, all of which are prone to loss or damage. Here, we propose and implement a new unified deep learning approach to cattle identification using video analysis. The proposed deep learning framework is composed of a Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) with a self-attention mechanism. More specifically, the Inception-V3 CNN was used to extract features from a cattle video dataset taken in a feedlot with rear-view. Extracted features were then fed to a BiLSTM layer to capture spatio-temporal information. Then, self-attention was employed to provide a different focus on the features captured by BiLSTM for the final step of cattle identification. We used a total of 363 rear-view videos from 50 cattle at three different times with an interval of 1 month between data collection periods. The proposed method achieved 93.3% identification accuracy using a 30-frame video length, which outperformed current state-of-the-art methods (Inception-V3, MLP, SimpleRNN, LSTM, and BiLSTM). Furthermore, two different attention schemes, namely, additive and multiplicative attention mechanisms were compared. Our results show that the additive attention mechanism achieved 93.3% accuracy and 91.0% recall, greater than multiplicative attention mechanism with 90.7% accuracy and 87.0% recall. Video length also impacted accuracy, with video sequence length up to 30-frames enhancing identification performance. Overall, our approach can capture key spatio-temporal features to improve cattle identification accuracy, enabling automated cattle identification for precision livestock farming.