Wildlife recognition is of utmost importance for monitoring and preserving biodiversity. In recent years, deep-learning-based methods for wildlife image recognition have exhibited remarkable performance on specific datasets and are becoming a mainstream research direction. However, wildlife image recognition tasks face the challenge of weak generalization in open environments. In this paper, a Deep Joint Adaptation Network (DJAN) for wildlife image recognition is proposed to deal with the above issue by taking a transfer learning paradigm into consideration. To alleviate the distribution discrepancy between the known dataset and the target task dataset while enhancing the transferability of the model’s generated features, we introduce a correlation alignment constraint and a strategy of conditional adversarial training, which enhance the capability of individual domain adaptation modules. In addition, a transformer unit is utilized to capture the long-range relationships between the local and global feature representations, which facilitates better understanding of the overall structure and relationships within the image. The proposed approach is evaluated on a wildlife dataset; a series of experimental results testify that the DJAN model yields state-of-the-art results, and, compared to the best results obtained by the baseline methods, the average accuracy of identifying the eleven wildlife species improves by 3.6 percentage points.