Visual object tracking technology is one of the key issues in computer vision. In this paper, we propose a visual object tracking algorithm based on cross-modality featuredeep learning using Gaussian-Bernoulli deep Boltzmann machines (DBM) with RGB-D sensors. First, a cross-modality featurelearning network based on aGaussian-Bernoulli DBM is constructed, which can extract cross-modality features of the samples in RGB-D video data. Second, the cross-modality features of the samples are input into the logistic regression classifier, andthe observation likelihood model is established according to the confidence score of the classifier. Finally, the object tracking results over RGB-D data are obtained using aBayesian maximum a posteriori (MAP) probability estimation algorithm. The experimental results show that the proposed method has strong robustness to abnormal changes (e.g., occlusion, rotation, illumination change, etc.). The algorithm can steadily track multiple targets and has higher accuracy.