Tree peony is a deciduous shrub endemic to China, and peony seed oil (PSO) is an important plant oil resource. At present, however, peony fruits are harvested mainly by hand, which is inefficient. To support mechanized harvesting, this study built a multi-source vision system based on a Time-of-Flight (TOF) camera and an RGB camera. The two cameras captured tree peony images over the same time period. Feature points in the grayscale images and the intensity images were detected and matched using the Speeded-Up Robust Features (SURF) algorithm, nearest-neighbor matching, and the Random Sample Consensus (RANSAC) algorithm. The matched feature points were then passed to the Normalized Direct Linear Transformation (NDLT) algorithm to register the RGB images with the depth images. Using the depth and RGB images, a Multi-Layer Perceptron (MLP) localized the peony fruits and classified their maturity. Ninety groups of tree peony fruit images captured by the vision system were used to verify the feasibility of the algorithm. In these images, 152 of 173 fruits were correctly recognized, and the fruit recognition rate was 85.74%. The average localization error was 3.53, which is accurate enough for harvesting operations. For maturity classification, the system achieved an overall recognition rate of 91.68%. The results show that the vision system extracts the location and color information of the fruit at the same time and is not easily affected by environmental illumination and other factors. The proposed method achieves high efficiency and high accuracy in fruit localization and maturity classification.
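As an illustration of the registration step, the following is a minimal sketch of a normalized DLT estimate from matched point pairs; it is not the authors' implementation. It assumes the registration can be modeled as a planar homography and that the pairs are already free of outliers (for example after RANSAC). It also fixes the last homography entry to 1 and solves the least-squares system through the normal equations, rather than the more common SVD. The function names (`normalize`, `solve`, `homography_ndlt`, `apply_h`) are hypothetical.

```python
import math


def normalize(points):
    """Hartley normalization: shift the centroid to the origin and scale so
    that the mean distance from the origin is sqrt(2). Returns (points, T)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2) / mean_dist
    T = [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]
    return [(s * (x - cx), s * (y - cy)) for x, y in points], T


def matmul3(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is square and non-singular (no degenerate point layouts)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def homography_ndlt(src, dst):
    """Estimate H (3x3, H[2][2] == 1) such that dst ~ H * src,
    from at least four matched (x, y) pairs."""
    src_n, T_src = normalize(src)
    dst_n, T_dst = normalize(dst)
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src_n, dst_n):
        # Two linear equations per correspondence, with h33 fixed to 1.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Least squares via the normal equations (A^T A) h = A^T b.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve(AtA, Atb) + [1.0]
    H_n = [h[0:3], h[3:6], h[6:9]]
    # Undo the normalization: H = T_dst^{-1} * H_n * T_src.
    s, tx, ty = T_dst[0][0], T_dst[0][2], T_dst[1][2]
    T_dst_inv = [[1.0 / s, 0.0, -tx / s], [0.0, 1.0 / s, -ty / s], [0.0, 0.0, 1.0]]
    H = matmul3(matmul3(T_dst_inv, H_n), T_src)
    return [[value / H[2][2] for value in row] for row in H]


def apply_h(H, point):
    """Map one (x, y) point through the homography H."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

In such a setup, `homography_ndlt(src, dst)` would be called with the inlier feature points from one image as `src` and their matches in the other image as `dst`, and `apply_h` would then map pixel coordinates between the two images. The normalization step matters because raw pixel coordinates make the linear system poorly conditioned.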