To achieve the rapid recognition and accurate picking of Camellia oleifera fruits, a binocular vision system composed of two industrial cameras was used to collect images of Camellia oleifera fruits in natural environments. The YOLOv7 convolutional neural network model was used for iterative training, and the optimal weight model was selected to recognize the images and obtain the anchor frame region of the Camellia oleifera fruits. The local binary pattern (LBP) maps of the anchor frame region were extracted and matched by using the normalized correlation coefficient template matching algorithm to obtain the positions of the center point in the left and right images. The recognition experimental results showed that the accuracy rate, recall rate, mAP and F1 of the model were 97.3%, 97.6%, 97.7% and 97.4%. The recognition rate of the Camellia oleifera fruit with slight shading was 93.13%, and the recognition rate with severe shading was 75.21%. The recognition rate of the Camellia oleifera fruit was 90.64% under sunlight condition, and the recognition rate was 91.34% under shading condition. The orchard experiment results showed that, in the depth range of 400–600 mm, the maximum error value of the binocular stereo vision system in the depth direction was 4.279 mm, and the standard deviation was 1.142 mm. The detection and three-dimensional positioning accuracy of the binocular stereo vision system for Camellia oleifera fruits could basically meet the working requirements of the Camellia oleifera fruit-picking robot.