The precise detection and positioning of tea buds are among the major issues in tea picking automation. In this study, a novel algorithm for detecting tea buds and estimating their poses in a field environment was proposed by using a depth camera. This algorithm introduces some improvements to the YOLOv5l architecture. A Coordinate Attention Mechanism (CAM) was inserted into the neck part to accurately position the elements of interest, a BiFPN was used to enhance the small object detection ability, and a GhostConv module replaced the original Conv module in the backbone to reduce the model size and speed up model inference. After testing, the proposed detection model achieved an mAP of 85.2%, a speed of 87.71 FPS, a parameter number of 29.25 M, and a FLOPs value of 59.8 G, which are all better than those achieved with the original model. Next, an optimal pose-vertices search method (OPVSM) was developed to estimate the pose of tea by constructing a graph model to fit the pointcloud. This method could accurately estimate the poses of tea buds, with an overall accuracy of 90%, and it was more flexible and adaptive to the variations in tea buds in terms of size, color, and shape features. Additionally, the experiments demonstrated that the OPVSM could correctly establish the pose of tea buds through pointcloud downsampling by using voxel filtering with a 2 mm × 2 mm × 1 mm grid, and this process could effectively reduce the size of the pointcloud to smaller than 800 to ensure that the algorithm could be run within 0.2 s. The results demonstrate the effectiveness of the proposed algorithm for tea bud detection and pose estimation in a field setting. Furthermore, the proposed algorithm has the potential to be used in tea picking robots and also can be extended to other crops and objects, making it a valuable tool for precision agriculture and robotic applications.