With advances in sensor technology and point cloud generation techniques, increasingly large amounts of high-quality forest RGB point cloud data have become available. However, popular clustering-based point cloud segmentation methods are usually suitable only for pure forest scenes and perform poorly in scenes that contain multiple types of ground objects or complex terrain. This study therefore proposes a single-tree point cloud extraction method that combines deep semantic segmentation with clustering. The method first applies Improved-RandLA-Net, a deep semantic segmentation network built on RandLA-Net, to extract the point clouds of specified tree species; Improved-RandLA-Net adds an attention chain to RandLA-Net to strengthen the model's extraction of channel and spatial features. Clustering is then applied to the segmented point clouds to extract single-tree point clouds. The feasibility of the proposed method was verified at three sites: a Ginkgo site, the Lin'an pecan site, and a Fraxinus excelsior site at a conference center. Semantic segmentation was performed on the three sample areas using both the original and the improved RandLA-Net, and the experiments show that Improved-RandLA-Net achieved significant improvements in accuracy, precision, recall, and F1 score. Based on the semantic segmentation results of Improved-RandLA-Net, single-tree point clouds were then extracted from the three sample areas, with final single-tree recognition rates of 89.80%, 75.00%, and 95.39%, respectively. These results demonstrate that the proposed method can effectively extract single-tree point clouds in complex scenes.
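The second stage, clustering the points labeled as the target species into individual trees, can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so this assumes a simple Euclidean (radius-based) region-growing clustering applied after semantic segmentation has already removed ground and non-target points; the function `euclidean_clusters`, its `radius` and `min_points` parameters, and the toy coordinates are all hypothetical and are not the authors' implementation.

```python
from collections import defaultdict, deque
from math import floor


def euclidean_clusters(points, radius=0.5, min_points=3):
    """Hypothetical single-tree step: group 3-D points into clusters whose
    members are linked by chains of neighbors no farther than `radius` apart.

    Clusters with fewer than `min_points` members are discarded as noise.
    Returns a list of clusters, each a sorted list of point indices.
    """
    r2 = radius * radius
    # Hash each point into a cubic grid cell of side `radius`; any neighbor
    # within `radius` must lie in the same cell or one of the 26 adjacent cells.
    grid = defaultdict(list)
    cell_of = []
    for i, (x, y, z) in enumerate(points):
        cell = (floor(x / radius), floor(y / radius), floor(z / radius))
        grid[cell].append(i)
        cell_of.append(cell)

    visited = [False] * len(points)
    clusters = []
    for seed in range(len(points)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        members = []
        # Breadth-first region growing from the seed point.
        while queue:
            i = queue.popleft()
            members.append(i)
            cx, cy, cz = cell_of[i]
            px, py, pz = points[i]
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for dz in (-1, 0, 1):
                        for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                            if visited[j]:
                                continue
                            qx, qy, qz = points[j]
                            if (px - qx) ** 2 + (py - qy) ** 2 + (pz - qz) ** 2 <= r2:
                                visited[j] = True
                                queue.append(j)
        if len(members) >= min_points:
            clusters.append(sorted(members))
    return clusters


# Toy input standing in for points already labeled "target species" by the
# segmentation network (coordinates in meters, invented for illustration).
tree_points = [
    (0.0, 0.0, 0.0), (0.0, 0.0, 0.4), (0.0, 0.0, 0.8), (0.1, 0.0, 1.2),  # tree A
    (5.0, 5.0, 0.0), (5.0, 5.0, 0.4), (5.0, 5.1, 0.8),                    # tree B
    (10.0, 0.0, 0.0),                                                      # isolated noise
]
print(euclidean_clusters(tree_points, radius=0.5, min_points=3))
# prints [[0, 1, 2, 3], [4, 5, 6]]
```

The grid hash keeps neighbor search close to linear in the number of points instead of comparing every pair. In practice a library routine such as PCL's `EuclideanClusterExtraction` or Open3D's `PointCloud.cluster_dbscan` would more likely be used, and pure distance-based grouping may merge trees whose crowns touch, which is one plausible reason recognition rates can differ between sites.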