The monitoring of tree species diversity is important for forest or wetland ecosystem service maintenance or resource management. Remote sensing is an efficient alternative to traditional field work to map tree species diversity over large areas. Previous studies have used light detection and ranging (LiDAR) and imaging spectroscopy (hyperspectral or multispectral remote sensing) for species richness prediction. The recent development of very high spatial resolution (VHR) RGB images has enabled detailed characterization of canopies and forest structures. In this study, we developed a three-step workflow for mapping tree species diversity, the aim of which was to increase knowledge of tree species diversity assessment using deep learning in a tropical wetland (Haizhu Wetland) in South China based on VHR-RGB images and LiDAR points. Firstly, individual trees were detected based on a canopy height model (CHM, derived from LiDAR points) by the local-maxima-based method in the FUSION software (Version 3.70, Seattle, USA). Then, tree species at the individual tree level were identified via a patch-based image input method, which cropped the RGB images into small patches (the individually detected trees) based on the tree apexes detected. Three different deep learning methods (i.e., AlexNet, VGG16, and ResNet50) were modified to classify the tree species, as they can make good use of the spatial context information. Finally, four diversity indices, namely, the Margalef richness index, the Shannon–Wiener diversity index, the Simpson diversity index, and the Pielou evenness index, were calculated from the fixed subset with a size of 30 × 30 m for assessment. In the classification phase, VGG16 had the best performance, with an overall accuracy of 73.25% for 18 tree species. Based on the classification results, mapping of tree species diversity showed reasonable agreement with field survey data (R2Margalef = 0.4562, root-mean-square error RMSEMargalef = 0.5629; R2Shannon–Wiener = 0.7948, RMSEShannon–Wiener = 0.7202; R2Simpson = 0.7907, RMSESimpson = 0.1038; and R2Pielou = 0.5875, RMSEPielou = 0.3053). While challenges remain for individual tree detection and species classification, the deep-learning-based solution shows potential for mapping tree species diversity.