The ear’s relatively stable structure makes it suitable for recognition. In common identification applications, only one sample per person (OSPP) is registered in a gallery; consequently, training a deep-learning-based ear recognition approach effectively is difficult. State-of-the-art (SOA) 3D ear recognition using OSPP hits a bottleneck when large occluding objects are close to the ear. Hence, we propose a system that combines PointNet++ with three layers of features capable of extracting rich identification information from a 3D ear. Our goal is to correctly recognize a 3D ear affected by a large nearby occlusion when only one sample per person is registered in the gallery. The system comprises four primary components: (1) segmentation; (2) local and local joint structural (LJS) feature extraction; (3) holistic feature extraction; and (4) fusion. We use PointNet++ for ear segmentation. For local and LJS feature extraction, we propose an LJS feature descriptor, the pairwise surface patch cropped using a symmetrical hemisphere cut-structured histogram with an indexed shape (PSPHIS) descriptor. Furthermore, we propose a local and LJS matching engine based on the proposed PSPHIS descriptor and the SOA surface patch histogram indexed shape (SPHIS) local feature descriptor. For holistic feature extraction, we use a voxelization method for global matching. For the fusion component, we use a weighted fusion method to recognize the 3D ear. The experimental results demonstrate that the proposed system outperforms the SOA normalization-free 3D ear recognition methods using OSPP when the ear surface is influenced by a large nearby occlusion.
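To make the holistic and fusion components more concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this work. It assumes a binary occupancy grid for voxelization, a Jaccard overlap as the global similarity, min-max score normalization, and a fusion weight of 0.7; none of these choices is specified in the abstract, and the function names `voxelize`, `voxel_similarity`, `min_max`, and `weighted_fusion` are hypothetical.

```python
import numpy as np


def voxelize(points, grid_size=8):
    """Map an (N, 3) point cloud to a boolean occupancy grid (assumed representation)."""
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    # Use the largest axis extent so the ear keeps its aspect ratio in the grid.
    extent = (points.max(axis=0) - mins).max()
    if extent == 0:
        extent = 1.0
    idx = np.floor((points - mins) / extent * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)  # points on the upper boundary fall in the last cell
    grid = np.zeros((grid_size,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


def voxel_similarity(grid_a, grid_b):
    """Jaccard overlap of two occupancy grids (assumed holistic similarity)."""
    union = np.logical_or(grid_a, grid_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(grid_a, grid_b).sum() / union)


def min_max(scores):
    """Rescale similarity scores to [0, 1] so the two matchers are comparable."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def weighted_fusion(local_scores, holistic_scores, w_local=0.7):
    """Fuse per-gallery-subject similarities; w_local is an assumed, untuned weight."""
    fused = w_local * min_max(local_scores) + (1.0 - w_local) * min_max(holistic_scores)
    return int(np.argmax(fused)), fused


# Usage: one probe compared against a gallery of three subjects (made-up scores).
best, fused = weighted_fusion([0.2, 0.9, 0.5], [0.1, 0.4, 0.8])
print(best, fused)
```

In this sketch the gallery subject with the highest fused similarity is returned as the identity. With OSPP, each gallery subject contributes exactly one score per matcher, so the fusion operates on one pair of scores per subject.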