Social behavioral biometrics investigates social interactions to determine a person's identity. Within this discipline, recognizing individuals from their aesthetic preferences is an emerging research direction. Human aesthetics is a soft behavioral biometric trait that reflects a person's attitudes toward a particular subject matter. Recent aesthetic-based biometric systems have shown that an individual's visual and audio aesthetic preferences carry considerable distinctive information. This paper introduces a novel three-stage audio-aesthetic system that can uniquely identify a user from the set of their favorite songs. The system uses a Residual Network (ResNet) for high-level feature extraction. A hybrid meta-heuristic feature selection algorithm based on Cuckoo Search and Whale Optimization is proposed to optimize the extracted features, yielding a low-dimensional feature set. The selected subset of features is fed into an XGBoost classifier to establish the person's identity. The proposed method outperformed the handcrafted feature-based method, achieving 99.54% accuracy on a proprietary dataset (Free Music Archive) and 99.79% accuracy on a publicly available dataset (Million Playlists Dataset).

INDEX TERMS Social behavioral biometrics, deep learning, biometric authentication, audio aesthetics, transfer learning, meta-heuristic, feature selection.
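To make the feature selection stage more concrete, the sketch below shows two building blocks that binary meta-heuristic feature selectors (including many Whale Optimization and Cuckoo Search variants) commonly rely on: an S-shaped transfer function that turns a search agent's continuous position into a 0/1 feature mask, and a wrapper fitness that trades classification error against subset size. This is not the paper's hybrid algorithm. The weight `alpha = 0.99`, the sigmoid transfer, and the `toy_error` function are assumptions for illustration; in the described pipeline, the error term would come from the XGBoost classifier evaluated on the selected ResNet features.

```python
import math
import random


def sigmoid_transfer(x: float) -> float:
    """S-shaped transfer function mapping a continuous coordinate to (0, 1).

    Written in a numerically stable form so large |x| never overflows math.exp.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Turn a continuous search-agent position into a 0/1 feature mask.

    Each coordinate selects its feature with probability sigmoid_transfer(x).
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def fitness(mask, error_rate, alpha=0.99):
    """Weighted wrapper fitness (lower is better).

    alpha weights the classification error; (1 - alpha) penalizes the fraction
    of features kept. alpha = 0.99 is an assumed, commonly used setting.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty subset cannot be classified
    return alpha * error_rate + (1 - alpha) * n_selected / len(mask)


def toy_error(mask):
    """Hypothetical stand-in for the validation error of a real classifier."""
    return 0.05 if mask[0] == 1 else 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Five random agents searching an 8-dimensional feature space.
    agents = [[rng.uniform(-4, 4) for _ in range(8)] for _ in range(5)]
    masks = [binarize(p, rng) for p in agents]
    best = min(masks, key=lambda m: fitness(m, toy_error(m)))
    print("best mask:", best, "fitness:", fitness(best, toy_error(best)))
```

A full optimizer would repeatedly update the agents' positions (for example with Whale Optimization's encircling and spiral moves or Cuckoo Search's Lévy flights) and keep the mask with the lowest fitness. The small `(1 - alpha)` term is what pushes the search toward the low-dimensional subsets the abstract describes.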