While huge strides have recently been made in language-based machine learning, the ability of artificial systems to comprehend the sequences that comprise animal behavior has been lagging behind. In contrast, humans instinctively recognize behaviors by finding similarities in behavioral sequences. Here, we develop an unsupervised behavior-mapping framework, SUBTLE (spectrogram-UMAP-based temporal-link embedding), to capture comparable behavioral repertoires from 3D action skeletons. To find the best embedding method, we devise a temporal proximity index (TPI) as a new metric to gauge temporal representation in the behavioral embedding space. The method achieves the best TPI score compared to current embedding strategies. Its spectrogram-based UMAP clustering not only identifies subtle inter-group differences but also matches human-annotated labels. SUBTLE framework automates the tasks of both identifying behavioral repertoires like walking, grooming, standing, and rearing, and profiling individual behavior signatures like subtle inter-group differences by age. SUBTLE highlights the importance of temporal representation in the behavioral embedding space for human-like behavioral categorization.