Social robots improve human–robot interaction through nonverbal behaviors such as handshakes and hugs. However, traditional methods rely on precoded motions, which are predictable and can detract from the perception of robots as interactive agents. To address this issue, we introduce a Seq2Seq-based neural network model that learns social behaviors from human–human interactions in an end-to-end manner. To mitigate the risk of invalid pose sequences during long-term behavior generation, we incorporate a generative adversarial network (GAN). We tested the proposed method on the humanoid robot Pepper in a simulated environment. Because the success of social behavior generation is difficult to assess, we devise novel metrics that quantify the discrepancy between generated and ground-truth behaviors. Our analysis reveals the impact of different networks on behavior generation performance and compares the efficacy of learning multiple behaviors with that of learning a single behavior. We anticipate that our method will find application in a range of robots, including home service, guide, delivery, educational, and virtual robots, thereby enhancing user interaction and enjoyment.