With the development of metaverse(s), industry and academia are searching for the best ways to represent users' avatars in shared virtual environments (VEs), where real‐time communication between users is required. The expressiveness of avatars is crucial for transmitting emotions, which are key for social presence and user experience and are conveyed via verbal and non‐verbal facial and body signals. In this paper, two real‐time modalities for conveying expressions in virtual reality (VR) via realistic, full‐body avatars are compared by means of a user study. The first modality uses dedicated hardware (i.e., eye and facial trackers) to map the user's facial expressions and eye movements onto the avatar model. The second modality relies on an algorithm that, starting from an audio clip, approximates the facial motion by generating plausible lip and eye movements. For both modalities, participants were asked to observe the avatar of an actor performing six scenes, each involving one of six basic emotions. The evaluation focused mainly on social presence and emotion conveyance. Results showed a clear superiority of facial tracking over lip sync in conveying sadness and disgust. This advantage was less evident for happiness and fear. No differences were observed for anger and surprise.