Emotional expressivity can boost trust in human-human and human-machine interaction. Emotional expression is a multimodal phenomenon, and previous research has argued that the effects of a mismatch between expressive channels provide evidence of joint audio-video emotional processing. However, while previous work studied this from the point of view of emotion recognition and processing, little is known about the effect a multimodal agent would have on a human-agent interaction task. Agent appearance could also influence this interaction. Here we manipulated the agent's multimodal emotional expression ("smiling face", "smiling voice", or both) and agent type (photorealistic or cartoon-like virtual human) and assessed people's trust toward this agent. We measured trust using a mixed-methods approach, combining behavioural data from a survival task, questionnaire ratings and qualitative comments. These methods gave different results: while people commented on the importance of emotional expressivity in the agent's voice, this factor had limited influence on trusting behaviours; and while people rated the cartoon-like agent higher than the photorealistic one on several traits, the agent's style was also not the most influential factor in people's trusting behaviour. These results highlight the contribution of a mixed-methods approach in human-machine interaction, as both explicit and implicit perception and behaviour contribute to the success of the interaction.