This study investigates how an onscreen virtual agent's dialog capability and facial animation affect survey respondents' comprehension and engagement in “face-to-face” interviews, using questions from US government surveys whose results have far-reaching impact on national policies. In the study, 73 laboratory participants were randomly assigned to respond in one of four interviewing conditions, in which the virtual agent had either high or low dialog capability (implemented through Wizard of Oz) and high or low facial animation, based on motion capture from a human interviewer. Respondents, whose faces were visible to the Wizard (and videorecorded) during the interviews, answered 12 questions about housing, employment, and purchases on the basis of fictional scenarios designed to allow measurement of comprehension accuracy, defined as the fit between responses and US government definitions. Respondents answered more accurately with the high-dialog-capability agents, requesting clarification more often particularly for ambiguous scenarios; and they generally treated the high-dialog-capability interviewers more socially, looking at the interviewer more and judging high-dialog-capability agents as more personal and less distant. Greater interviewer facial animation did not affect response accuracy, but it led to more displays of engagement—acknowledgments (verbal and visual) and smiles—and to the virtual interviewer's being rated as less natural. The pattern of results suggests that a virtual agent's dialog capability and facial animation differently affect survey respondents' experience of interviews, behavioral displays, and comprehension, and thus the accuracy of their responses. The pattern of results also suggests design considerations for building survey interviewing agents, which may differ depending on the kinds of survey questions (sensitive or not) that are asked.