Web surveys completed on smartphones open up new ways of measuring respondents’ attitudes, behaviors, and beliefs, which are central to social science research and many adjacent fields. In this study, we use the built-in microphones of smartphones to record voice answers in a smartphone survey and extract non-verbal cues, such as amplitude and pitch, from the collected voice data. These cues allow us to predict respondents’ level of interest (i.e., disinterest, neutral, and high interest) from their voice answers, which expands the opportunities for researching respondents’ engagement and answer behavior. We conducted a smartphone survey in a German online access panel and asked respondents four open-ended questions on political parties with requests for voice answers. In addition, we measured respondents’ self-reported survey interest using a closed-ended question with an end-labeled, seven-point rating scale. The results show a non-linear association between respondents’ predicted level of interest and answer length: respondents with a predicted medium level of interest provide longer answers in terms of both number of words and response times. However, respondents’ predicted level of interest and their self-reported interest are only weakly associated. Finally, we argue that voice answers contain rich meta-information about respondents’ affective states, which is yet to be utilized in survey research.