We evaluate user experience (UX) when users play and control music with three smart speakers: Amazon's Echo with Alexa, Google Home, and Apple's HomePod with Siri. For measuring UX we use four established UX and usability questionnaires (AttrakDiff, SASSI, SUISQ-R, SUS). We investigated the sensitivity of these questionnaires in two ways: first, we compared the UX reported for each of the speakers; second, we compared the UX of completing easy single tasks and more difficult multi-tasks with these speakers. We find that the investigated questionnaires are sufficiently sensitive to show significant differences in UX between these easy and difficult tasks. In addition, we find some significant UX differences between the tested speakers. Specifically, all tested questionnaires except the SUS show a significant difference in UX between Siri and Alexa, with Siri being perceived as more user-friendly for controlling music. We discuss implications of our work for researchers and practitioners.

Speech assistance is a growing market, with 25% yearly growth predicted over the next three years [21]. Speech assistants can be integrated into different devices, such as smartphones, personal computers, and smart speakers, which are dedicated speakers that can be controlled by voice commands. In our work we focus on smart speakers. Currently one in five Americans over 18 owns a smart speaker [28], which is a remarkable number considering that smart speakers were first introduced in 2014 [16]. It means that within six years approximately 53 million Americans bought a smart speaker, a market development comparable to the rapid spread of smartphones [7]. This market trend is not confined to North America but is present throughout the world, in Europe as well as Asia, Africa, and Latin America [32,33,8,17], showing that smart speakers are of broad public interest.

The consumer speech assistance market in the English-speaking world, as well as in Europe, is dominated by three manufacturers and assistants: Amazon with Alexa, Google with Google Assistant, and Apple with Siri [8,36]. These three assistants cover more than 88% of the market in the US [36]. Intuitively, these three assistants are named as the most commonly known Voice User Interfaces (VUIs) [31] and featured as smart speakers in numerous product
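To make concrete how one of these instruments yields a numeric UX score, the sketch below implements the standard SUS scoring rule: odd-numbered items contribute response − 1, even-numbered items contribute 5 − response, and the summed contributions are scaled by 2.5 onto a 0–100 range. This is a minimal illustration; the function name and the sample responses are ours, not the paper's.

```python
def sus_score(responses):
    """Compute the System Usability Scale (SUS) score from ten
    Likert responses (1-5), ordered as items 1..10.

    Odd items are positively worded (contribution: response - 1);
    even items are negatively worded (contribution: 5 - response).
    The summed contributions are scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd)
        for i, r in enumerate(responses)
    ]
    return 2.5 * sum(contributions)

# Hypothetical responses from one participant after a music-control task:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```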
We evaluate the user experience (UX) of Amazon's Alexa when users play and control music. For measuring UX we use established UX questionnaires (SASSI, SUISQ-R, SUS, AttrakDiff). We investigated face validity by asking users to rate how well they think a questionnaire measures what it is supposed to measure, and we assessed construct validity by correlating the questionnaires' UX scores with each other. We find a mismatch between the face and construct validity of the evaluated questionnaires. Specifically, users feel that SASSI represents their experience better than the other questionnaires; however, this is not supported by the correlations between questionnaires, which suggest that all investigated questionnaires measure UX to a similar extent. Importantly, the fact that face validity and construct validity diverge is not surprising, as this has been observed before. Our work adds to the existing literature by providing face and construct validity scores of UX questionnaires for interactions with the common speech assistant Alexa.
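A minimal sketch of the construct-validity check described above: overall scores from two questionnaires, collected from the same participants, are correlated; a strong correlation is evidence that both instruments capture a similar construct. The per-participant scores below are hypothetical and the variable names are ours.

```python
from scipy.stats import pearsonr

# Hypothetical per-participant overall scores (the same 8 participants
# filled in both questionnaires after using Alexa):
sassi_scores = [4.1, 3.6, 5.2, 2.8, 4.4, 3.9, 5.0, 3.2]
sus_scores   = [72.5, 65.0, 90.0, 55.0, 77.5, 70.0, 87.5, 60.0]

# Pearson's r between the two instruments; a strong positive r suggests
# both questionnaires measure UX to a similar extent.
r, p = pearsonr(sassi_scores, sus_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```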
Many quality evaluation methods are used to assess uni-modal audio or video content without considering the perceptual, cognitive, and interactive aspects present in virtual reality (VR) settings. Consequently, little is known about the repercussions of the employed evaluation method, content, and subject behavior on quality ratings in VR. This mixed between- and within-subjects study uses four subjective audio quality evaluation methods (viz. multiple stimulus with and without reference for direct scaling, and rank-order elimination and pairwise comparison for indirect scaling) to investigate the factors contributing to quality ratings of real-time audio rendering in multi-modal 6-DoF VR. For each method, employed between-subjects, two sets of conditions in five VR scenes were evaluated within-subjects. The conditions targeted attributes relevant to binaural audio reproduction, using scenes with varying amounts of user interactivity. Our results show that all referenceless methods produce similar results with both condition sets. However, rank-order elimination proved to be the fastest method, required the least repetitive motion, and yielded the highest discrimination between spatial conditions. Scene complexity was a main effect within the results, with behavioral and task-load-index results implying that more complex scenes and the interactive aspects of 6-DoF VR can impede quality judgments.
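To make the direct/indirect-scaling distinction concrete, the sketch below derives an interval-like quality scale from pairwise-comparison outcomes using a Bradley-Terry model fitted with the classic MM (Zermelo) iteration. This is our illustrative choice of model, not necessarily the analysis used in the study, and the win-count matrix is hypothetical.

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of times condition i was preferred over j.
    Uses the MM (Zermelo) update: p_i <- W_i / sum_j n_ij / (p_i + p_j).
    Returns strengths normalized to sum to 1; their logarithms behave
    like an interval quality scale.
    """
    n = wins.shape[0]
    games = wins + wins.T          # n_ij: total comparisons of i vs j
    total_wins = wins.sum(axis=1)  # W_i: total wins of condition i
    p = np.ones(n)
    for _ in range(iters):
        denom = np.array([
            sum(games[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
            for i in range(n)
        ])
        p = total_wins / denom
        p /= p.sum()
    return p

# Hypothetical preferences among three binaural rendering conditions:
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
print(bradley_terry(wins))  # condition 0 emerges as clearly preferred
```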