MOS (mean opinion score) subjective quality studies are used to evaluate many signal processing methods. Since laboratory quality studies are time-consuming and expensive, researchers often run small studies with limited statistical significance or use objective measures that only approximate human perception. We propose a cost-effective and convenient measure called crowdMOS, obtained by having internet users participate in a MOS-like listening study. Workers listen to and rate sentences at their leisure, using their own hardware, in an environment of their choice. Since these individuals cannot be supervised, we propose methods for detecting and discarding inaccurate scores. To automate crowdMOS testing, we offer a set of freely distributable, open-source tools for Amazon Mechanical Turk, a platform designed to facilitate crowdsourcing. These tools implement the MOS testing methodology described in this paper, giving researchers a user-friendly means of performing subjective quality evaluations without the overhead of laboratory studies. Finally, we demonstrate crowdMOS using data from the Blizzard text-to-speech competition, showing that it delivers accurate and repeatable results.
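The abstract leaves the screening algorithm unspecified; a common heuristic for crowdsourced MOS data is to discard workers whose ratings correlate poorly with the consensus of the remaining raters. The sketch below illustrates that idea in Python; the function name, the score-matrix layout, and the 0.25 correlation threshold are illustrative assumptions, not the method actually used by crowdMOS.

```python
import numpy as np

def screen_workers(scores, min_corr=0.25):
    """Keep only workers whose ratings agree with the consensus.

    scores:   2-D array, shape (n_workers, n_sentences); scores[w, s] is
              worker w's 1-5 opinion score for sentence s, NaN if unrated.
    min_corr: illustrative threshold on the Pearson correlation between a
              worker's scores and the mean scores of all other workers.
    Returns the indices of workers considered reliable.
    """
    scores = np.asarray(scores, dtype=float)
    reliable = []
    for w in range(scores.shape[0]):
        others = np.delete(scores, w, axis=0)
        consensus = np.nanmean(others, axis=0)   # per-sentence mean of everyone else
        valid = ~np.isnan(scores[w]) & ~np.isnan(consensus)
        if valid.sum() < 2:
            continue                             # too little overlap to judge reliability
        r = np.corrcoef(scores[w, valid], consensus[valid])[0, 1]
        if r >= min_corr:
            reliable.append(w)
    return reliable
```

Once unreliable workers are removed, each stimulus's remaining scores can be averaged into a MOS with a confidence interval, as sketched after the next paragraph.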
Subjective tests are generally regarded as the most reliable and definitive methods for assessing image quality. Nevertheless, laboratory studies are time-consuming and expensive, so researchers often choose to run informal studies or use objective quality measures, producing results that may not correlate well with human perception. In this paper we propose a cost-effective and convenient subjective quality measure called crowdMOS, obtained by having internet workers participate in MOS (mean opinion score) subjective quality studies. Since these workers cannot be supervised, we propose methods for detecting and discarding inaccurate or malicious scores. To facilitate this process, we offer an open-source set of tools for Amazon Mechanical Turk, an internet marketplace for crowdsourcing. These tools completely automate the test design, score retrieval, and statistical analysis, abstracting away the technical details of Mechanical Turk and ensuring a user-friendly, affordable, and consistent test methodology. We demonstrate crowdMOS using data from the LIVE subjective quality image dataset, showing that it delivers accurate and repeatable results.

Index Terms: crowdsourcing, subjective quality, quality assessment, mean opinion score, MOS, mechanical turk.
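At its core, the statistical analysis these tools automate reduces to a per-stimulus mean opinion score with a confidence interval. Below is a minimal Python sketch of that computation, assuming the screened ratings for one stimulus are available as a plain list; the function name and the Student-t interval are our illustration, not the toolkit's actual API.

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

def mos_with_ci(ratings, confidence=0.95):
    """Mean opinion score and a Student-t confidence interval.

    ratings: individual 1-5 opinion scores for one stimulus, collected
             after unreliable workers have been screened out.
    Returns (MOS, half-width of the confidence interval).
    """
    n = len(ratings)
    m = mean(ratings)
    if n < 2:
        return m, float("inf")                    # cannot estimate spread from one score
    sem = stdev(ratings) / sqrt(n)                # standard error of the mean
    t = stats.t.ppf((1 + confidence) / 2, n - 1)  # two-sided t critical value
    return m, t * sem

# Example: report MOS with its 95% confidence half-width
mos, hw = mos_with_ci([4, 5, 4, 3, 5, 4, 4, 5, 3, 4])
print(f"MOS = {mos:.2f} +/- {hw:.2f}")
```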