In this work we address agreement rate analysis for characterizing the level of consensus among participants' proposals elicited during guessability studies. Two new measures, the disagreement rate for referents and the coagreement rate between referents, are proposed to accompany the widely used agreement rate formula of Wobbrock et al. [37] when reporting participants' consensus for symbolic input. A statistical significance test for comparing the agreement rates of k ≥ 2 referents is presented in analogy with Cochran's success/failure Q test [5], for which we express the test statistic in terms of agreement and coagreement rates. We deliver a toolkit to assist practitioners in computing agreement, disagreement, and coagreement rates, and in running statistical tests for agreement rates at the p = .05, .01, and .001 levels of significance. We validate our theoretical development of agreement rate analysis against several previously published elicitation studies. For example, when we present the probability distribution function of the agreement rate measure, we also use it (1) to explain the magnitude of agreement rates previously reported in the literature, and (2) to propose qualitative interpretations for agreement rates, in analogy with Cohen's guidelines for effect sizes [6]. We also re-examine previously published elicitation data from the perspective of the agreement rate test statistic, and highlight new findings on the effect of referents on agreement rates that were unattainable prior to this work. We hope that our contributions will advance current knowledge in agreement rate analysis, providing researchers and practitioners with new techniques and tools to help them understand user-elicited data at deeper levels of detail and sophistication.
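To make the quantities concrete, below is a minimal sketch (ours, not the paper's toolkit) of the widely used agreement rate A(r) of Wobbrock et al. [37] and, under our reading of the paper, the pairwise revised rate AR(r) for a single referent; the disagreement rate for a referent would then be 1 − AR(r). The proposal labels and participant counts are hypothetical.

```python
from collections import Counter

def agreement_rate(proposals):
    """Agreement rate A(r) of Wobbrock et al. [37] for a single referent:
    the sum over proposal groups P_i of (|P_i| / |P|)^2."""
    n = len(proposals)
    counts = Counter(proposals)
    return sum((c / n) ** 2 for c in counts.values())

def revised_agreement_rate(proposals):
    """Revised rate AR(r): the fraction of participant pairs in agreement,
    i.e., the sum of |P_i| * (|P_i| - 1) divided by |P| * (|P| - 1)."""
    n = len(proposals)
    counts = Counter(proposals)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Hypothetical data: 20 participants proposing gestures for one referent.
proposals = ["swipe-up"] * 12 + ["circle"] * 5 + ["tap"] * 3
print(agreement_rate(proposals))          # 0.445
print(revised_agreement_rate(proposals))  # ~0.416
```

Note the two rates differ: A(r) counts each participant as agreeing with themselves, while AR(r) only counts distinct participant pairs, which is why AR(r) is slightly lower for the same data.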
Our empirical results show that users perceive the execution difficulty of single-stroke gestures consistently, and that execution difficulty is highly correlated with gesture production time. We use these results to design two simple rules for estimating execution difficulty: establishing the relative ranking of difficulty among multiple gestures, and classifying a single gesture into one of five levels of difficulty. We confirm that the CLC model does not accurately predict the magnitude of production time, and instead show that a reasonably accurate estimate can be calculated from only a few gesture execution samples from a few people. Using this estimated production time, our rules on average rank gesture difficulty with 90% accuracy and rate gesture difficulty with 75% accuracy. Designers can use our results to choose application gestures, and researchers can build on our analysis in other gesture domains and for modeling gesture performance.
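As a rough illustration of the first rule only, here is a sketch under our own assumptions about data layout: estimate each gesture's production time as the mean over a few samples from a few people, then rank gestures by that estimate. The function name and timing samples are hypothetical, and the paper's thresholds for the five difficulty levels are not reproduced here.

```python
from statistics import mean

def rank_by_difficulty(samples):
    """Rank gestures from easiest to hardest by estimated production time,
    where `samples` maps gesture name -> list of execution times (seconds)
    collected from a few people."""
    estimates = {g: mean(times) for g, times in samples.items()}
    return sorted(estimates, key=estimates.get)

# Hypothetical timing samples for three single-stroke gestures.
samples = {
    "circle":   [0.61, 0.58, 0.66, 0.63],
    "star":     [1.42, 1.35, 1.50, 1.47],
    "question": [1.05, 0.98, 1.12, 1.01],
}
print(rank_by_difficulty(samples))  # ['circle', 'question', 'star']
```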