Artificial Intelligence (AI) for accessibility is a rapidly growing area, requiring datasets that are inclusive of the disabled users that assistive technology aims to serve. We offer insights from a multi-disciplinary project that constructed a dataset for teachable object recognition with people who are blind or low vision. Teachable object recognition enables users to teach a model objects that are of interest to them, e.g., their white cane or their own sunglasses, by providing example images or videos of those objects. In this paper, we make the following contributions: 1) a disability-first procedure to support blind and low-vision data collectors in producing good-quality data, using video rather than images; 2) a validation and evolution of this procedure through a series of data collection phases; and 3) a set of questions to orient researchers involved in creating datasets toward reflecting on the needs of their participant community.

CCS Concepts: • Human-centered computing → Accessibility; accessibility systems and tools; accessibility technologies; • Computing methodologies → Machine learning.
Object recognition has made great advances in the last decade, but it still predominantly relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications, from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset and benchmark, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. The dataset contains 3,822 videos of 486 objects recorded by people who are blind/low vision on their mobile phones, and the benchmark reflects a realistic, highly challenging recognition problem, providing a rich playground to drive research in robustness to few-shot, high-variation conditions. We set the first state of the art on the benchmark and show that there is massive scope for further innovation, with the potential to impact a broad range of real-world vision applications, including tools for the blind/low-vision community. The dataset is available at https://bit.ly/2OyElCj and the code to run the benchmark at https://bit.ly/39YgiUW.
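To make the few-shot setting concrete, the sketch below shows one common way a teachable object recognizer can be built from a handful of user-provided examples: average the feature embeddings of each object's support frames into a prototype, then classify a new frame by its nearest prototype. This is a generic nearest-prototype illustration, not the ORBIT benchmark code; the embeddings here are random placeholders standing in for the output of a pretrained feature extractor, and all names (e.g., support_embeddings, classify) are hypothetical.

import numpy as np

# Illustrative sketch of a nearest-prototype few-shot classifier.
# In a teachable recognizer, `support_embeddings` would come from a frozen
# feature extractor applied to the few frames a user records per object;
# here they are random placeholders.

rng = np.random.default_rng(0)
num_objects, shots, dim = 3, 5, 64   # e.g., 3 personal objects, 5 example frames each

# Hypothetical user-provided examples: shape (objects, shots, dim)
support_embeddings = rng.normal(size=(num_objects, shots, dim))

# One prototype per object = mean of its support embeddings
prototypes = support_embeddings.mean(axis=1)   # shape (objects, dim)

def classify(query_embedding: np.ndarray) -> int:
    """Return the index of the nearest prototype (squared Euclidean distance)."""
    dists = ((prototypes - query_embedding) ** 2).sum(axis=1)
    return int(dists.argmin())

# A query frame close to object 0 (say, the user's white cane in this toy setup)
query = support_embeddings[0, 0] + 0.1 * rng.normal(size=dim)
print("Predicted object index:", classify(query))

The high-variation conditions the benchmark targets (blur, framing, clutter in user-recorded video) are exactly what makes such simple prototype averaging fragile, which is why robustness under few-shot, high-variation conditions remains open for innovation.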