Robots operating in everyday environments need to effectively perceive, model, and infer semantic properties of objects. Existing knowledge reasoning frameworks model only binary relations between an object's class label and its semantic properties, and thus cannot collectively reason about object properties detected by different perception algorithms and grounded in diverse sensory modalities. We bridge the gap between multimodal perception and knowledge reasoning by introducing an n-ary representation that models complex, inter-related object properties. To tackle the problem of collecting n-ary semantic knowledge at scale, we propose a transformer neural network that directly generalizes knowledge from observations of object instances. The learned model can reason at different levels of abstraction, effectively predicting unknown properties of objects in different environmental contexts given varying amounts of observed information. We quantitatively validate our approach against five prior methods on LINK, a dataset we contribute that contains 1457 situated object instances with 15 multimodal property types and 200 total properties. Compared to the top-performing baseline, a Markov Logic Network, our model obtains a 10% improvement in predicting unknown properties of novel object instances while reducing training and inference time by a factor of 150. Additionally, we apply our work to a mobile manipulation robot, demonstrating its ability to leverage n-ary reasoning to retrieve objects and actively detect object properties. The code and data are available at https://github.com/wliu88/LINK.
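To make the binary-versus-n-ary distinction concrete, the sketch below contrasts a binary (class, property) relation with an n-ary fact that groups the co-observed, inter-related properties of one situated object instance. This is a minimal illustrative data structure only; all field names and values are our own assumptions, not the LINK schema.

```python
from dataclasses import dataclass, field

# Binary relation: links a class label to a single semantic property.
binary_fact = ("mug", "material:ceramic")

@dataclass
class NAryFact:
    """One situated object instance with co-observed multimodal properties.
    All fields here are hypothetical, for illustration only."""
    instance_id: str
    environment: str                                  # context the object was observed in
    properties: dict = field(default_factory=dict)    # property type -> observed value

fact = NAryFact(
    instance_id="mug_042",
    environment="kitchen",
    properties={
        "class": "mug",
        "material": "ceramic",   # e.g., detected from vision
        "weight": "heavy",       # e.g., detected from proprioception
        "contents": "full",      # e.g., detected from audio when shaken
    },
)

# An n-ary reasoner can condition on any observed subset of
# fact.properties to predict the unobserved ones, whereas a binary
# framework treats each (class, property) pair in isolation.
```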
Deaf children born to hearing parents lack continuous access to language, leading to weaker working memory compared to hearing children and deaf children born to Deaf parents. CopyCat is a game in which children communicate with the computer via American Sign Language (ASL), and it has been shown to improve language skills and working memory. Previously, CopyCat depended on unscalable hardware such as custom gloves for sign verification, but modern 4K cameras and pose estimators present new opportunities. Before re-creating the CopyCat game for deaf children using off-the-shelf hardware, we evaluate whether current ASL recognition is sufficient. Using Hidden Markov Models (HMMs), user-independent word accuracies were 90.6%, 90.5%, and 90.4% for AlphaPose, Kinect, and MediaPipe, respectively. Transformers, a state-of-the-art model in natural language processing, performed 17.0% worse on average. Given these results, we believe our current HMM-based recognizer can be successfully adapted to verify children's signing while playing CopyCat.

CCS Concepts: • Human-centered computing → Ubiquitous and mobile computing systems and tools; Accessibility technologies; • Applied computing → Computer-managed instruction; • Computing methodologies → Machine learning; Feature selection.
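As a rough illustration of the HMM-based recognition approach described above, the following sketch trains one Gaussian HMM per sign word on pose-keypoint sequences and classifies a new sequence by maximum log-likelihood. It assumes the hmmlearn library and invented data shapes; the paper's actual features, model topology, and toolkit may differ.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_word_models(train_data, n_states=5):
    """train_data: dict mapping word -> list of (T_i, D) keypoint arrays,
    where T_i is the frame count of example i and D the feature dimension."""
    models = {}
    for word, seqs in train_data.items():
        X = np.concatenate(seqs)          # stack frames of all examples
        lengths = [len(s) for s in seqs]  # per-sequence frame counts
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)                 # Baum-Welch training
        models[word] = m
    return models

def recognize(models, seq):
    """Classify a (T, D) keypoint sequence by maximum log-likelihood."""
    return max(models, key=lambda w: models[w].score(seq))
```

A per-word HMM with likelihood scoring is the classical design for isolated-sign recognition; the number of hidden states and the diagonal covariance choice here are placeholder assumptions.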
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations, which display the context of each citation and indicate whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.