When multiple redundant probabilistic judgments are obtained from subject matter experts, it is common practice to aggregate their differing views into a single probability or distribution. Although many methods have been proposed for mathematical aggregation, no single procedure has gained universal acceptance. The most widely used procedure is simple arithmetic averaging, which has both desirable and undesirable properties. Here we propose an alternative for aggregating distribution functions based on the median cumulative probability at each fixed value of the variable. We show that aggregating cumulative probabilities by medians is equivalent, under certain conditions, to aggregating quantiles. Moreover, when the experts are independent and well calibrated, the median aggregate is better calibrated than the mean aggregate of probabilities, and it yields sharper aggregate distributions when the experts report a common location-scale distribution. We also compare median aggregation to mean aggregation of quantiles.
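To make the distinction concrete, here is a minimal numerical sketch (the normal expert distributions and their parameters are illustrative assumptions, not from the paper) comparing the pointwise-median CDF with simple arithmetic averaging, and checking the stated equivalence: for an odd number of strictly increasing CDFs, inverting the median CDF at a probability p recovers the median of the experts' individual p-quantiles.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical experts: each reports a normal CDF with their own mean and sd.
expert_params = [(0.0, 1.0), (0.5, 1.5), (-0.3, 0.8)]

xs = np.linspace(-5, 5, 1001)
cdfs = np.array([norm.cdf(xs, m, s) for m, s in expert_params])

median_cdf = np.median(cdfs, axis=0)  # median cumulative probability at each x
mean_cdf = np.mean(cdfs, axis=0)      # simple arithmetic averaging, for comparison

# Equivalence check: with an odd number of strictly increasing CDFs, the
# p-quantile of the pointwise-median CDF equals the median of the experts'
# individual p-quantiles.
p = 0.25
q_from_median_cdf = np.interp(p, median_cdf, xs)  # invert the median CDF
q_median_of_quantiles = np.median([norm.ppf(p, m, s) for m, s in expert_params])
print(q_from_median_cdf, q_median_of_quantiles)   # approximately equal
```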
Creating a human-robot interface is a daunting task. The capabilities and functionality of the interface depend on the robustness of many different sensor and input modalities. For example, object recognition poses problems for state-of-the-art vision systems. Speech recognition in noisy environments remains problematic for acoustic systems. Natural language understanding and dialog are often limited to specific domains and baffled by ambiguous or novel utterances. Plans based on domain-specific tasks limit the applicability of dialog managers. The types of sensors used limit spatial knowledge and understanding, and constrain cognitive abilities such as perspective-taking.

In this research, we integrate several modalities, such as vision, audition, and natural language understanding, to leverage the existing strengths of each modality and overcome individual weaknesses. We use visual, acoustic, and linguistic inputs in various combinations to solve such problems as the disambiguation of referents (objects in the environment), the localization of human speakers, and the determination of the source of utterances and the appropriateness of responses when humans and robots interact. For this research, we limit our consideration to the interaction of two humans and one robot in a retrieval scenario. This paper describes the system and the integration of the various modules prior to future testing.
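As a rough illustration of how modalities can compensate for one another in referent disambiguation (a sketch under assumed confidence scores and a naive product-of-experts fusion rule, not the system's actual design), consider a request that language alone leaves ambiguous:

```python
import numpy as np

# Each modality independently scores the candidate objects; a simple
# product-of-experts fusion picks the referent they jointly support.
# All scores below are hypothetical.
candidates = ["red cup", "blue cup", "notebook"]

vision_scores = np.array([0.70, 0.25, 0.05])    # object recognizer output
language_scores = np.array([0.45, 0.45, 0.10])  # "the cup" is ambiguous between cups
gesture_scores = np.array([0.60, 0.20, 0.20])   # speaker localization / pointing cue

fused = vision_scores * language_scores * gesture_scores
fused /= fused.sum()                             # renormalize to a distribution

referent = candidates[int(np.argmax(fused))]
print(referent, fused.round(3))                  # "red cup" wins once cues combine
```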
Robotic sound localization has traditionally been restricted to either on-robot microphone arrays or microphones embedded in aware environments, each of which has limitations due to its static configuration. This work overcomes these static-configuration problems by using visual localization to track multiple wireless microphones in the environment with enough accuracy to combine their auditory streams in a traditional localization algorithm. In this manner, microphones can move, or be moved, about the environment and still be combined with existing on-robot microphones to extend array baselines and effective listening ranges without re-measuring inter-microphone distances.
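As a sketch of why runtime microphone positions are all a traditional localization algorithm needs (the grid-search TDOA solver and every coordinate below are illustrative assumptions, not this work's implementation):

```python
import numpy as np

# Time-difference-of-arrival (TDOA) localization with microphone positions
# supplied at runtime (e.g., from visual tracking of wireless microphones)
# rather than from a fixed, pre-measured array.
C = 343.0  # speed of sound, m/s

def locate(mic_positions, tdoas, ref=0, grid=None):
    """Pick the grid point whose predicted TDOAs best match the measured ones."""
    if grid is None:
        g = np.linspace(-3.0, 3.0, 61)
        grid = np.array([(x, y) for x in g for y in g])
    dists = np.linalg.norm(grid[:, None, :] - mic_positions[None, :, :], axis=2)
    pred = (dists - dists[:, [ref]]) / C          # predicted delay vs. reference mic
    idx = np.delete(np.arange(len(mic_positions)), ref)
    residual = ((pred[:, idx] - tdoas) ** 2).sum(axis=1)
    return grid[np.argmin(residual)]

# Example: two on-robot mics plus two visually tracked wireless mics.
mics = np.array([[0.0, 0.0], [0.3, 0.0],      # on-robot pair (short baseline)
                 [2.0, 1.0], [-1.5, 2.0]])    # tracked mics extend the baseline
source = np.array([1.0, 1.5])
d = np.linalg.norm(mics - source, axis=1)
tdoas = (d[1:] - d[0]) / C                    # measured delays relative to mic 0
print(locate(mics, tdoas))                    # close to the true source position
```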
A 3D optical flow tracking system was used to track participants as they watched a series of boring videos. The participants' video streams were rated for boredom events, and these ratings were combined with head-position data to predict boredom events.
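A minimal sketch of one way such a predictor could be structured (the windowed motion features, window length, and logistic-regression classifier are assumptions for illustration, not the study's actual model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def window_features(head_positions, fps=30, window_s=5.0):
    """Summarize tracked head coordinates into per-window motion features.

    head_positions: (n_frames, 3) array of tracked head coordinates.
    Returns one row of [mean, std, max] frame-to-frame motion per window.
    """
    win = int(fps * window_s)
    motion = np.linalg.norm(np.diff(head_positions, axis=0), axis=1)
    feats = []
    for start in range(0, len(motion) - win + 1, win):
        seg = motion[start:start + win]
        feats.append([seg.mean(), seg.std(), seg.max()])
    return np.array(feats)

# X: features from the tracked participants; y: 1 where raters marked a
# boredom event in that window, else 0 (both hypothetical here).
# clf = LogisticRegression().fit(X, y)
# predictions = clf.predict(window_features(new_head_positions))
```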