PLEA is an interactive, biomimetic robotic head with non-verbal communication capabilities. PLEA reasoning is based on a multimodal approach combining video and audio inputs to determine the current emotional state of a person. PLEA expresses emotions using facial expressions generated in real-time, which are projected onto a 3D face surface. In this paper, a more sophisticated computation mechanism is developed and evaluated. The model for audio-visual person separation can locate a talking person in a crowded place by combining input from the ResNet network with input from a hand-crafted algorithm. The first input is used to find human faces in the room, and the second input is used to determine the direction of the sound and to focus attention on a single person. After an information fusion procedure is performed, the face of the person speaking is matched with the corresponding sound direction. As a result of this procedure, the robot could start an interaction with the person based on non-verbal signals. The model was tested and evaluated under laboratory conditions by interaction with users. The results suggest that the methodology can be used efficiently to focus a robot’s attention on a localized person.
Most industrial workplaces involving robots and other apparatus operate behind the fences to remove defects, hazards, or casualties. Recent advancements in machine learning can enable robots to co-operate with human co-workers while retaining safety, flexibility, and robustness. This article focuses on the computation model, which provides a collaborative environment through intuitive and adaptive human–robot interaction (HRI). In essence, one layer of the model can be expressed as a set of useful information utilized by an intelligent agent. Within this construction, a vision-sensing modality can be broken down into multiple layers. The authors propose a human-skeleton-based trainable model for the recognition of spatiotemporal human worker activity using LSTM networks, which can achieve a training accuracy of 91.365%, based on the InHARD dataset. Together with the training results, results related to aspects of the simulation environment and future improvements of the system are discussed. By combining human worker upper body positions with actions, the perceptual potential of the system is increased, and human–robot collaboration becomes context-aware. Based on the acquired information, the intelligent agent gains the ability to adapt its behavior according to its dynamic and stochastic surroundings.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.