Spatio-temporal representations in frame sequences play an important role in the task of action recognition. Previously, methods that use optical flow as temporal information, combined with a set of RGB images containing spatial information, have shown great performance improvements on action recognition tasks. However, optical flow is computationally expensive and requires a two-stream (RGB and optical flow) framework. In this paper, we propose MFNet (Motion Feature Network), which contains motion blocks that encode spatio-temporal information between adjacent frames in a unified network that can be trained end-to-end. The motion block can be attached to any existing CNN-based action recognition framework at only a small additional cost. We evaluate our network on two action recognition datasets (Jester and Something-Something) and achieve competitive performance on both by training the networks from scratch.
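The abstract describes the motion block only at a high level. A minimal sketch of the core idea, encoding temporal information as feature-level differences between adjacent frames inside a single network rather than computing optical flow externally, might look like the following. This is an illustration under assumed tensor shapes, not the exact MFNet formulation; the function name `motion_features` is hypothetical.

```python
import numpy as np

def motion_features(frame_feats):
    """Hypothetical sketch of a motion block.

    Given per-frame CNN feature maps of shape (T, C, H, W), approximate
    temporal (motion) information as the element-wise difference between
    each frame's features and its successor's. This illustrates the
    general idea of unified feature-level motion encoding; the actual
    MFNet motion block is more elaborate.
    """
    # Adjacent-frame feature differences: shape (T-1, C, H, W)
    return frame_feats[1:] - frame_feats[:-1]

# Toy usage: 4 frames, 2 channels, 3x3 spatial grid
feats = np.arange(4 * 2 * 3 * 3, dtype=np.float32).reshape(4, 2, 3, 3)
motion = motion_features(feats)
print(motion.shape)  # (3, 2, 3, 3)
```

Because the differences are computed on intermediate feature maps, such a block can in principle be inserted between layers of an existing CNN, which is what makes the approach cheap compared to a separate optical-flow stream.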
This paper presents the technical approaches used and experimental results obtained by Team SNU (Seoul National University) at the 2015 DARPA Robotics Challenge (DRC) Finals. Team SNU was one of the newly qualified teams, unlike the 12 teams that had previously participated in the December 2013 DRC Trials. Our hardware platform, THORMANG, was developed by ROBOTIS and was one of the smallest robots at the DRC Finals. Based on this platform, we focused on developing a software architecture and controllers to perform complex tasks in disaster response situations, and on modifying hardware modules to maximize manipulability. Stability and modularization are the two main themes of our technical approach. We designed our interface and controllers for greater robustness against disaster situations, and we integrated a number of software modules into our architecture to reduce system complexity and programming errors. With these efforts on hardware and software, we successfully finished the competition without falling and ranked 12th out of 23 teams. The paper concludes with a number of lessons learned from analyzing the 2015 DRC Finals.
Figure 1. Fitting results of our novel generative texture model. StyleUV can generate high-fidelity images and cover the diverse nature of human faces, including but not limited to the faces of a baby, a Black person, an elderly person, and a young woman wearing heavy makeup.