This paper presents an action segmentation method that utilizes multiple features through a novel intermediate fusion module, named the Mutual Cross Fusion Module (MCFM). The proposed method analyzes each feature on its own classifier stream. In contrast to the module of the previous method, which recalibrates the target feature using a joint representation learned from all features, MCFM recalibrates the target feature in the middle of its classifier stream using the knowledge of the other features. MCFM thus integrates the knowledge of multiple features without biasing that knowledge toward any single feature. We compare the proposed method with state-of-the-art methods on two public datasets, GTEA and 50Salads. The proposed method outperforms the state-of-the-art methods in terms of frame-wise accuracy, edit distance, and F1-score by 2.0, 1.6, and 2.9 points, respectively.
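The abstract does not give MCFM's internals, but the idea of recalibrating each target feature from the other features' streams (rather than from a joint representation) can be sketched. The following is a purely illustrative NumPy sketch, assuming a hypothetical sigmoid-gating recalibration; the function name, gating mechanism, and weight layout are all assumptions, not the authors' actual module.

```python
import numpy as np

def mutual_cross_fusion(features, weights):
    """Hypothetical sketch of mutual cross fusion.

    features: list of (T, D) arrays, one per feature stream
              (T frames, D channels each).
    weights:  list of (D_others, D) projection matrices, one per
              target stream, mapping the concatenated *other*
              streams to a gating signal for the target.
    """
    recalibrated = []
    for i, target in enumerate(features):
        # Build the gating signal only from the OTHER streams,
        # so no joint representation of all features is formed.
        others = np.concatenate(
            [f for j, f in enumerate(features) if j != i], axis=1
        )
        gate = 1.0 / (1.0 + np.exp(-others @ weights[i]))  # sigmoid gate in (0, 1)
        recalibrated.append(target * gate)  # element-wise recalibration
    return recalibrated
```

Under this sketch, each stream keeps its own classifier path and only receives a multiplicative correction derived from its peers, which is one plausible way to avoid biasing the fused knowledge toward any single feature.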
In this paper, we introduce a new video dataset for action segmentation, the BRIO-TA (BRIO Toy Assembly) dataset, which is designed to simulate operations in factory assembly. In contrast with existing datasets, BRIO-TA covers two types of scenarios: normal work processes and anomalous work processes. Anomalies are further categorized into incorrect processes, omissions, and abnormal durations. The subjects in the videos are asked to perform either normal work or one of the three anomalies, and all video frames are manually annotated into 23 action classes. In addition, we propose a new metric called anomaly section accuracy (ASA) for evaluating the detection accuracy of anomalous segments in a video. With the new dataset and metric, we report that the state-of-the-art methods achieve a significantly low ASA, even though they perform well on normal work segments. Demo videos are available at https://github.com/Tarmo-moriwaki/BRIO-TA; a sample and the full dataset will be released after publication.
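The abstract names the ASA metric but does not define it. As a purely hypothetical illustration of a section-level detection metric, one might count an annotated anomalous section as detected when the predicted anomalous frames cover enough of it; the overlap criterion and threshold below are assumptions, not the paper's definition.

```python
def anomaly_section_accuracy(gt_sections, pred_frames, coverage_thresh=0.5):
    """Hypothetical section-level accuracy sketch (not the paper's ASA).

    gt_sections: list of (start, end) frame ranges annotated as anomalous
                 (end exclusive).
    pred_frames: set of frame indices predicted as anomalous.
    A section counts as detected when predicted frames cover at least
    coverage_thresh of it (an assumed criterion).
    """
    if not gt_sections:
        return 1.0
    detected = 0
    for start, end in gt_sections:
        section = set(range(start, end))
        coverage = len(section & pred_frames) / len(section)
        if coverage >= coverage_thresh:
            detected += 1
    return detected / len(gt_sections)
```

A section-level metric like this penalizes methods that label most normal frames correctly but miss short anomalous segments entirely, which is consistent with the abstract's observation that frame-accurate methods can still score low on anomaly detection.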