“…Action Recognition. Our work is related to previous research on action recognition from third person vision [7,13,14,30,34,35,49,57,59,60,61,67] and first person vision [11,12,36,37,38,44,48,52,54,55,56]. Specifically, we build on previous ideas investigated in the context of action recognition such as the use of multiple modalities for video analysis [49], the use of Temporal Segment Networks [61] as a principled way to train CNNs for action recognition, as well as the explicit encoding of object-based features [11,38,44,51,56] to analyze egocentric video.…”