“…It provides flexibility of time and place, as well as cost-effective learning efficiency [1]. To reduce the substantial effort of manual content creation, many studies have explored automatic guidance authoring from experts' experience recorded during actual work or demonstrations [2]-[7]. The emergence of wearable devices, such as smart glasses and active cameras, makes such recording easy in a human-centric manner; this perspective is referred to as first-person vision/view (FPV) or egocentric vision [8], [9].…”