“…Unlike end-to-end approaches, feature-based approaches first extract multi-modal handcrafted features from videos and then feed these features to a classifier or regressor that outputs the engagement estimate [6], [7], [8], [10], [11], [12], [16], [17], [38], [39], [40], [41], [42], [43], [44], [45]. Table 1 summarizes the literature on feature-based video engagement measurement approaches, focusing on their features, machine-learning models, and datasets.…”
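The two-stage pipeline (handcrafted feature extraction, then a separate classifier or regressor) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical placeholder chosen for illustration, not the method of any cited work. The assumptions are: per-frame features are already available as numeric vectors (for example, hypothetical gaze and head-pose values); each video is summarized by the per-dimension mean and standard deviation of its frame features; and a toy nearest-centroid classifier (`NearestCentroid`, defined here) stands in for the models surveyed in Table 1.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical per-frame handcrafted features, e.g. [gaze_x, gaze_y, head_pitch].
# In a real system these would come from a face/pose analysis tool (assumption).
Frame = list[float]


def video_descriptor(frames: list[Frame]) -> list[float]:
    """Stage 1 (sketch): aggregate per-frame features into one fixed-length
    video-level vector: per-dimension means followed by population std devs."""
    dims = list(zip(*frames))
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


class NearestCentroid:
    """Stage 2 (sketch): a toy classifier standing in for the classifiers or
    regressors used in the feature-based literature."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        groups: dict[str, list[list[float]]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # One centroid per engagement label: the mean descriptor of its videos.
        self.centroids = {
            label: [mean(col) for col in zip(*xs)] for label, xs in groups.items()
        }
        return self

    def predict(self, X: list[list[float]]) -> list[str]:
        # Assign each descriptor the label of the closest centroid (Euclidean).
        return [
            min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
            for x in X
        ]


# Hypothetical toy data: one labeled training video per engagement level.
train_videos = [
    [[0.0, 0.0, 0.1], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]],  # small gaze/pose deviation
    [[0.8, 0.5, 0.6], [0.9, 0.4, 0.7], [0.7, 0.6, 0.5]],  # large gaze/pose deviation
]
train_labels = ["engaged", "disengaged"]

model = NearestCentroid().fit([video_descriptor(v) for v in train_videos], train_labels)

test_videos = [
    [[0.05, 0.05, 0.05], [0.0, 0.0, 0.1]],
    [[0.85, 0.45, 0.65], [0.75, 0.55, 0.55]],
]
predictions = model.predict([video_descriptor(v) for v in test_videos])
print(predictions)  # ['engaged', 'disengaged']
```

Because the feature extractor and the learner are decoupled, either stage can be swapped independently; replacing the classifier with a regressor trained on continuous engagement scores would yield a regression variant of the same pipeline.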