“…Early works [29], [31], [32], [51] generally used traditional machine learning models, e.g., Support Vector Regression (SVR) [25], [33], decision trees [21], [43], [52], and logistic regression [53], to predict depression from hand-crafted features such as Local Binary Patterns (LBP) [38], [41], Low-Level Descriptors (LLD) [21], [34], [43], and Histograms of Oriented Gradients (HOG) [26]. For example, Meng et al. [29] extracted LBP and Edge Orientation Histogram (EOH) features as visual descriptors and LLDs as audio descriptors, and applied the Motion History Histogram (MHH) to capture dynamics within short video segments.…”
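To show how an MHH-style descriptor can turn frame-level features into a fixed-length segment descriptor, here is a minimal sketch. It is not the implementation of Meng et al. [29]. It assumes that each component of a per-frame feature vector (e.g., an LBP or LLD dimension) is binarized by thresholding its change between consecutive frames, and that MHH bin *i* counts runs of exactly *i* consecutive 1s. The names `binarize_changes`, `motion_history_histogram`, and `mhh_features` are hypothetical, as are the parameters `threshold` and `max_len` and the choice to pool runs longer than `max_len` into the last bin.

```python
from typing import List, Sequence


def binarize_changes(features: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """For each feature component, mark 1 where it changed by more than
    `threshold` between consecutive frames, else 0 (assumed binarization).

    `features` is a T x D sequence of per-frame feature vectors; the result
    is D binary sequences, each of length T - 1.
    """
    num_frames = len(features)
    dim = len(features[0]) if features else 0
    return [
        [1 if abs(features[t + 1][d] - features[t][d]) > threshold else 0
         for t in range(num_frames - 1)]
        for d in range(dim)
    ]


def motion_history_histogram(binary_seq: Sequence[int], max_len: int) -> List[int]:
    """Count runs of consecutive 1s by their length.

    Bin i (0-based) counts runs of exactly i + 1 ones; runs longer than
    `max_len` are pooled into the last bin (an assumption of this sketch).
    """
    hist = [0] * max_len
    run = 0
    for bit in list(binary_seq) + [0]:  # trailing 0 closes any final run
        if bit:
            run += 1
        elif run:
            hist[min(run, max_len) - 1] += 1
            run = 0
    return hist


def mhh_features(features: Sequence[Sequence[float]],
                 threshold: float = 0.1, max_len: int = 5) -> List[int]:
    """Concatenate per-component MHHs into one segment-level vector of
    length D * max_len (hypothetical default parameters)."""
    out: List[int] = []
    for seq in binarize_changes(features, threshold):
        out.extend(motion_history_histogram(seq, max_len))
    return out
```

For example, `motion_history_histogram([1, 1, 0, 1, 0, 1, 1, 1], 3)` sees runs of lengths 2, 1, and 3 and returns `[1, 1, 1]`. The resulting segment vector could then be passed to a regressor such as SVR. Counting run lengths keeps some of the temporal structure of the segment while producing a vector whose size does not depend on the segment's length.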