“…They separately extracted hand-crafted features, learned emotion classifiers, and matched images to music based on the predicted emotions. Many methods have followed this pipeline [34,48,53,61,86]. They (1) extracted more discriminative emotion features, such as low-level color [9,34,48,53,61] and mid-level principles-of-art [86] features for images; (2) employed different emotion representation models, from categorical states [9,34,53,61] to dimensional spaces [48,86]; (3) correspondingly learned different predictors, ranging from Support Vector Machines [9], Naive Bayes, and Decision Trees [53] to Support Vector Regression [86]; and (4) used different composition strategies to match image and music, from emotion-category comparison [9,34,53,61] to Euclidean distance [48,86].…”
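To make the two composition strategies in step (4) concrete, the sketch below illustrates both routes under stated assumptions: it is not the code of any cited method. All names (`match_by_category`, `match_by_distance`), the feature dimensionality, and the synthetic data are hypothetical; classifiers stand in for the cited SVM/SVR choices via scikit-learn.

```python
# Minimal sketch of the emotion-based image-music matching pipeline.
# Hypothetical data and names throughout; not the implementation of any cited paper.

import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# (1) Hand-crafted features (here: random stand-ins for low-level color features).
image_features = rng.random((100, 32))   # 100 images, 32-dim features (assumed)
music_features = rng.random((50, 32))    # 50 music clips, same dimensionality (assumed)

# (2a) Categorical emotion model: one discrete label per sample.
EMOTIONS = ["happy", "sad", "calm", "angry"]
image_labels = rng.integers(0, len(EMOTIONS), size=100)
music_labels = rng.integers(0, len(EMOTIONS), size=50)

# (3a) Classifier route: SVMs predicting an emotion category per modality.
image_clf = SVC().fit(image_features, image_labels)
music_clf = SVC().fit(music_features, music_labels)

# (4a) Composition by emotion-category comparison.
def match_by_category(img_feat, music_feats):
    """Indices of music clips whose predicted category equals the image's."""
    img_emotion = image_clf.predict(img_feat[None, :])[0]
    music_emotions = music_clf.predict(music_feats)
    return np.flatnonzero(music_emotions == img_emotion)

# (2b/3b) Dimensional route: regress valence-arousal (VA) coordinates with SVR.
image_va = rng.random((100, 2))          # ground-truth VA in [0, 1] (assumed)
reg_valence = SVR().fit(image_features, image_va[:, 0])
reg_arousal = SVR().fit(image_features, image_va[:, 1])
music_va = rng.random((50, 2))           # music VA annotations assumed given

# (4b) Composition by Euclidean distance in VA space.
def match_by_distance(img_feat, music_va_coords, k=3):
    """The k music clips nearest to the image's predicted (valence, arousal)."""
    img_va = np.array([reg_valence.predict(img_feat[None, :])[0],
                       reg_arousal.predict(img_feat[None, :])[0]])
    dists = np.linalg.norm(music_va_coords - img_va, axis=1)
    return np.argsort(dists)[:k]

print(match_by_category(image_features[0], music_features))
print(match_by_distance(image_features[0], music_va))
```

The categorical route only retrieves music with an exactly matching predicted label, whereas the dimensional route ranks all clips by distance in VA space, trading hard agreement for a graded notion of emotional similarity.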