“…With the continuous emergence of various media and short videos [1, 2] in recent years, content such as popular music and videos published on YouTube [5, 6], TikTok [7, 8], and other platforms has an increasingly large impact on children's emotions [3, 4] in daily life. Such data often contain three modalities: video [9, 10], audio [11, 12], and text [13].…”