…Similarly, to combine the audio and visual modalities for unsupervised learning, existing works exploit the natural audio-visual correspondence in videos to formulate various self-supervised signals, which predict cross-modal correspondence [314], [315], align temporally corresponding representations [309], [316], [317], [318], or cluster the representations in a shared audio-visual latent space [208], [319]. Several works further explore audio, vision, and language together for unsupervised representation learning by aligning the different modalities either in a shared multi-modal latent space [310], [320] or in a hierarchical latent space for audio-vision and vision-language [308].
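As a concrete illustration of the temporal alignment signal described above, the following is a minimal sketch, not the exact formulation of any cited work: audio and video clips drawn from the same time window form positive pairs, while all other pairings in the mini-batch serve as negatives under a symmetric InfoNCE loss. The encoder outputs, embedding dimension, and temperature below are assumed for illustration.

```python
# Minimal sketch of a temporal audio-visual alignment objective
# (symmetric InfoNCE). Encoders, embedding size (256), and the
# temperature value are illustrative assumptions, not taken from
# any specific cited paper.
import torch
import torch.nn.functional as F

def audio_visual_infonce(audio_emb: torch.Tensor,
                         video_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss over a batch of temporally aligned (audio, video) pairs.

    audio_emb, video_emb: (batch, dim) embeddings from modality-specific
    encoders; row i of each tensor comes from the same video clip.
    """
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature                     # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)   # diagonal entries are positives
    # Align in both directions (audio -> video, video -> audio) and average.
    loss_av = F.cross_entropy(logits, targets)
    loss_va = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_av + loss_va)

# Usage with stand-in embeddings for one mini-batch of 32 clips:
audio_emb = torch.randn(32, 256)   # e.g., output of an audio encoder
video_emb = torch.randn(32, 256)   # e.g., output of a video encoder
loss = audio_visual_infonce(audio_emb, video_emb)
```

Minimizing this loss pulls temporally corresponding audio and video representations together in the shared latent space while pushing apart mismatched pairs, which is the essence of the alignment-based signals surveyed above.

Open Challenges. …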