2024
DOI: 10.21203/rs.3.rs-5463235/v1
Preprint

Robust Audio-Image Steganography using Cross-Modal Based Transformer Models

Mark Taremwa,
Roger Nick Anaedevha,
Alexander Genadievich Trofimov

Abstract: This research investigates the use of Vision Transformers (ViT), Audio Spectrogram Transformers (AST), and Cross-Modal Transformers (CMT) in audio-image fusion tasks, aiming to improve representation learning and the interaction between auditory and visual data. The ViT model extracts visual features from patches of images resized to 224x224 pixels, while the AST model converts audio signals into mel spectrograms to capture detailed auditory features. The central focus is on the robust CMT model, which integrates …
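
The patch-extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives only the 224x224 input size; the 16x16 patch size, the RGB channel count, and the function name `image_to_patches` are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    patch_size=16 is an assumed value (a common ViT default), not one
    stated in the abstract.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p, C) -> (rows, cols, p, p, C) -> (rows * cols, p * p * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Placeholder 224x224 RGB image standing in for a resized input.
image = np.zeros((224, 224, 3), dtype=np.float32)
patches = image_to_patches(image)
print(patches.shape)  # (196, 768)
```

Under these assumptions a 224x224 image yields a 14x14 grid, that is, 196 patches of 768 values each, which is the token sequence a ViT-style encoder would then embed.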
