2023
DOI: 10.1016/j.knosys.2023.110689

Multi-Head multimodal deep interest recommendation network

Cited by 2 publications (2 citation statements)
References 13 publications

“…The developed model uses a joint embedding space to represent the input signals, which allows the model to learn the relationships between different text and image modalities. The joint embedding space is created by combining the image and text hidden states or feature vectors, learned by Transformer encoders, using a multimodal projection head [15,20].…”
Section: The Proposed Model
Confidence: 99%
“…The developed model uses a joint embedding space to represent the input signals, which allows the model to learn the relationships between different text and image modalities. The joint embedding space is created by combining the image and text hidden states or feature vectors, learned by Transformer encoders, using a multimodal projection head [15,20]. A semantically hierarchical common space is defined to account for the granularity of different modalities, and the contrastive loss method is employed to train the model.…”
Section: The Proposed Model
Confidence: 99%
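
The citation statement above describes a CLIP-style construction: text and image hidden states produced by Transformer encoders are mapped by a multimodal projection head into a joint embedding space, which is then trained with a contrastive loss. The sketch below is a minimal illustration of that idea, not the cited paper's implementation; the class name MultimodalProjectionHead, the feature dimensions, and the symmetric InfoNCE formulation are illustrative assumptions.

# Minimal sketch (assumed names and dimensions, not the cited paper's code):
# a projection head mapping text/image encoder features into a joint embedding
# space, trained with a symmetric contrastive (InfoNCE) loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalProjectionHead(nn.Module):
    """Projects text and image hidden states into a shared embedding space."""
    def __init__(self, text_dim=768, image_dim=1024, joint_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, joint_dim)
        self.image_proj = nn.Linear(image_dim, joint_dim)
        # Learnable temperature for the contrastive loss (initialised to log(1/0.07)).
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, text_feats, image_feats):
        # L2-normalise so similarity in the joint space is cosine similarity.
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.image_proj(image_feats), dim=-1)
        return t, v

    def contrastive_loss(self, text_feats, image_feats):
        t, v = self.forward(text_feats, image_feats)
        logits = self.logit_scale.exp() * t @ v.t()          # (batch, batch) similarity matrix
        targets = torch.arange(t.size(0), device=t.device)   # matched pairs lie on the diagonal
        # Symmetric cross-entropy over text-to-image and image-to-text directions.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Usage with dummy encoder outputs (stand-ins for pooled Transformer hidden states):
head = MultimodalProjectionHead()
text_hidden = torch.randn(8, 768)
image_hidden = torch.randn(8, 1024)
loss = head.contrastive_loss(text_hidden, image_hidden)
loss.backward()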