2022
DOI: 10.1007/s00371-022-02492-4
A multimodal transformer to fuse images and metadata for skin disease classification

Cited by 58 publications (29 citation statements)
References 32 publications
“…According to some reviewed works, different feature extraction methods can influence the fusion results significantly. For example, Cai et al [51] observed that the ViT-based image encoder led to better fusion results than CNN-based encoders, and the fusion model using clinical features with soft one-hot encoding also outperformed hard encoding and word2vec. Li et al [58] compared the contribution of different pre-trained language models to multimodal fusion.…”
Section: Discussion and Future Work
confidence: 99%
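The "soft one-hot encoding" of clinical features mentioned in the excerpt above can be illustrated with a minimal sketch. The uniform smoothing scheme below is an assumption for illustration, not the exact encoding used by Cai et al [51]:

```python
import numpy as np

def soft_one_hot(index, num_classes, smoothing=0.1):
    """Soft one-hot vector: most mass on the true class, a small
    amount spread uniformly over all classes (illustrative scheme)."""
    vec = np.full(num_classes, smoothing / num_classes)
    vec[index] += 1.0 - smoothing
    return vec

# e.g. encoding a categorical clinical feature with 4 possible values
v = soft_one_hot(2, 4)  # mass concentrated on class 2, sums to 1
```

Compared with a hard one-hot vector, the smoothed version avoids zero entries, which can help a downstream fusion model learn smoother feature embeddings.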
“…Chen et al [26] calculated the co-attention weight to generate the genomic-guided WSI embeddings. Similarly, Lu et al [69] proposed a symmetric cross attention to fuse the genomic data and pathology image embeddings of glioma tumors for multitask learning, while Cai et al [51] proposed an asymmetrical multi-head cross attention to fuse the camera images and metadata for skin classification. Li et al [31] aggregated multiscale pathology images and clinical features to predict the lymph node metastasis (LNM) of breast cancer.…”
Section: Attention-based Fusion Methods
confidence: 99%
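The asymmetrical cross attention described in the excerpt above (a metadata query attending over image tokens) can be sketched minimally. Learned projection matrices and the multi-head split are omitted, and all shapes are illustrative assumptions rather than the configuration from Cai et al [51]:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, d_k):
    """Scaled dot-product cross attention: rows of `query` attend
    over rows of `context` (keys and values are the context here)."""
    scores = query @ context.T / np.sqrt(d_k)   # (nq, nc)
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ context                    # (nq, d)

rng = np.random.default_rng(0)
img_tokens = rng.normal(size=(49, 64))  # e.g. ViT patch embeddings
meta = rng.normal(size=(1, 64))         # metadata embedding as the query
fused = cross_attention(meta, img_tokens, d_k=64)
```

The asymmetry is in the roles: metadata only forms the query, while image tokens supply keys and values, so the fused vector is a metadata-guided summary of the image.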
“…To better utilize information from multiple sources, multi-modal fusion classification is introduced into the detection and classification of skin cancer. Cai et al (12) proposed a multimodal transformer to fuse multimodal information. Chen et al (13) proposed a skin cancer Multimodal Data Fusion Diagnosis Network (MDFNet) framework based on a data fusion strategy to effectively fuse clinical skin images with patient clinical data.…”
Section: Introduction
confidence: 99%
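A data-fusion strategy of the kind MDFNet is described as using can be sketched generically as feature concatenation followed by a linear classifier head. This is an illustrative assumption for exposition, not MDFNet's actual architecture:

```python
import numpy as np

def concat_fuse(image_feat, clinical_feat, w, b):
    """Late fusion by concatenation: stack the image and clinical
    feature vectors, then apply a linear head to get class logits."""
    fused = np.concatenate([image_feat, clinical_feat])
    return w @ fused + b

# hypothetical dimensions: 3-d image features, 2-d clinical features, 2 classes
logits = concat_fuse(np.ones(3), np.ones(2),
                     w=np.ones((2, 5)), b=np.zeros(2))
```

Concatenation is the simplest fusion baseline; the attention-based methods quoted earlier instead let one modality reweight the other before combining.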