Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection

Zhuang, Xuqiang; Fang-ai, Liu; Hou, Jianrong; Hao, Jianhua; Cai, Xiaohong

doi:10.1007/s11063-021-10713-5

Cited by 20 publications

(6 citation statements)

References 41 publications


“…The ViT model has opened a new era in DL [41, 43]. However, the self-attention mechanism of the ViT model has second-order complexity.…”

Section: Methods (mentioning)

confidence: 99%

A hybrid approach based on multipath Swin transformer and ConvMixer for white blood cells classification

Üzen, Fırat (2024). Health Inf Sci Syst


White blood cells (WBCs) play an important role in the human body's defense against parasites, viruses, and bacteria. WBCs are categorized into various subgroups based on their morphological structure, and the counts of these WBC types differ between the blood of healthy and diseased people. WBC classification is therefore significant for medical diagnosis. With the widespread use of deep learning in medical image analysis in recent years, it has also been applied to WBC classification. Moreover, the recently introduced ConvMixer and Swin transformer models have achieved significant success by efficiently capturing long-range contextual features. Based on this, a new multipath hybrid network using ConvMixer and Swin transformer is proposed for WBC classification. The proposed model is called the Swin Transformer and ConvMixer based Multipath mixer (SC-MP-Mixer). In the SC-MP-Mixer model, features with strong spatial details are first extracted with the ConvMixer. The Swin transformer then processes these features with its self-attention mechanism. In addition, the ConvMixer and Swin transformer blocks use a multipath structure to obtain better patch representations. To test the performance of the SC-MP-Mixer, experiments were performed on three WBC datasets with 4 (BCCD), 8 (PBC), and 5 (Raabin) classes. The experiments yielded an accuracy of 99.65% for PBC, 98.68% for Raabin, and 95.66% for BCCD. Compared with studies in the literature and state-of-the-art models, the SC-MP-Mixer produced more effective classification results.


“…The Vision Transformer (ViT) model has opened a new era in deep learning [33,34]. However, the self-attention mechanism of the ViT model has quadratic complexity.…”

Section: ConvMixer Network Architecture (unclassified)

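Bal Arı Hastalıklarının Sınıflandırılması için ConvMixer, VGG16 ve ResNet101 Tabanlı Topluluk Öğrenme Yaklaşımı (An Ensemble Learning Approach Based on ConvMixer, VGG16, and ResNet101 for the Classification of Honey Bee Diseases)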

Üzen, Altın, Balıkçı Çiçek (2024). Fırat Üniversitesi Mühendislik Bilimleri Dergisi


Honey bees are, for many reasons, one of the most important components of the ecosystem. However, they have recently come under threat from factors such as the spread of the varroa parasite, climate change, and insect infestation. Analyzing bees with advanced artificial intelligence techniques has therefore become an important research topic. This study presents an ensemble learning approach based on convolutional neural network architectures for classifying bee diseases. The model, called the ConvMixer, VGG16 and ResNet101 based ensemble learning approach (CVR-TÖY), essentially combines the prediction scores of the VGG16, ResNet101, and ConvMixer classifiers. In this way, the prediction outputs of the VGG16, ResNet101, and ConvMixer structures, which were developed with different approaches, are combined effectively and honey bee disease classification performance is improved. Two approaches were tried for combining the prediction scores. In the first, the classification is made by taking the highest value of the models' prediction outputs. The second is averaging. The averaging approach, acting as a wisdom-of-the-crowd model, was found to give the best result. The experiments used the BeeImage Dataset (BI), which contains images of bees affected by 6 different hive problems. The proposed model achieved an F1-score of 98.87% in these experiments. The proposed model was also compared with state-of-the-art models; in this comparison its F1-score was 2.31% higher.
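
The two score-fusion rules named in this abstract (taking the highest prediction value across models, and averaging) can be illustrated with a minimal sketch. The function name `fuse_scores` and the probability values are hypothetical and chosen only for illustration; they do not come from the paper, and this is not the authors' implementation.

```python
import numpy as np

def fuse_scores(score_list, method="mean"):
    """Fuse per-model class scores and return the predicted class per sample.

    score_list: list of arrays, each of shape (n_samples, n_classes),
                one array per classifier (hypothetical inputs).
    method:     "max" takes the highest score per class across models,
                "mean" averages the scores per class across models.
    """
    stacked = np.stack(score_list, axis=0)  # (n_models, n_samples, n_classes)
    if method == "max":
        fused = stacked.max(axis=0)
    elif method == "mean":
        fused = stacked.mean(axis=0)
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return fused.argmax(axis=1)

# Made-up scores from three classifiers for two images and three classes.
scores_a = np.array([[0.55, 0.35, 0.10], [0.20, 0.50, 0.30]])
scores_b = np.array([[0.30, 0.60, 0.10], [0.30, 0.40, 0.30]])
scores_c = np.array([[0.55, 0.35, 0.10], [0.10, 0.20, 0.70]])

pred_max = fuse_scores([scores_a, scores_b, scores_c], method="max")
pred_mean = fuse_scores([scores_a, scores_b, scores_c], method="mean")
```

In this toy input the two rules disagree on the first image: one confident model decides the outcome under the max rule, while the mean rule follows the two models that agree.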


“…Transformer-based multi-modality cross attention was also applied to enhance the interaction of two MR modalities and better investigate multi-modal paired attention. The head number of multi-head attention was set to eight [10].…”

Section: Pre-processing of MR Images (mentioning)

confidence: 99%

Deep learning-based high-accuracy detection for lumbar and cervical degenerative disease on T2-weighted MR images

et al. 2023


Purpose: To develop and validate a deep learning (DL) model for detecting lumbar degenerative disease in both sagittal and axial views of T2-weighted MRI and evaluate its generalized performance in detecting cervical degenerative disease.
Methods: T2-weighted MRI scans of 804 patients with symptoms of lumbar degenerative disease were retrospectively collected from three hospitals. The training dataset (n = 456) and internal validation dataset (n = 134) were randomly selected from center I. Two external validation datasets comprising 100 and 114 patients were from center II and center III, respectively. A DL model based on 3D ResNet18 and transformer architecture was proposed to detect lumbar degenerative disease. In addition, a cervical MR image dataset comprising 200 patients from an independent hospital was used to evaluate the generalized performance of the DL model. The diagnostic performance was assessed by the free-response receiver operating characteristic (fROC) curve and precision–recall (PR) curve. Precision, recall, and F1-score were used to measure the DL model.
Results: A total of 2497 three-dimensional retrogression annotations were labeled for training (n = 1157) and multicenter validation (n = 1340). The DL model showed excellent detection efficiency in the internal validation dataset, with F1-score achieving 0.971 and 0.903 on the sagittal and axial MR images, respectively. Good performance was also observed in the external validation dataset I (F1-score, 0.768 on sagittal MR images and 0.837 on axial MR images) and external validation dataset II (F1-score, 0.787 on sagittal MR images and 0.770 on axial MR images). Furthermore, the robustness of the DL model was demonstrated via transfer learning and generalized performance evaluation on the external cervical dataset, with the F1-score yielding 0.931 and 0.919 on the sagittal and axial MR images, respectively.
Conclusion: The proposed DL model can automatically detect lumbar and cervical degenerative disease on T2-weighted MR images with good performance, robustness, and feasibility in clinical practice.
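
The citation statement for this entry mentions transformer-based cross attention between two MR modalities with eight attention heads. As a rough illustration of what multi-head cross attention computes, the NumPy sketch below lets tokens from one input attend to tokens from another. The token counts, the model width of 64, the random weights, and all names (`multi_head_cross_attention`, `modality_a`, `modality_b`) are assumptions for illustration; only the head count of eight comes from the quoted statement, and this is not the cited paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query_tokens, context_tokens, weights, num_heads=8):
    """Queries come from one modality, keys and values from the other.

    query_tokens:   (n_q, d_model) token features of modality A
    context_tokens: (n_kv, d_model) token features of modality B
    weights:        four (d_model, d_model) projection matrices (Wq, Wk, Wv, Wo)
    """
    w_q, w_k, w_v, w_o = weights
    n_q, d_model = query_tokens.shape
    n_kv = context_tokens.shape[0]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(x, n):
        # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(query_tokens @ w_q, n_q)
    k = split_heads(context_tokens @ w_k, n_kv)
    v = split_heads(context_tokens @ w_v, n_kv)

    # Scaled dot-product attention per head: (num_heads, n_q, n_kv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)
    heads = attn @ v  # (num_heads, n_q, d_head)

    # Concatenate the heads and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(n_q, d_model)
    return merged @ w_o, attn

rng = np.random.default_rng(0)
d_model = 64  # assumed width, not taken from the paper
weights = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)]
modality_a = rng.normal(size=(10, d_model))  # hypothetical tokens of modality A
modality_b = rng.normal(size=(6, d_model))   # hypothetical tokens of modality B
out, attn = multi_head_cross_attention(modality_a, modality_b, weights)
```

Each of the eight heads produces its own 10 x 6 attention map, so every token of the first input receives a weighted mixture of the second input's tokens; swapping the two arguments gives attention in the opposite direction.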
