Zhun Liu scite author profile

Multimodal research is an emerging field of artificial intelligence, and one of the main research problems in this field is multimodal fusion. The fusion of multimodal data is the process of integrating multiple unimodal representations into one compact multimodal representation. Previous research in this field has exploited the expressiveness of tensors for multimodal representation. However, these methods often suffer from an exponential increase in dimensionality and computational complexity introduced by transforming the input into a tensor. In this paper, we propose the Low-rank Multimodal Fusion method, which performs multimodal fusion using low-rank tensors to improve efficiency. We evaluate our model on three different tasks: multimodal sentiment analysis, speaker trait analysis, and emotion recognition. Our model achieves competitive results on all these tasks while drastically reducing computational complexity. Additional experiments also show that our model can perform robustly for a wide range of low-rank settings, and is indeed much more efficient in both training and inference compared to other methods that utilize tensor representations.
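The efficiency claim in the abstract above can be illustrated with a small sketch. It assumes the common formulation in which each modality vector gets a constant 1 appended, the fusion weight tensor is a sum of rank-1 terms built from per-modality factors, and the fused output is the sum over ranks of the elementwise product of per-modality projections. The function names, dimensions, and random factors are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random


def append_one(z):
    """Append a constant 1 so the fusion also keeps unimodal terms."""
    return list(z) + [1.0]


def project(factor, z):
    """Compute factor^T z, where factor is a (len(z) x d_h) list of rows."""
    d_h = len(factor[0])
    return [sum(factor[k][j] * z[k] for k in range(len(z))) for j in range(d_h)]


def low_rank_fusion(inputs, factors):
    """Fuse modalities without building the full tensor.

    inputs:  list of modality vectors.
    factors: factors[m][i] is the rank-i factor matrix for modality m.
    """
    zs = [append_one(z) for z in inputs]
    rank = len(factors[0])
    d_h = len(factors[0][0][0])
    h = [0.0] * d_h
    for i in range(rank):
        term = [1.0] * d_h
        for z, modality_factors in zip(zs, factors):
            proj = project(modality_factors[i], z)
            term = [t * p for t, p in zip(term, proj)]  # elementwise product
        h = [a + b for a, b in zip(h, term)]  # sum over ranks
    return h


def full_tensor_fusion(x, y, factors):
    """Reference for two modalities: contract the explicit outer-product
    tensor with the explicit (low-rank) weight tensor."""
    zx, zy = append_one(x), append_one(y)
    fx, fy = factors
    rank, d_h = len(fx), len(fx[0][0])
    h = [0.0] * d_h
    for a in range(len(zx)):
        for b in range(len(zy)):
            for j in range(d_h):
                w = sum(fx[i][a][j] * fy[i][b][j] for i in range(rank))
                h[j] += w * zx[a] * zy[b]
    return h


def random_factors(dim, d_h, rank, rng):
    """Hypothetical random factors of shape rank x (dim + 1) x d_h."""
    return [[[rng.uniform(-1, 1) for _ in range(d_h)]
             for _ in range(dim + 1)] for _ in range(rank)]


rng = random.Random(0)
x, y = [0.5, -1.0, 2.0], [1.5, 0.25]
factors = [random_factors(3, 4, 2, rng), random_factors(2, 4, 2, rng)]
fast = low_rank_fusion([x, y], factors)
slow = full_tensor_fusion(x, y, factors)
```

Both paths give the same fused vector, but the low-rank path only ever touches the per-modality factors, so its cost grows with the sum of the modality dimensions rather than their product.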

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

Wang

Shen

Liu

et al. 2019

AAAI

313

176

Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.

[Figure: word representation space showing the original word representation with a positive-shifted version (visual and acoustic cues such as excited voice, raised eyebrows) and a negative-shifted version (cues such as soft voice, shock).]
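The idea of shifting a word representation by its nonverbal context can be sketched in a few lines. Everything specific here is an assumption made for illustration and is not taken from the source: the fixed scalar gates `w_visual` and `w_acoustic` (a trained model would compute such weights from its inputs), the threshold `beta`, and the rule that caps the shift at `beta` times the word vector's norm.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def shift_word(word, visual, acoustic, w_visual, w_acoustic, beta=0.5):
    """Return the word vector moved by a gated nonverbal shift.

    All vectors share one dimension. The gates and the norm cap are
    hypothetical choices for this sketch.
    """
    # Gated combination of the visual and acoustic embeddings.
    shift = [w_visual * v + w_acoustic * a for v, a in zip(visual, acoustic)]
    shift_norm = norm(shift)
    if shift_norm == 0.0:
        return list(word)  # no nonverbal signal, keep the word as-is
    # Cap the shift so it cannot exceed beta times the word's norm.
    scale = min(beta * norm(word) / shift_norm, 1.0)
    return [w + scale * s for w, s in zip(word, shift)]


word = [1.0, 0.0]
shifted = shift_word(word, visual=[0.0, 1.0], acoustic=[0.0, 1.0],
                     w_visual=1.0, w_acoustic=1.0)
unchanged = shift_word(word, visual=[0.0, 0.0], acoustic=[0.0, 0.0],
                       w_visual=1.0, w_acoustic=1.0)
```

Capping the shift keeps the nonverbal signal from overwhelming the lexical meaning, so the same word moves in different directions under different cues while staying near its original position.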

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Smith

Patwary

Norick

et al. 2022

Preprint

125

133

Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot, and fine-tuning techniques. Because of their success, the size of these models has increased rapidly, requiring high-performance hardware, software, and algorithmic techniques to enable training such large models. As the result of a joint effort between Microsoft and NVIDIA, we present details on the training of the largest monolithic transformer-based language model, Megatron-Turing NLG 530B (MT-NLG), with 530 billion parameters. In this paper, we first focus on the infrastructure as well as the 3D parallelism methodology used to train this model using DeepSpeed and Megatron. Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe are key ingredients in the success of the model. Finally, we discuss various evaluation results, as well as other interesting observations and new properties exhibited by MT-NLG. We demonstrate that MT-NLG achieves superior zero-, one-, and few-shot learning accuracies on several NLP benchmarks and establishes new state-of-the-art results. We believe that our contributions will help further the development of large-scale training infrastructures, large-scale language models, and natural language generation.

Polyaniline‐Coated Fe₃O₄ Nanoparticle–Carbon‐Nanotube Composite and its Application in Electrochemical Biosensing

Liu

Wang

Xie

et al. 2008

Small

183

100

Efficient Low-rank Multimodal Fusion with Modality-Specific Factors

Liu

Shen

Lakshminarasimhan

et al. 2018

Preprint
