AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models

Wang, Yaqing; Mukherjee, Subhabrata; Liu, Xiaodong; Awadallah, Ahmed Hassan; Gao, Jianfeng

doi:10.48550/arxiv.2205.12410

Cited by 2 publications

(2 citation statements)

References 15 publications


“…The adapters are taught how to pick up knowledge appropriate to a given task. PEFT of pre-trained language models has recently demonstrated remarkable results, effectively matching the performance of full fine-tuning while utilizing significantly fewer trainable parameters (Fu et al, 2023; Liu et al, 2022; Wang et al, 2022), thereby addressing storage and communication constraints. Such approaches include prefix-tuning (Li and Liang, 2021), prompt-tuning (Hu et al, 2021b), soft-prompting (Lester et al, 2021) and LoRa (Hu et al, 2021a).…”

Section: Introduction (mentioning)

confidence: 99%

rematchka at ArAIEval Shared Task: Prefix-Tuning & Prompt-tuning for Improved Detection of Propaganda and Disinformation in Arabic Social Media Content

Abdel-Salam

2023

Proceedings of ArabicNLP 2023


The rise of propaganda and disinformation in the digital age has necessitated the development of effective detection methods to combat the spread of deceptive information. In this paper, we present our approach for the ArAIEval shared task: propaganda and disinformation detection in Arabic text. Our system used different pre-trained BERT-based models that make use of prompt-learning based on knowledgeable expansion and prefix-tuning. The proposed approach secured third place in subtask-1A with a 0.7555 F1-micro score, and second place in subtask-1B with a 0.5658 F1-micro score. For subtasks 2A and 2B, the proposed system achieved fourth place with F1-micro scores of 0.9040 and 0.8219, respectively. Our findings suggest that prompt-tuning-based and prefix-tuning-based models performed better than conventional fine-tuning. Furthermore, using loss-aware handling of class imbalance improved performance.


“…Based on this, other improvements to the method have been proposed. For instance, inspired by previous work on mixture-of-experts (MoE), AdaMix [160] uses multiple experts in each adapter layer. Additional techniques such as random expert selection and consistency regularization are employed in order to reduce computational cost and stabilize training.…”

Section: Additive Methods (mentioning)

confidence: 99%
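The citation statement above describes AdaMix as placing multiple experts in each adapter layer, with random expert selection and consistency regularization. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The class name `MixtureOfAdapters`, the bottleneck shape, the ReLU nonlinearity, the treatment of the layer output as logits, and the use of a symmetric KL divergence as the consistency term are all assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the initialization scheme is an assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MixtureOfAdapters:
    """Hypothetical adapter layer holding several down/up projection experts."""

    def __init__(self, d_model, d_bottleneck, n_experts, seed=0):
        self.rng = random.Random(seed)
        self.down = [rand_matrix(d_bottleneck, d_model, self.rng) for _ in range(n_experts)]
        self.up = [rand_matrix(d_model, d_bottleneck, self.rng) for _ in range(n_experts)]

    def forward(self, x):
        # Random expert selection: pick one down- and one up-projection per pass,
        # so only a single expert pair is evaluated regardless of n_experts.
        i = self.rng.randrange(len(self.down))
        j = self.rng.randrange(len(self.up))
        h = [max(0.0, v) for v in matvec(self.down[i], x)]  # ReLU bottleneck
        delta = matvec(self.up[j], h)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def symmetric_kl(p, q):
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def consistency_loss(layer, x):
    # Two passes with independent random routing; penalize their disagreement.
    # Treating the layer output as logits is a simplification for this sketch.
    p = softmax(layer.forward(x))
    q = softmax(layer.forward(x))
    return symmetric_kl(p, q)
```

In a training loop, a term like `consistency_loss` would be added to the task loss so that differently routed passes are pushed toward similar predictions. With a single expert the two passes coincide and the term is zero.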

Towards resource-aware dialogue systems and sentiment analysis

Pandelea


In the past few years, the use of transformer-based models has experienced increasing popularity as new state-of-the-art performance was achieved in several natural language processing tasks. As these models are often extremely large, however, their use for applications within embedded devices may not be feasible. This thesis looks at two specific applications, Dialogue Systems and Sentiment Analysis. These offer great potential to enhance user experience, but at the same time, when running on embedded devices, cannot make use of the same models and algorithms designed for server-based execution, due to factors such as reduced memory capacity and limited computational power. Novel solutions that are resource- and user-aware are therefore needed.

Dialogue Systems: Research on building dialogue systems able to engage in natural-sounding conversation with humans has attracted increasing attention in recent years. This has led to the rise of commercial conversational agents such as Google Home, Alexa and Siri situated on embedded devices, which enable users to interface with a wide range of underlying functionalities in a natural and seamless manner. However, in part due to memory and computational power constraints, these systems must either be placed on, or initiate frequent communication with, a server in order to process the users' queries. When placed on embedded systems, this communication may act as a bottleneck, resulting in delays as well as in the halt of the system should the network connection be lost or unavailable. Moreover, despite the rise of generative models such as ChatGPT, retrieval-based dialogue systems remain a promising approach due to their ability to deliver syntactically rich and informative responses while allowing for greater control over the responses that the model can provide, which may be critical in some applications. This thesis proposes a new framework for hardware-aware retrieval-based dialogue systems based on the Dual-Encoder architecture, coupled with a clustering method to group candidates pertaining to the same conversation, that reduces storage capacity and computational power requirements.

Sentiment Analysis: The availability of new datasets and deep learning techniques has led to a surge of effort directed towards sentiment analysis research. However, little attention has been given to the development of models that are not only accurate, but also suitable for user-specific use or geared towards resource-constrained devices. State-of-the-art models often have tens of millions of parameters, which makes it infeasible to deploy such solutions on devices characterized by limited memory and computational power. This work explores the concept of software-hardware co-design and proposes a methodical procedure to select the most desirable model, taking into consideration application constraints described in terms of memory and latency. In doing so, it shows how fully utilizing the feature extraction capabilities of large pre-trained language models can close the gap between the ...
