2024
DOI: 10.1111/exsy.13760
Efficient Biomedical Text Summarization With Quantized LLaMA2: Enhancing Memory Usage and Inference on Low Powered Devices

Sanjeev Kumar,
Vikas Ranjan,
Arjab Chakrabarti
et al.

Abstract: The deployment of large language models (LLMs) on edge devices and non‐server environments presents significant challenges, primarily due to constraints in memory usage, computational power, and inference time. This article investigates the feasibility of running LLMs across such devices by focusing on optimising memory usage, employing quantization techniques, and reducing inference time. Specifically, we utilise LLaMA 2 for biomedical text summarization and implement low‐rank adaptation (LoRA) quantization t…
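The abstract's two memory-saving ideas — weight quantization and low-rank adaptation — can be illustrated with a minimal sketch. This is an assumption-laden toy example (symmetric per-tensor int8 quantization and a rank-8 LoRA factorisation on a random matrix), not the paper's actual pipeline or hyperparameters:

```python
import numpy as np

# Sketch of symmetric per-tensor int8 quantization: the basic mechanism
# behind shrinking LLM memory footprints for edge deployment.

def quantize_int8(w: np.ndarray):
    """Map float32 weights to int8 plus a single scale factor."""
    scale = np.abs(w).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes // q.nbytes)  # 4x smaller: float32 -> int8
print(bool(np.abs(w - dequantize(q, scale)).max() < scale))  # error within one step

# LoRA side of the idea: instead of updating the full 256x256 matrix,
# train two thin factors B (256 x r) and A (r x 256) with small rank r.
r = 8
A = rng.normal(size=(r, 256)).astype(np.float32)
B = rng.normal(size=(256, r)).astype(np.float32)
print(A.size + B.size, "trainable params vs", w.size, "in the full matrix")
```

The two techniques compose: the frozen base weights are stored quantized, while only the small LoRA factors are trained and kept in higher precision.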

Cited by 0 publications
References 26 publications