Background/Objectives: Large language models (LLMs) show promise in healthcare but are prone to hallucinations, particularly in rapidly evolving fields like diabetes management. Traditional LLM updating methods are resource-intensive, necessitating new approaches for delivering reliable, current medical information. This study aimed to develop and evaluate a novel retrieval system to enhance LLM reliability in diabetes management across different languages and guidelines.

Methods: We developed a dual retrieval-augmented generation (RAG) system integrating both the Korean Diabetes Association and American Diabetes Association 2023 guidelines. The system employed dense retrieval with 11 embedding models (including OpenAI, Upstage, and multilingual models) and sparse retrieval using the BM25 algorithm with language-specific tokenizers. Performance was evaluated across different top-k values, leading to optimized ensemble retrievers for each guideline.

Results: For dense retrievers, Upstage’s Solar Embedding-1-large and OpenAI’s text-embedding-3-large showed superior performance for Korean and English guidelines, respectively. Multilingual models outperformed language-specific models in both cases. For sparse retrievers, the ko_kiwi tokenizer demonstrated superior performance for Korean text, while both ko_kiwi and porter_stemmer showed comparable effectiveness for English text. The ensemble retrievers, combining optimal dense and sparse configurations, demonstrated enhanced coverage while maintaining precision.

Conclusions: This study presents an effective dual RAG system that enhances LLM reliability in diabetes management across different languages. The successful implementation with both Korean and American guidelines demonstrates the system’s cross-regional capability, laying a foundation for more trustworthy AI-assisted healthcare applications.
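To make the dense-plus-sparse ensemble idea concrete, the following is a minimal sketch in plain Python, not the study's implementation. It assumes documents are already tokenized (the study used ko_kiwi and porter_stemmer; tokenization is left out here), that embedding vectors are supplied by some external model, and that the two rankings are merged with reciprocal rank fusion. The abstract does not state the fusion method, so that choice, the function names, and the parameter defaults (`k1`, `b`, `k`) are all illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common defaults, assumed here rather than taken from the study.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between a query embedding and each document embedding."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    q_norm = norm(query_vec)
    scores = []
    for vec in doc_vecs:
        denom = q_norm * norm(vec)
        dot = sum(q * d for q, d in zip(query_vec, vec))
        scores.append(dot / denom if denom else 0.0)
    return scores


def top_k(scores, k):
    """Return the indices of the k highest-scoring documents, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document indices into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: -fused[doc_id])


def ensemble_retrieve(query_tokens, query_vec, docs_tokens, doc_vecs, k=3):
    """Take the top-k from the sparse and dense retrievers, then fuse them."""
    sparse = top_k(bm25_scores(query_tokens, docs_tokens), k)
    dense = top_k(cosine_scores(query_vec, doc_vecs), k)
    return reciprocal_rank_fusion([sparse, dense])
```

In this sketch the fused list can contain up to 2k distinct passages, which is one way an ensemble can widen coverage relative to either retriever alone; a weighted score combination would be an equally plausible design.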