Domain-specific chatbots for science using embeddings

Yager, Kevin G.

doi:10.1039/d3dd00112a

Cited by 13 publications

(2 citation statements)

References 76 publications


“…Search functionalities also leverage vector embeddings extensively. An example is Google's reverse image search, where images are transformed into vector representations that allow for efficient and accurate retrieval based on visual similarities [22]. This method applies not only to images but also to textual content, where search engines employ vector embeddings to improve the relevance and precision of search results.…”

Section: Source: Compiled By the Authors (mentioning)

confidence: 99%
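The retrieval idea in the statement above (items become vectors, and search returns the nearest ones) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the three-dimensional vectors are made up by hand rather than produced by a real embedding model, and `TOY_INDEX`, `cosine_similarity`, and `top_k` are hypothetical names, not part of any search engine's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the k (name, score) pairs whose vectors are most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hand-made toy vectors; a real system would obtain these from an embedding model.
TOY_INDEX = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "car_photo": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for name, score in top_k([1.0, 0.0, 0.0], TOY_INDEX):
        print(f"{name}: {score:.3f}")
```

A production system would typically replace the linear scan in `top_k` with an approximate nearest-neighbor index, but the ranking principle is the same.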

Effective documentation practices for enhancing user interaction through GPT-powered conversational interfaces

Sheremet, Sadovoi, Sheremet et al. 2024

AAIT


The article presents a detailed overview of the integration of ChatGPT with PDF documents using the LangChain infrastructure, highlighting significant advances in natural language processing and information retrieval. This approach offers the advantage of not being limited to working exclusively with PDF documents. By leveraging the special capabilities of the LangChain infrastructure, it is possible to interact with any data files containing text information. The literature review highlights the transformative impact of OpenAI's GPT series of models on natural language processing, with advancements in GPT-4 significantly enhancing the generation of human-like text and setting new standards for interactive artificial intelligence applications. The analysis of OpenAI's application programming interface demonstrates its significant role in advancing the integration of artificial intelligence into various applications by providing accessible and robust tools that enable developers and enterprises to seamlessly incorporate sophisticated artificial intelligence functionalities. Despite their advantages, these interfaces face challenges such as latency, processing capacity limitations, and ethical considerations, which necessitate strategic implementation and continuous evaluation to fully harness their potential. The article examines the role of vector data representations, particularly vector embeddings, in enhancing the functionality of artificial intelligence and machine learning systems. These embeddings transform complex textual data into high-dimensional numerical formats, enabling artificial intelligence models to perform tasks such as language understanding, text generation, and data analysis with increased precision and depth. Vector databases play a critical role in managing and leveraging high-dimensional data, specifically vector embeddings, to enhance the operational efficiency of large language models.
These specialized storage systems are optimized for handling complex data representations, enabling advanced applications such as text summarization, translation, and question-answering with high accuracy and contextual understanding. LangChain provides a versatile framework that bridges large language models and diverse data sources by utilizing vector databases. This integration enhances the AI's capabilities in data analysis and natural language processing, enabling sophisticated applications that can efficiently interpret and respond to user queries across various datasets. Developing a comprehensive application using LangChain and ChatGPT for PDF document interaction requires meticulous technical considerations. Key elements include efficient data management through LangChain's data loaders and text splitters, which transform PDFs into manageable formats and ensure coherent segmentation for accurate AI interaction. Additionally, implementing vector embeddings enhances the AI's ability to comprehend and analyze textual data, while a user-friendly interface and robust security measures ensure optimal user engagement and data protection. The practical implications of this technology are significant, with potential improvements in customer support by reducing resolution times by up to 40%, streamlining academic literature reviews by approximately 60%, and boosting productivity in data analysis by saving an estimated 50% of the time spent on manual data extraction.
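The text-splitting step mentioned in the abstract above can be sketched in plain Python. This is a simplified, character-based illustration and not LangChain's implementation or API: `split_text`, `chunk_size`, and `overlap` are hypothetical names chosen here, and the overlap between neighboring chunks is an assumed design choice meant to keep a sentence that straddles a boundary visible in both chunks.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap their neighbors.

    Each chunk starts (chunk_size - overlap) characters after the previous one,
    so the last `overlap` characters of one chunk reappear at the start of the next.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text,
        # so no redundant tail-only chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks

if __name__ == "__main__":
    sample = "Vector embeddings map text to numbers. " * 10
    for i, chunk in enumerate(split_text(sample, chunk_size=80, overlap=20)):
        print(i, repr(chunk))
```

Each resulting chunk would then be embedded and stored so that a later query can retrieve only the relevant pieces rather than the whole document.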


“…Purely generative methods in chatbot development can have drawbacks, including hallucinations, a lack of explainability, biases, and difficulties in verifying model-generated information [13]. In academic settings, these shortcomings are considered unacceptable, and approaches such as specific prompt engineering, fine-tuning, and document embedding have been proposed to mitigate hallucinations and ensure that the model adheres to the given context [14,15]. Recent advancements, including LoRA [16] and prompt-tuning [17], as well as user-friendly frameworks such as the LLM-Adapters developed by [12] or HuggingFace's PEFT library [18], have made fine-tuning more efficient and accessible to researchers with limited computational resources.…”

Section: Review Of Chatbots (mentioning)

confidence: 99%

Knowledge-Based and Generative-AI-Driven Pedagogical Conversational Agents: A Comparative Study of Grice’s Cooperative Principles and Trust

Wölfel, Shirzad, Reich et al. 2023

BDCC


The emergence of generative language models (GLMs), such as OpenAI’s ChatGPT, is changing the way we communicate with computers and has a major impact on the educational landscape. While GLMs have great potential to support education, their use is not unproblematic, as they suffer from hallucinations and misinformation. In this paper, we investigate how a very limited amount of domain-specific data, from lecture slides and transcripts, can be used to build knowledge-based and generative educational chatbots. We found that knowledge-based chatbots allow full control over the system’s response but lack the verbosity and flexibility of GLMs. The answers provided by GLMs are more trustworthy and offer greater flexibility, but their correctness cannot be guaranteed. Adapting GLMs to domain-specific data trades flexibility for correctness.


The Heuristic Design Innovation Approach for Data-Integrated Large Language Model

Zhou, Zhang, Chen et al. 2024

Lecture Notes in Computer Science


Domain-specific chatbots for science using embeddings

Abstract: We demonstrate how large language models (LLMs) can be adapted to domain-specific science topics by connecting them to a corpus of trusted documents.
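One common way to connect an LLM to a corpus of trusted documents is to paste retrieved excerpts into the prompt and instruct the model to answer from them. The abstract above does not spell out the mechanism, so the sketch below is an assumption rather than the paper's method: `build_prompt`, `max_chars`, and the instruction wording are all hypothetical, and the retrieval step that would supply `passages` is omitted.

```python
def build_prompt(question, passages, max_chars=2000):
    """Assemble a prompt that restricts the model to the supplied excerpts.

    Passages are added in the given order (assumed most relevant first)
    until the next one would exceed the max_chars budget.
    """
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)

    context = "\n".join(context_parts)
    return (
        "Answer using only the numbered excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    print(build_prompt(
        "What do embeddings represent?",
        ["Embeddings transform text into high-dimensional numerical vectors."],
    ))
```

Numbering the excerpts lets the model cite which passage supports an answer, which helps with the explainability concerns raised in the citation statements.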
