Retrieval-Augmented Natural Language Generation (RAG) System
A Retrieval-Augmented Natural Language Generation (RAG) System is an LLM-based system that implements a RAG technique to solve a RAG task.
- Context:
- It can (typically) implement Retrieval-based Approaches.
- It can (typically) make use of Generative Approaches to produce more accurate and information-rich responses.
- It can be referenced by a RAG System Architecture, which is composed of RAG system components (LLM-based system components), such as an LLM-based System Knowledge Component.
- It can be based on a RAG Platform.
- ...
- Example(s):
- a RAG-based Chatbot.
- a RAG-based AI Agent.
- Grouped by application:
- In Customer Support:
- A chatbot that retrieves product information from a database to generate responses to customer queries.
- An online shopping assistant that pulls user reviews and product details to inform potential buyers.
- In Question Answering Systems:
- Facebook's DPR (Dense Passage Retrieval) used in open-domain question answering.
- A medical information system that retrieves clinical data to answer patient inquiries.
- In Educational Tools:
- A tutoring system that retrieves educational content to explain complex subjects dynamically.
- An interactive learning platform that uses historical data to generate contextual quizzes.
- ...
- Counter-Example(s):
- ...
- See: LLM-based QA System.
References
2024b
- (Barnett et al., 2024) ⇒ Scott Barnett, Stefanus Kurniawan, Srikanth Thudumu, Zach Brannelly, and Mohamed Abdelrazek. (2024). “Seven Failure Points When Engineering a Retrieval Augmented Generation System.” doi:10.48550/arXiv.2401.05856
- NOTES:
- RAG-based Systems involve two key processes: document indexing and query processing. Document indexing includes document chunking, embedding generation, and vector index creation. Query processing involves query embedding, relevant document retrieval, and answer generation using the retrieved documents and an LLM.
- The effectiveness of a RAG-based System depends on several factors, including the quality and relevance of the indexed documents, the performance of the retrieval system, and the ability of the LLM to generate accurate and coherent answers using the retrieved content.
- Testing and monitoring RAG-based Systems can be challenging due to their reliance on LLMs and the difficulty in predicting all possible user queries. Continuous system calibration and monitoring during operation are crucial for ensuring the system's robustness and reliability.
- Potential areas for further research on RAG-based Systems include exploring different document chunking and embedding strategies, comparing the performance of RAG approaches with finetuning methods, and developing best practices for software engineering and testing of these systems.
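The two-process split described in these notes (document indexing, then query processing) can be sketched with a toy in-memory vector store. This is a minimal illustration, not a production design: the hashed bag-of-words `embed` function and the `llm` callable are stand-ins for a real embedding model and LLM.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # toy embedding dimensionality


def embed(text):
    """Hashed bag-of-words embedding (stand-in for a real embedding model)."""
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit-normalized, so dot product = cosine


def chunk(document, size=8):
    """Document chunking: split a document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def index(documents):
    """Document indexing: chunk each document, embed chunks, build the store."""
    store = []
    for doc in documents:
        for c in chunk(doc):
            store.append((embed(c), c))
    return store


def retrieve(store, query, k=2):
    """Query processing: embed the query, rank chunks by cosine similarity."""
    q = embed(query)
    scored = sorted(store, key=lambda e: -sum(a * b for a, b in zip(e[0], q)))
    return [text for _, text in scored[:k]]


def answer(store, query, llm):
    """Answer generation: prompt the LLM with the retrieved context."""
    context = "\n".join(retrieve(store, query))
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```

A real system would replace the hashed embedding with a learned model and the list scan with an approximate nearest-neighbor index, but the indexing/retrieval/generation pipeline keeps the same shape.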
2024a
- (Gao et al., 2024) ⇒ Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. (2024). “Retrieval-Augmented Generation for Large Language Models: A Survey.” doi:10.48550/arXiv.2312.10997
2023
- https://medium.com/@abdullahw72/langchain-chatbot-for-multiple-pdfs-harnessing-gpt-and-free-huggingface-llm-alternatives-9a106c239975
- SUMMARY:
- Architecture
- User Interface: Allows users to input questions and view responses.
- Natural Language Understanding (NLU): Understands and interprets user queries. Uses NLP techniques like tokenization and named entity recognition.
- Vector Store: Stores vector representations of text chunks extracted from PDFs. Generated using embeddings.
- Embeddings: Encode semantic information from text chunks into vectors. Can use OpenAI or Hugging Face models.
- Large Language Models (LLMs): Provide advanced language capabilities. Fine-tuned models like those from OpenAI or Hugging Face's Instruct series.
- Conversational Retrieval: Matches user query vectors to document vector representations to find relevant information.
- Chat History: Stores conversation context to enable coherent, relevant responses.
- Implementation
- Uses Python with Streamlit, PyPDF2, LangChain, and other libraries.
- Key steps: Extract PDF text, split into chunks, generate embeddings, create vector store, set up conversation chain and chat model, process user input.
- Supports OpenAI and Hugging Face models for text embeddings and LLMs.
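The key steps above can be sketched end-to-end in plain Python. The `ConversationChain` class, the word-overlap relevance score, and the `llm` callable are illustrative stand-ins rather than the LangChain API; a real implementation would use PyPDF2 for text extraction and embedding-based similarity for retrieval.

```python
def split_into_chunks(text, size=50, overlap=10):
    """Split extracted text into overlapping word chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


class ConversationChain:
    """Minimal conversational retrieval chain with chat history."""

    def __init__(self, chunks, llm):
        self.chunks = chunks   # stands in for the vector store
        self.llm = llm         # callable: prompt -> answer
        self.history = []      # chat history for coherent follow-ups

    def _retrieve(self, query, k=2):
        # Toy relevance score: word overlap (a real system uses embeddings).
        q = set(query.lower().split())
        ranked = sorted(self.chunks,
                        key=lambda c: -len(q & set(c.lower().split())))
        return ranked[:k]

    def ask(self, question):
        """Process user input: retrieve context, prompt the LLM, update history."""
        context = "\n".join(self._retrieve(question))
        past = "\n".join(f"{role}: {msg}" for role, msg in self.history)
        prompt = f"History:\n{past}\n\nContext:\n{context}\n\nQuestion: {question}"
        answer = self.llm(prompt)
        self.history.append(("user", question))
        self.history.append(("assistant", answer))
        return answer
```

Because the chat history is folded into each prompt, follow-up questions can refer back to earlier turns, which is the role the Chat History component plays in the architecture above.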