Balancing Act: LLM Priors and Retrieved Information in RAG Systems

In the evolving landscape of AI, Large Language Models (LLMs) have emerged as powerful tools for generating human-like text. However, their reliance on internal knowledge, or “priors,” can lead to limitations in applications requiring up-to-date, accurate information. Retrieval Augmented Generation (RAG) systems aim to address this by augmenting LLMs with external knowledge retrieved from various sources. While RAG has shown promise in reducing hallucinations and providing current information, it also introduces new challenges, particularly when the retrieved content conflicts with the LLM’s internal knowledge. Two recent papers, “How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs’ internal prior” and “Corrective Retrieval Augmented Generation,” explore these challenges and propose promising solutions to enhance the reliability and robustness of RAG systems.

The Tug-of-War: LLM Priors vs. Retrieved Information

The first paper, by researchers from Stanford University, investigates the tension between an LLM’s internal knowledge and the information retrieved by RAG systems. When these two sources of information conflict, how do LLMs handle the discrepancy? The authors conducted a systematic analysis of this interaction, testing GPT-4 and other LLMs on question-answering tasks across various datasets. By introducing controlled perturbations to the reference documents, they quantified how LLMs prioritize retrieved information over their internal knowledge.

The results were revealing. The probability of an LLM adhering to retrieved information was found to be inversely correlated with its confidence in its own prior knowledge. In other words, the more certain an LLM is about its internal knowledge, the less likely it is to rely on the retrieved information. Furthermore, LLMs tend to revert to their prior beliefs when the retrieved information significantly deviates from their internal knowledge. This finding highlights the importance of carefully evaluating the quality and accuracy of the retrieved content in RAG systems.
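The paper's core measurement can be sketched in a few lines. This is not the authors' code; `adherence_rate` is a hypothetical helper that computes the fraction of model answers matching the (possibly perturbed) value in the retrieved document, which is the quantity the study tracks against the model's prior confidence.

```python
def adherence_rate(model_answers, retrieved_values):
    """Fraction of answers that agree with the retrieved (perturbed) value.

    A low rate on heavily perturbed documents indicates the model is
    reverting to its internal prior rather than following the context.
    """
    if not model_answers:
        raise ValueError("need at least one answer")
    matches = sum(a == v for a, v in zip(model_answers, retrieved_values))
    return matches / len(model_answers)


# Example: the model follows the document twice, reverts to its prior once.
rate = adherence_rate(["1969", "42 km", "Paris"], ["1969", "42 km", "Lyon"])
```

In the actual study, this rate is computed per perturbation level, so a downward slope over increasingly implausible values directly exhibits the "tug-of-war" the authors describe.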

The way we prompt LLMs can significantly influence how closely they stick to retrieved information. When prompted to strictly adhere to the retrieved content, LLMs show a much greater preference for using it. In contrast, a more relaxed prompting style leads to less reliance on retrieved information, especially when the LLM is already confident in its own knowledge.
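The contrast between the two prompting styles is easy to make concrete. The templates below are illustrative wordings, not the paper's exact prompts: one instructs the model to defer entirely to the retrieved context, the other merely offers the context as optional background.

```python
# Illustrative strict vs. loose RAG prompt templates (wording is ours,
# not taken from the paper).
STRICT_TEMPLATE = (
    "Answer using ONLY the context below. If the context contradicts "
    "what you believe, follow the context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

LOOSE_TEMPLATE = (
    "Here is some context that may be helpful:\n{context}\n\n"
    "Question: {question}"
)


def build_prompt(template, context, question):
    """Fill a RAG prompt template with retrieved context and a question."""
    return template.format(context=context, question=question)
```

Swapping `STRICT_TEMPLATE` for `LOOSE_TEMPLATE` while holding the context fixed is exactly the kind of controlled comparison that reveals how much adherence the instruction itself buys.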

Corrective Retrieval Augmented Generation

The second paper, by researchers from Google Research, UCLA, and the University of Science and Technology of China, introduces Corrective Retrieval Augmented Generation (CRAG), an approach designed to improve the robustness and reliability of RAG systems. CRAG addresses the heavy reliance of RAG systems on the accuracy and relevance of retrieved documents, which can lead to errors, misinformation, and hallucinations in the generated outputs of LLMs.

CRAG employs several key strategies to enhance the robustness of RAG systems. First, it uses a lightweight evaluator to assess the relevance of retrieved documents and assign a confidence score, triggering one of three actions: Correct, Incorrect, or Ambiguous. For documents deemed “Correct,” a decompose-then-recompose algorithm extracts and filters key information, removing irrelevant segments to optimize knowledge utilization. When documents are deemed “Incorrect,” CRAG resorts to web searches as a corrective measure, expanding the knowledge base and offering more diverse information sources. Finally, the “Ambiguous” action combines both refined internal knowledge and external knowledge from web searches, providing a balanced approach when confidence in the initial retrieval is low.

Figure: CRAG at inference. A retrieval evaluator assesses each document's relevance to the input and estimates a confidence score that triggers one of three knowledge retrieval actions: Correct, Incorrect, or Ambiguous.

Practical Implications and Applications

The findings of these two papers have significant implications for the development and deployment of RAG-powered applications. Developers should be aware of the limitations of LLMs and recognize that RAG systems do not guarantee perfect adherence to provided information, especially when it conflicts with the LLM’s prior knowledge. The choice of prompting technique can also significantly impact how LLMs balance their internal knowledge with retrieved information, necessitating careful prompt design.

CRAG offers a plug-and-play solution to enhance the accuracy and reliability of generated texts by addressing inaccuracies in retrieved knowledge and mitigating the impact of incorrect retrievals on generative models. Teams building applications and solutions backed by RAG and LLMs can leverage CRAG to improve the overall performance and reliability of their systems.

In addition, teams have explored the use of knowledge graphs to further enhance RAG systems. Knowledge graph-guided retrieval leverages the structured information in knowledge graphs (KGs) to guide the retrieval of relevant passages for RAG models. This approach has shown promising results, particularly in improving the performance of question answering and information summarization tasks. By incorporating domain-specific knowledge through KGs, RAG systems can generate more accurate and contextually relevant outputs.
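A toy version of knowledge-graph-guided retrieval makes the idea tangible. The graph, entities, and `expand_query` helper below are entirely hypothetical: the sketch expands a query with neighboring entities from a KG before it reaches the retriever, so that passages mentioning related concepts can be surfaced.

```python
# Minimal, hypothetical sketch: a toy KG maps an entity to related terms,
# and the query is expanded with its graph neighborhood before retrieval.
TOY_KG = {
    "RAG": ["retrieval", "LLM", "hallucination"],
    "LLM": ["transformer", "prior-knowledge"],
}


def expand_query(query, kg, hops=1):
    """Expand a whitespace-tokenized query with KG neighbors up to `hops` away."""
    terms = set(query.split())
    frontier = set(terms)
    for _ in range(hops):
        neighbors = set()
        for term in frontier:
            neighbors.update(kg.get(term, []))
        terms |= neighbors
        frontier = neighbors
    return " ".join(sorted(terms))
```

Real systems would score graph paths and use typed relations rather than flat adjacency lists, but the structure-guided expansion step is the same in spirit.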

The Future of RAG: Next Steps and Challenges

Both papers identify key areas for future RAG research. These include expanding investigations into diverse domains and applications, simulating real-world errors, and enhancing model trustworthiness and reliability. Additionally, a deeper understanding of LLM behavior, handling complex data types and sensitive domains, and aligning user expectations with RAG capabilities are crucial for responsible development and deployment.

Knowledge graph integration could prove to be a significant aspect of this progression. Teams are exploring automated knowledge graph construction from unstructured data to support RAG systems, particularly where pre-existing knowledge graphs are lacking. This facilitates access to structured information for more effective and precise retrieval and processing.

Collaboration and knowledge sharing remain essential as the RAG field evolves. By building upon existing research and addressing current challenges, we can unlock the full potential of RAG systems and create more reliable, accurate, and trustworthy AI applications.

Conclusion

This article explores two papers that advance our understanding of RAG systems and their interaction with LLMs. The research paves the way for more robust and reliable RAG applications by quantifying the interplay between LLM priors and retrieved information and introducing novel corrective strategies. As we explore the potential of these technologies, it’s crucial to address limitations and enhance accuracy, reliability, and ethical use. The insights provided offer a valuable foundation for developers, guiding better integration strategies and user interface design to indicate answer confidence and reliability. I highly recommend reading both papers in their entirety to gain a deeper understanding of the challenges and opportunities in the use of RAG.
