2024 GraphRAGUnlockingLLMDiscoveryon

(Larson & Truitt, 2024) ⇒ Jonathan Larson, and Steven Truitt. (2024). “GraphRAG: Unlocking LLM Discovery on Narrative Private Data.”

Subject Headings: GraphRAG Technique.

Notes

Cited By

http://scholar.google.com/scholar?q=%222024%22+GraphRAG%3A+Unlocking+LLM+Discovery+on+Narrative+Private+Data

Quotes

Abstract

Perhaps the greatest challenge – and opportunity – of LLMs is extending their powerful capabilities to solve problems beyond the data on which they have been trained, and to achieve comparable results with data the LLM has never seen. This opens new possibilities in data investigation, such as identifying themes and semantic concepts with context and grounding on datasets. In this post, we introduce GraphRAG, created by Microsoft Research, as a significant advance in enhancing the capability of LLMs.

Retrieval-Augmented Generation (RAG) is a technique to search for information based on a user query and provide the results as reference for an AI answer to be generated. This technique is an important part of most LLM-based tools and the majority of RAG approaches use vector similarity as the search technique. GraphRAG uses LLM-generated knowledge graphs to provide substantial improvements in question-and-answer performance when conducting document analysis of complex information. This builds upon our recent research, which points to the power of prompt augmentation when performing discovery on private datasets. Here, we define private dataset as data that the LLM is not trained on and has never seen before, such as an enterprise’s proprietary research, business documents, or communications. Baseline RAG[1] was created to help solve this problem, but we observe situations where baseline RAG performs very poorly. For example:

Baseline RAG struggles to connect the dots. This happens when answering a question requires traversing disparate pieces of information through their shared attributes in order to provide new synthesized insights.
Baseline RAG performs poorly when being asked to holistically understand summarized semantic concepts over large data collections or even singular large documents.

To address this, the tech community is working to develop methods that extend and enhance RAG (e.g., LlamaIndex (opens in new tab)). Microsoft Research’s new approach, GraphRAG, uses the LLM to create a knowledge graph based on the private dataset. This graph is then used alongside graph machine learning to perform prompt augmentation at query time. GraphRAG shows substantial improvement in answering the two classes of questions described above, demonstrating intelligence or mastery that outperforms other approaches previously applied to private datasets.

References

;

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2024 GraphRAGUnlockingLLMDiscoveryon	Jonathan Larson Steven Truitt			GraphRAG: Unlocking LLM Discovery on Narrative Private Data						2024