2023 EmergingArchitecturesforLLMApplications
- (Bornstein & Radovanovic, 2023) ⇒ Matt Bornstein, and Rajko Radovanovic. (2023). “Emerging Architectures for LLM Applications.”
Subject Headings: AI Agent Architecture.
Notes
Cited By
Quotes
Abstract
Large language models are a powerful new primitive for building software. But since they are so new - and behave so differently from normal computing resources - it’s not always obvious how to use them.
In this post, we’re sharing a reference architecture for the emerging LLM app stack. It shows the most common systems, tools, and design patterns we’ve seen used by AI startups and sophisticated tech companies. This stack is still very early and may change substantially as the underlying technology advances, but we hope it will be a useful reference for developers working with LLMs now.
The stack
Here’s our current view of the LLM app stack:
[Diagram: the emerging LLM app stack]
There are many different ways to build with LLMs, including training models from scratch, fine-tuning open-source models, or using hosted APIs. The stack we’re showing here is based on in-context learning, which is the design pattern we’ve seen the majority of developers start with (and is only possible now with foundation models).
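To make the in-context learning pattern concrete, here is a minimal sketch of a few-shot prompt. The template, example Q&A pair, and question are hypothetical illustrations, not taken from the post:

```python
# Minimal sketch of in-context learning: behavior is steered entirely
# through the prompt (instructions + few-shot examples), with no weight
# updates. Template and examples are hypothetical.
FEW_SHOT_TEMPLATE = """You are a legal assistant. Answer concisely.

Q: Can a verbal agreement be a binding contract?
A: Often yes, although certain contract types must be in writing.

Q: {question}
A:"""

prompt = FEW_SHOT_TEMPLATE.format(question="What is a force majeure clause?")
# `prompt` would then be sent to any hosted or self-hosted LLM for completion.
```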
...
At a very high level, the workflow can be divided into three stages (a toy end-to-end sketch follows the list):
- Data preprocessing / embedding: This stage involves storing private data (legal documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, then stored in a specialized database called a vector database.
- Prompt construction / retrieval: When a user submits a query (a legal question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
- Prompt execution / inference: Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference—including both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.
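To ground the three stages, here is a toy, self-contained sketch of the full loop. The hashed bag-of-words embedding and in-memory index are stand-ins for a real embedding model and vector database, and every name in it is an illustrative assumption, not a detail from the post:

```python
import math
from collections import Counter

DIM = 64  # toy embedding dimensionality

def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized. A real stack
    would call an embedding model API here."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % DIM] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Stage 1: data preprocessing / embedding -- chunk documents and index them
# (a real stack would persist these vectors in a vector database).
documents = [
    "A force majeure clause excuses performance after unforeseeable events.",
    "Verbal agreements can be binding, but some contracts must be written.",
]
index = [(chunk, embed(chunk)) for chunk in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Nearest-neighbor lookup over the toy index (vectors are unit-norm,
    so the dot product equals cosine similarity)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [chunk for chunk, _ in ranked[:k]]

# Stage 2: prompt construction / retrieval -- combine a template with
# retrieved context (few-shot examples and API data would also go here).
question = "What does a force majeure clause do?"
prompt = (
    "Use the context to answer the question.\n"
    f"Context: {retrieve(question)[0]}\n"
    f"Q: {question}\nA:"
)

# Stage 3: prompt execution / inference -- `prompt` would be sent to a
# proprietary model API or a self-hosted model; logging, caching, and
# validation would wrap this call in production.
print(prompt)
```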
...
References
| Author | Volume | Date | Title | Type | Journal | Title URL | DOI | Note | Year |
|---|---|---|---|---|---|---|---|---|---|
| Matt Bornstein, Rajko Radovanovic | | 2023 | Emerging Architectures for LLM Applications | | | | | | 2023 |