Introduction
Welcome to the Rig book! Inside you will find some useful tips and tricks to help you get started using the Rig library ecosystem, as well as some additional advice on building systems that leverage Large Language Models (“LLMs”, or “AI”) and getting started in the Rust for AI domain.
What are AI Agents, and why build them in Rust?
Functionally, AI Agents are loops that call a model provider (whether local or third-party) with a prompt and act on the response. Models can return a request to call a tool (usually represented as a function), which is a crucial part of how this all works.
In most conventional AI agent frameworks, this means the following:
- Agents that receive tool call requests must be able to automatically execute the tool, then return the result back to the LLM
- If there are no tool calls and there’s only a text response, the agent returns the text response and optionally awaits further user input depending on the context the agent is used in.
- Agents may also be able to carry on until a given condition has been reached, rather than simply terminating at the text response. However, using this effectively may require a multi-agent architecture, which this book will also cover.
Although the primary language to build these in has historically been Python via Langchain (with many, many web developers now using the likes of AI SDK and Mastra to do the work instead), the primary reason for building them in Rust is the same as building any other project in Rust: speed, reliability and developer experience. With LLMs starting to reach a plateau, the work has increasingly shifted to building systems for AI and engineering the context around agents, rather than just trying to outright improve the models.
Despite the primary bulk of work for an AI agent being API call-based (hence the term “GPT wrapper”), there is still much work to be done around optimising more computationally heavy aspects of agents. A non-exhaustive list of these use cases can be found below:
- LLM-assisted data pipelines (ie using an LLM as part of a data pipeline)
- Speech training for voice agents
- Preparing data for training a model
Who is this book for?
Rust for Applied AI (which we will refer to as “Rust for AI”) is an emerging domain that has been receiving increasing interest as the wider developer community has started to adopt Rust for AI.
Eventually, we want this book to be useful for all kinds of developers. For now, to keep the scope manageable, the people likely to benefit the most are the following:
Rust engineers
If you are using Rust in any capacity (whether as a hobbyist, or as part of a Rust company), this book is for you specifically. Using Rust, you can write AI systems without needing to context switch into Python while still getting the speed and reliability benefits that a traditional system in Rust depends on. This book will lay the foundations for you to be able to make a non-deterministic system that generates its outputs based on ground truths rather than prompting and hoping for the best.
AI Engineers who are trying Rust
If you are currently working with AI as part of your day job, there has never been a better time to invest in Rust for AI. Recently there has been a lot of grassroots movement on moving the frontier forward, with many passionate developers working on new projects. With regards to Rig, we’ve achieved production usage by external companies - not just startups on the bleeding edge, but well-known AI startups as well as larger, more traditional companies.
AI companies
If you are part of a company that specialises in AI, you are likely using Python or TypeScript. Your team likely already knows how brittle and resource-intensive Python (and to a lesser extent, TypeScript) can be in production. With so much momentum starting to build up in the Rust for AI space, being an early adopter means you can take full advantage of all the upcoming libraries and projects being written in Rust. Whether you want to leverage an API like Meilisearch, import a Rust module into your Python code as a PyO3 module, or simply write a humble internal CLI tool: Rust has it.
How to use this book
There’s no single “correct” path through this mdbook, and you shouldn’t feel pressured to read it cover-to-cover (so to speak!) in a linear order. Instead, treat it as a toolkit you can dip into depending on what you’re trying to build, debug, or understand.
The primary goal is to help you grow as a Rust developer working in applied AI - someone who not only knows the language, but can wield it effectively when dealing with models, pipelines, observability, concurrency, and real-world deployment concerns. Rig gives you the building blocks; this book shows you how to assemble and reason about them.
Some chapters explain concepts from first principles. Others walk through practical examples, patterns, and trade-offs drawn from real systems. Throughout, the focus is on clarity and applicability: why the tools exist, when to use them, and how to avoid the common pitfalls that crop up in AI-heavy Rust projects.
If you’re new to Rig, you may want to skim the introductory material before jumping into the deeper sections. If you’re already familiar with Rust and just want to see how Rig structures agents, pipelines, or telemetry, feel free to jump directly to the relevant chapters. Everything is written to stand on its own.
Above all, use this book in whatever way helps you build better, more reliable AI systems in Rust - whether that means reading it linearly, cross-referencing specific patterns, or treating it as a reference you return to as your projects evolve.
Calling Model Providers with Rig
The full example for this section can be found on the GitHub repo.
Let’s get started by writing your first API call to a model provider. While Rig does have support for local models (through Ollama, LM Studio via its OpenAI-compatible Chat Completions API, as well as other local providers), the majority of practical applications use third-party providers.
To get started, create a new project:
cargo init my-first-project
cd my-first-project
Next, we’ll add some required dependencies:
cargo add rig-core@0.25.0 tokio -F tokio/macros,tokio/rt-multi-thread
Now we’re ready to write some code!
Provider clients
You can create a provider client in one of two ways:
- Client::new() - takes the API key directly
- Client::from_env() - attempts to read the API key from environment variables and will panic if none is provided
An example can be found below:
use rig::providers::openai::Client;
#[tokio::main]
async fn main() {
// uses the `OPENAI_API_KEY` environment variable
let openai_client = Client::from_env();
}
Agents
To create an agent, use the client we just made. For now we’ll call our agent Bob.
use rig::client::{CompletionClient, ProviderClient};
use rig::completion::Prompt;
use rig::providers::openai::Client;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let openai_client = Client::from_env(); // method provided by the ProviderClient trait
let agent = openai_client
.agent("gpt-5") // method provided by CompletionClient trait
.preamble("You are a helpful assistant.")
.name("Bob") // used in logging
.build();
let prompt = "What is the Rust programming language?";
println!("{prompt}");
let response_text = agent.prompt(prompt).await?; // prompt method provided by Prompt trait
println!("Response: {response_text}");
Ok(())
}
Running this snippet should return logs that look something like this:
What is the Rust programming language?
Response: Rust is a modern, statically typed systems programming language focused on safety, speed, and concurrency without a garbage collector. It was started at Mozilla and is now stewarded by the Rust Foundation; Rust 1.0 shipped in 2015.
Key ideas and features:
- Memory safety by design: ownership, borrowing, and lifetimes checked at compile time prevent use-after-free, null/dangling pointers, and data races in “safe Rust.”
- Zero-cost abstractions: high-level features (traits, enums/ADTs, pattern matching, iterators) compile down to code as efficient as hand-written C/C++ in most cases.
- No GC, RAII-based resource management; “unsafe” escape hatch for low-level control when needed.
- Strong tooling: cargo (build/test/deps), crates.io (packages), rustup (toolchains), rustfmt, clippy, built-in docs and testing.
- Concurrency: fearless concurrency with Send/Sync types; async/await available via libraries (e.g., Tokio).
- Cross-platform and interoperable: great C FFI, targets from embedded to WebAssembly.
Common uses:
- Systems software (OS components, drivers), networking services, CLI tools, embedded/IoT, game and graphics engines, crypto, and WebAssembly apps.
Trade-offs:
- Steeper learning curve (ownership/borrow checker) and longer compile times compared to some languages, though both improve with experience and tooling.
Like all other agentic frameworks, agents in Rig are at their core simply LLM calls that can handle tool calling for you (and the tools can make use of environment variables, database connection pools, etc.). This eliminates a lot of the boilerplate of writing your own agentic loop, which is a staple of building AI systems.
flowchart TD
U[User query]
AGENT[Agent]
PLAN[Plan or Reason]
DECIDE{Tool needed?}
TOOL[Tool call]
OBS[Execute tool and return tool result]
RESP[Final output]
U --> AGENT
AGENT --> PLAN
PLAN --> DECIDE
DECIDE -->|No| RESP
DECIDE -->|Yes| TOOL
TOOL --> OBS
OBS --> AGENT
Streaming
Streaming responses with agents is quite simple! Instead of using prompt(), use stream_prompt():
#![allow(unused)]
fn main() {
use futures::StreamExt;
let mut stream = agent.stream_prompt(prompt).await;
while let Some(item) = stream.next().await {
// .. do some work here
}
}
Calling completion models directly
Calling completion models is also relatively simple.
To do so, you’ll need to import the CompletionClient trait (from the client module) and use its completion_model() method.
#![allow(unused)]
fn main() {
// the completion_model() method is provided through the CompletionClient trait!
use rig::client::CompletionClient;
let openai_client = Client::from_env();
let openai_completions_model = openai_client.completion_model("gpt-5");
}
Next, there are two ways you can make a completion request through the CompletionModel trait.
The first one is creating a CompletionRequest and calling CompletionModel::completion(). You can see the code below:
#![allow(unused)]
fn main() {
//NOTE: OneOrMany is an abstraction that ensures there's always at least one element
let message = Message::User {
content: OneOrMany::one(UserContent::text("What is the Rust programming language?"))
};
let req = CompletionRequest {
messages: OneOrMany::one(message),
preamble: Some("You are a helpful assistant.".to_string()),
..Default::default()
};
let response = openai_completions_model.completion(req).await?;
}
You can also call CompletionModel::completion_request() with the prompt text we want to use, then use the builder methods:
#![allow(unused)]
fn main() {
let response = openai_completions_model
.completion_request("What is the Rust programming language?")
.preamble("You are a helpful assistant")
.send()
.await?;
}
Should I use agents or direct completion models?
If you just want a way to prompt a model and don’t care about the specifics of the underlying completion request, use rig::agent::Agent. Agents will also automatically handle tool calling for you, meaning you can avoid writing all the boilerplate that this would typically require.
If you need more low-level control over the agentic loop than Rig provides, using the completion model manually is often more effective. One use case: based on a given tool result, you may want to decide whether to simply return it or feed it into the next prompt.
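For instance, here is a minimal sketch of inspecting the response yourself. It assumes Rig's CompletionResponse exposes the returned content parts through a choice field of AssistantContent values, and reuses the req and completion model from the snippets above - check the rig::completion docs for the exact shape in your version:
#![allow(unused)]
fn main() {
use rig::message::AssistantContent;

let response = openai_completions_model.completion(req).await?;

// Decide per content part whether to return text or run a tool yourself
for part in response.choice.iter() {
    match part {
        AssistantContent::Text(text) => println!("Model said: {}", text.text),
        AssistantContent::ToolCall(tool_call) => {
            // Execute the matching tool here, then choose whether to return the raw
            // result to the caller or feed it into a follow-up completion request
            println!(
                "Tool requested: {} with args {}",
                tool_call.function.name, tool_call.function.arguments
            );
        }
        _ => {}
    }
}
}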
Retrieval-Augmented Generation (RAG)
The full example for this section can be found on the GitHub repo.
What is RAG?
Retrieval Augmented Generation (or RAG) retrieves relevant documents from a data store based on a given query and includes them in an LLM prompt to assist with grounding a response in factual information. The goal of doing this is to reduce hallucinations and include data that may be missing from a model’s training data (or that it cannot retrieve from a web search).
How is RAG carried out?
Before we talk about RAG, there are two very important concepts to cover: embeddings, and cosine similarity.
Embeddings are numerical representations (“vectors”) that carry semantic meaning. They are created through using embedding models, whether local or from a model provider like OpenAI or Gemini. By doing this, we can mathematically measure semantic similarity between texts - ie, how related they are.
Cosine similarity is the default metric used to calculate semantic similarity, with a higher score meaning the texts are more semantically similar. It is important within RAG because it makes finding related documents straightforward, and it is also used in recommender and hybrid/semantic search systems.
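To make the metric concrete, here is a small standalone implementation of cosine similarity over two embedding vectors. Rig's vector store integrations compute this for you, so the function below is purely illustrative:
/// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1.0 to 1.0,
/// where a higher value means the two embeddings are more semantically similar.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (norm_a * norm_b)
}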
RAG starts with document ingestion. Developers will split up documents to be ingested with a chunking strategy (typically using fixed token sizes like 512-1000, or semantic boundaries like paragraphs) for more focused chunks, then embed each chunk and insert the embeddings into a vector store. A vector store can be a database with a vector search plugin (like pgvector), or a vector database. Each chunk’s metadata is also stored alongside the embedding - the metadata from vector search results will then be included in LLM prompts.
When the user asks a question, the system embeds the query (again using an embedding model). The query must be embedded with the exact same model that was used to embed the documents, as vectors produced by two different models are not comparable. Although you can use other metrics for your RAG system, cosine similarity is the default as it is usually the most applicable to vector search for RAG.
The query embedding is then passed into the vector search for whatever vector store you’re using, which should return the most relevant documents for the query, along with their metadata payloads.
Do I need RAG?
If you are writing a customer support bot or chatbot that relies on documentation that you (or the company you work for) possess, then RAG is absolutely vital. Being able to ground the LLM in the most up-to-date information rather than hoping it’ll somehow magically remember the right information is crucial - skipping RAG for this kind of use case can carry reputational risk.
However if you are using an LLM for simple data classification tasks for which the categories of data are already well known (ie, “is this a cat or a dog?”), you probably don’t need RAG.
RAG with Rig
Rig provides built-in support for RAG through two ways:
- the VectorStoreIndex trait for fetching documents
- the InsertDocuments trait for inserting documents
Rig’s vector store integrations primarily use cosine similarity to measure how similar two documents are.
Supported Vector Stores
By default, Rig has an in-memory vector store that is ideal for development and small-scale applications without any external dependencies.
However, if you’d like to use a durable vector store (or simply want to take Rig to production!), here are some of the vector stores we support:
- HelixDB
- LanceDB
- Milvus (both locally and through Zilliz Cloud)
- MongoDB
- Neo4j
- PostgreSQL
- Qdrant
- ScyllaDB
- SQLite
- SurrealDB
Basic example
Here’s a basic example of setting up RAG with Rig:
use rig::{
client::{EmbeddingsClient, ProviderClient},
embeddings::EmbeddingsBuilder,
providers::openai::{Client, TEXT_EMBEDDING_ADA_002},
vector_store::{VectorSearchRequest, VectorStoreIndex, in_memory_store::InMemoryVectorStore},
};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let openai_client = Client::from_env();
let mut vector_store = InMemoryVectorStore::default();
// Define documents to index
let documents = vec![
"Rig is a Rust library for building LLM-powered applications.",
"RAG combines retrieval and generation for better accuracy.",
"Vector stores enable semantic search over documents.",
];
let model = openai_client.embedding_model(TEXT_EMBEDDING_ADA_002);
// Create embeddings and add to vector store
let embeddings = EmbeddingsBuilder::new(model.clone())
.documents(documents)?
.build()
.await?;
vector_store.add_documents(embeddings);
// Create a vector index from the in-memory vector store
let vector_idx = vector_store.index(model);
let query = VectorSearchRequest::builder()
.query("What is Rig?")
.samples(2)
.build()?;
// Query the vector store
let results = vector_idx.top_n::<String>(query).await?;
for (score, doc_id, doc) in results {
println!("Score: {}, ID: {}, Content: {}", score, doc_id, doc);
}
Ok(())
}
Once you’ve fetched your document, you can then use it in an LLM completion call:
#![allow(unused)]
fn main() {
let documents: Vec<Document> = results.into_iter().map(|(score, id, doc)| {
Document {
id,
text: doc,
additional_props: std::collections::HashMap::new() // whatever extra properties you want here
}
}).collect();
let req = CompletionRequest {
documents,
// fill in the rest of the fields here!
..Default::default()
};
}
Using RAG with Agents
Rig makes it straightforward to equip agents with RAG capabilities. You can attach a vector store to an agent, and it will automatically retrieve relevant context before generating responses:
use rig::{
client::{CompletionClient, EmbeddingsClient, ProviderClient},
completion::Prompt,
providers::openai::{Client, GPT_4, TEXT_EMBEDDING_ADA_002},
vector_store::in_memory_store::InMemoryVectorStore,
};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let openai_client = Client::from_env();
// Set up a vector store with your documents and build an index over it (as shown above)
let vector_store = InMemoryVectorStore::default();
let index = vector_store.index(openai_client.embedding_model(TEXT_EMBEDDING_ADA_002));
// Create an agent with RAG capabilities
let agent = openai_client
.agent(GPT_4)
.preamble("You are a helpful assistant that answers questions using the provided context.")
.dynamic_context(2, index) // Retrieve the top 2 relevant documents per prompt
.build();
// Use the agent
let response = agent
.prompt("What is Rig and how does it help with LLM applications?")
.await?;
println!("Agent response: {}", response);
Ok(())
}
The dynamic_context method configures the agent to automatically retrieve the specified number of relevant documents for each query and include them in the context sent to the language model.
Modern RAG Patterns
Since its inception, RAG has grown a great deal both in terms of complexity and ways that you can build on the basic pattern.
Re-ranking
Re-ranking is simple: re-order the initial search results for better relevance, accuracy and contextual understanding. Re-ranking models score results more deeply than vector search alone, which can dramatically improve effectiveness.
To get started with re-ranking, the fastembed Rust crate has a TextRerank type that can help you re-rank your search results.
Hybrid search
While semantic search is crucial to RAG, it is not the same as full-text search, so it can sometimes miss results that contain the exact target term but are not considered semantically relevant. Hybrid search solves this by combining full-text search and semantic search.
Hybrid search is a bit more complicated to showcase, but to carry it out: store the documents in a regular database (as well as a vector store), then query both your database and vector store at retrieval time and combine the two lists of results using something like Reciprocal Rank Fusion or weighted scoring.
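As a rough illustration, here is a minimal Reciprocal Rank Fusion sketch that merges two ranked lists of document IDs (one from full-text search, one from vector search); k is a smoothing constant, commonly set to around 60:
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank(d)).
fn reciprocal_rank_fusion(full_text: &[String], semantic: &[String], k: f64) -> Vec<String> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [full_text, semantic] {
        for (rank, doc_id) in list.iter().enumerate() {
            // ranks are 1-based in the usual formulation
            *scores.entry(doc_id.clone()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    // sort by fused score, highest first
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused.into_iter().map(|(doc_id, _)| doc_id).collect()
}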
Use cases for RAG
Retrieval augmented generation is quite a broad topic with an almost infinite amount of use cases. However for our purposes, there are a few cases where it really shines - which we’ll discuss below.
Documentation Q&A
One of the classic use cases for RAG is chunking a PDF document and asking an LLM questions about the document. This allows for a chatbot-like workflow and mitigates needing to read the entire document.
Memory
RAG can serve as a useful basis of information retrieval for agentic memory. Because RAG lets you store any kind of document, you can store conversation summaries, things that your users (or your company!) may want you to remember, as well as the more typical facts and chunked documents.
Tool RAG
By storing tool definitions in a vector store, we can also conditionally fetch tool definitions using RAG. This is hugely important as modern agents often need to keep large lists of tools, which can cause hallucinations and degraded model output due to a finite context window. Using tool RAG can also save on token costs, as sending large lists of tools to a model provider can also incur a large token cost over time.
Rig supports this out of the box in the builder type:
#![allow(unused)]
fn main() {
let tools = some_toolset(); // this function is pseudo-code and simply represents the toolset
let in_memory = InMemoryVectorStore::default();
// build a vector index over the store (this requires an embedding model, as shown earlier)
let dynamic_tools_index = in_memory.index(embedding_model);
let agent = openai_client.agent("gpt-5")
.dynamic_tools(2, dynamic_tools_index, tools)
.build();
}
The dynamic_tools method takes the maximum number of tool definitions to retrieve, the vector index, and the toolset. At context assembly time, the agent uses RAG to fetch the most relevant tool definitions to send to the LLM, and any tools the LLM calls are then executed from the toolset.
Limitations of RAG
While RAG has become a cornerstone of LLM application development, it also comes with challenges that require careful design to be overcome. Below is a non-exhaustive list of common issues and how to fix them:
- Relevant information can be split across multiple chunks. Solutions include overlapping chunks (where a 10-20% overlap captures context at the boundaries - see the sketch after this list), or parent-child chunking, where you retrieve small chunks but provide the original larger chunks to the LLM
- The information you ingest may contain contradictions - you can fix this by filtering on metadata to retrieve only relevant data, using recency-based weighting, and considering source authority (where official docs should be prioritised over community forums, for example)
- Over time, the data stored in your vector store can become stale and outdated. You can fix this with created_at and last_updated fields, versioning and TTL mechanisms, as well as monitoring for changes in the source data and triggering re-embedding
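As a sketch of the first mitigation, here is a simple word-based chunker with a fractional overlap. Real pipelines usually chunk by tokens (using a tokenizer) rather than whitespace-separated words, so treat this as illustrative only:
/// Split text into fixed-size chunks with a fractional overlap (e.g. 0.15 = 15%),
/// so information near a chunk boundary appears in both neighbouring chunks.
fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: f64) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = ((chunk_size as f64) * (1.0 - overlap)).max(1.0) as usize;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}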
Tool calling
The full example for this section can be found on the GitHub repo.
What is tool calling?
Tool calling (also called “function calling”) involves the LLM responding with a content part called a “tool call” that references a tool you have defined as part of your request. Your agent then executes the tool, typically represented as a function, and sends the tool result back to the LLM, which then uses it to generate its final response.
Tools are core to the agentic loop and what differentiates agents from “just an LLM call”. By using tool calls, you can turn an LLM into a fully-fledged system that can autonomously execute tasks.
Do I need tools?
If your LLM application needs to interact with your infrastructure (for example, your Postgres database) or make calculations independently of the other application logic, then you absolutely need tools!
If you’re building a simple chat application or using an LLM for data classification tasks though, probably not. These kinds of tasks typically rely more on the raw conversational ability of an LLM rather than on interacting with an agent’s external environment.
Tool calling in Rig
Functionally, tools in Rig can be defined as types that implement the rig::tool::Tool trait.
A simple example would be a tool that adds two numbers together:
#![allow(unused)]
fn main() {
use rig::{completion::ToolDefinition, tool::Tool};
use serde::{Deserialize, Serialize};
use serde_json::json;
#[derive(Deserialize)]
struct OperationArgs {
x: i32,
y: i32,
}
#[derive(Debug, thiserror::Error)]
#[error("Math error")]
struct MathError;
#[derive(Deserialize, Serialize)]
struct Adder;
impl Tool for Adder {
const NAME: &'static str = "add";
type Error = MathError;
type Args = OperationArgs;
type Output = i32;
async fn definition(&self, _prompt: String) -> ToolDefinition {
ToolDefinition {
name: "add".to_string(),
description: "Add x and y together".to_string(),
parameters: json!({
"type": "object",
"properties": {
"x": {
"type": "number",
"description": "The first number to add"
},
"y": {
"type": "number",
"description": "The second number to add"
}
},
"required": ["x", "y"],
}),
}
}
async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
println!("[tool-call] Adding {} and {}", args.x, args.y);
let result = args.x + args.y;
Ok(result)
}
}
}
You can also use the rig::tool_macro macro:
#![allow(unused)]
fn main() {
#[rig::tool_macro(
description = "Adds two numbers together"
)]
async fn add(x: i32, y: i32) -> i32 {
x + y
}
}
Practical usage
Agents in Rig allow you to simply add tools in the builder type:
#![allow(unused)]
fn main() {
let agent = openai_client.agent("gpt-5")
.tool(Adder)
.build();
}
Memory
The full example for this section can be found in the GitHub repo.
What is Memory?
Agentic memory (or just “memory”) in AI applications refers to the system’s ability to retain and utilize information from previous interactions. Without memory, each conversation with an LLM starts from scratch—the model has no awareness of what was discussed moments ago. Memory allows your AI application to maintain context across multiple exchanges, creating more coherent and personalized experiences.
The Use Case for Memory
Consider a customer support chatbot. Without memory, a user asking “What’s the status of my order?” followed by “Can you cancel it?” would leave the AI confused—it wouldn’t know which order to cancel. With memory, the conversation flows naturally, just as it would with a human agent.
Memory becomes critical in several scenarios:
- Multi-turn conversations where context builds over time
- Personalized interactions that adapt based on user preferences
- Task-oriented dialogues where the AI needs to track goals and progress
- Long-running sessions where conversations span multiple topics
The following diagram describes an agentic loop that uses memory. The system retrieves relevant memories (with retrieval and filtering often happening in the same step), assembles context for the agent based on those memories, and then, once the full request is assembled, sends the prompt to the LLM. Once a response has been received, it may then check whether memories need updating (and update them accordingly).
graph TD
A[User Input] --> B[Agent Processing]
B --> C{Memory Operations}
C --> D[Short-term Memory]
C --> E[Long-term Memory]
C --> F[Episodic Memory]
D --> G[Current Context<br/>Active conversation]
E --> H[Semantic Knowledge<br/>Facts & concepts]
F --> I[Past Interactions<br/>Event sequences]
G --> J[Memory Retrieval]
H --> J
I --> J
J --> K[Relevance Filtering]
K --> L[Context Assembly]
L --> M[Response Generation]
M --> N[Action/Output]
N --> O[Memory Update]
O --> D
O --> E
O --> F
style D fill:#e1f5ff,color:#000
style E fill:#fff4e1,color:#000
style F fill:#f0e1ff,color:#000
style J fill:#e1ffe1,color:#000
style M fill:#ffe1e1,color:#000
Basic Memory Management in Rig
In Rig, conversation history is currently decoupled from the library - which is to say that you need to implement it yourself. The simplest form of memory management is storing a conversation history as a Vec<T>. Each exchange between the user and the assistant is stored as a Message object:
#![allow(unused)]
fn main() {
use rig::message::Message;
let mut conversation_history: Vec<Message> = Vec::new();
}
As the conversation progresses, you append new messages to this vector:
#![allow(unused)]
fn main() {
// Add a user message
conversation_history.push(Message::User {
content: OneOrMany::one(
UserContent::text("Do you know what the weather is like today?")
)
});
// Add the assistant's response
conversation_history.push(Message::Assistant {
content: OneOrMany::one(
AssistantContent::text("I don't have access to real-time weather data...")
)
});
}
When making subsequent requests to the LLM, you include this history to maintain context. However, this approach has a fundamental limitation: conversation history grows indefinitely, eventually exceeding the model’s context window and increasing costs.
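Here's a minimal sketch of one way to pass that history along, using Rig's Chat trait (which accepts a prompt plus the prior messages); the exact signature and import paths may differ slightly between versions, so check the docs for the release you're on:
#![allow(unused)]
fn main() {
use rig::completion::Chat;
use rig::message::{AssistantContent, Message, UserContent};
use rig::OneOrMany;

// Send the new prompt together with the history accumulated so far
let response = agent
    .chat("Do you know what the weather is like today?", conversation_history.clone())
    .await?;

// Record both sides of the exchange so the next turn has the full context
conversation_history.push(Message::User {
    content: OneOrMany::one(UserContent::text("Do you know what the weather is like today?")),
});
conversation_history.push(Message::Assistant {
    content: OneOrMany::one(AssistantContent::text(response.as_str())),
});
}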
Managing Ephemeral Conversation Memory
To handle growing conversation histories, you’ll need a more sophisticated approach. Let’s create a ConversationMemory struct that can both manage messages and compact them when needed.
First, define the basic structure:
#![allow(unused)]
fn main() {
use rig::completion::Message;
pub struct ConversationMemory {
messages: Vec<Message>,
max_messages: usize,
summary: Option<String>,
}
impl ConversationMemory {
pub fn new(max_messages: usize) -> Self {
Self {
messages: Vec::new(),
max_messages,
summary: None,
}
}
}
}
Add methods for basic message management:
#![allow(unused)]
fn main() {
impl ConversationMemory {
pub fn add_user_message(&mut self, input: &str) {
let message = Message::User {
content: OneOrMany::one(UserContent::text(input))
};
self.messages.push(message);
}
pub fn add_assistant_message(&mut self, input: &str) {
let message = Message::Assistant {
content: OneOrMany::one(AssistantContent::text(input))
};
self.messages.push(message);
}
pub fn get_messages(&self) -> &[Message] {
&self.messages
}
pub fn clear(&mut self) {
self.messages.clear();
}
}
}
Although being able to clear and fetch/add messages is quite useful, we will also need a way to compact the messages by generating a summary.
There are a number of ways this can be done, but for the sake of simplicity we will hold a maximum-message threshold. If the message count passes the threshold, we ask the LLM to generate a summary and clear the message list.
#![allow(unused)]
fn main() {
use rig::completion::Prompt;
impl ConversationMemory {
pub async fn compact<T>(&mut self, client: &T) -> Result<(), Box<dyn std::error::Error>>
where
T: Prompt
{
if self.messages.len() <= self.max_messages {
return Ok(());
}
// Create a prompt asking the LLM to summarize the conversation
let summary_prompt = format!(
"Please provide a concise summary of the following conversation, \
capturing key points, decisions, and context:\n\n{}",
self.format_messages_for_summary()
);
// Request the summary from the LLM
let response = client
.prompt(summary_prompt.as_str())
.await?;
self.summary = Some(response);
self.messages.clear();
Ok(())
}
fn format_messages_for_summary(&self) -> String {
self.messages
.iter()
// Debug-format each message; a rough rendering that is good enough for a summarisation prompt
.map(|msg| format!("{msg:?}"))
.collect::<Vec<_>>()
.join("\n")
}
}
}
You can inject the summary back into the conversation in two ways. The first approach adds it to the system prompt:
#![allow(unused)]
fn main() {
impl ConversationMemory {
pub fn build_system_prompt(&self, base_prompt: &str) -> String {
match &self.summary {
Some(summary) => {
format!(
"{}\n\nPrevious conversation summary:\n{}",
base_prompt, summary
)
}
None => base_prompt.to_string(),
}
}
}
}
Alternatively, you can add the summary as a user message at the start of the conversation:
#![allow(unused)]
fn main() {
impl ConversationMemory {
pub fn get_message_summary(&self) -> Vec<Message> {
let mut messages = Vec::new();
if let Some(summary) = &self.summary {
messages.push(Message::User {
content: OneOrMany::one(UserContent::text(format!(
"Context from previous conversation:\n{}",
summary
))),
});
}
messages
}
}
}
In practice, this is how you’d use it:
#![allow(unused)]
fn main() {
let mut memory = ConversationMemory::new(10);
// Add messages as the conversation progresses
memory.add_user_message("Tell me about Rust");
memory.add_assistant_message("Rust is a systems programming language...");
// When the conversation grows too long, compact it
if memory.get_messages().len() > memory.max_messages {
memory.compact(&client).await?;
}
// Start your next request with the summary included
let messages = memory.get_message_summary();
}
Although the compaction here is triggered manually, there are many ways you can build around it: you could check whether the window needs compacting after every message, or compact based on a token limit (although this would need a tokenizer to count tokens).
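For example, a very rough token-based trigger might look like the sketch below, where estimate_tokens is a crude characters-divided-by-four heuristic; for accurate counts you would use a real tokenizer crate such as tiktoken-rs:
#![allow(unused)]
fn main() {
/// Crude token estimate (~4 characters per token for English text).
fn estimate_tokens(messages: &[String]) -> usize {
    messages.iter().map(|m| m.len() / 4).sum()
}

// `rendered_messages` is assumed to be the conversation rendered to plain strings
let token_budget = 4_000;
if estimate_tokens(&rendered_messages) > token_budget {
    memory.compact(&client).await?;
}
}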
Strategies for Long-Term Memory
While conversation compaction handles ephemeral memory, many applications need to retain information across sessions. Here are three strategies for managing long-term memory:
1. Conversation Observations
Conversation observations capture insights about specific exchanges. These might include:
- Important decisions made during the conversation
- Questions that remain unanswered
- Topics the user expressed strong interest in
Implementation approach:
#![allow(unused)]
fn main() {
pub struct ConversationObservation {
pub timestamp: DateTime<Utc>,
pub topic: String,
pub insight: String,
pub importance: f32,
}
}
After each significant exchange, use an LLM to extract observations:
#![allow(unused)]
fn main() {
let extraction_prompt = format!(
"Extract key observations from this conversation exchange. \
Focus on decisions, preferences, and important context:\n\n{}",
recent_messages
);
}
Store these observations in a database or vector store, retrieving the most relevant ones when starting new conversations.
2. User Observations
User observations track persistent information about the user themselves:
- Stated preferences (“I’m vegetarian”)
- Personal context (“I live in Seattle”)
- Communication style (“I prefer concise answers”)
- Long-term goals or projects
These observations should be:
- Maintained separately from conversation history
- Updated incrementally as new information emerges
- Verified before use to ensure they’re still accurate
Consider using a structured format:
#![allow(unused)]
fn main() {
pub struct UserProfile {
pub preferences: HashMap<String, String>,
pub context: Vec<String>,
pub communication_style: Option<String>,
pub last_updated: DateTime<Utc>,
}
}
Periodically ask the LLM to extract user observations from recent conversations:
#![allow(unused)]
fn main() {
let profile_prompt =
"Based on the recent conversations, extract any new information about \
the user's preferences, context, or communication style. Return only \
new or updated information.";
}
3. Grounded Facts
Grounded facts are verifiable pieces of information that emerged during conversations:
- External information retrieved during the session
- Calculations or analyses performed
- File contents or data processed
- API responses or database queries
These differ from observations because they’re objectively verifiable and often come from external sources rather than the conversation itself.
Store grounded facts with their source and timestamp:
#![allow(unused)]
fn main() {
pub struct GroundedFact {
pub fact: String,
pub source: String,
pub timestamp: DateTime<Utc>,
pub confidence: f32,
pub conversation_id: String,
}
}
When starting a new conversation, retrieve relevant facts:
#![allow(unused)]
fn main() {
// Retrieve facts related to the current conversation context
// fetch_facts_by_relevance is a stand-in for your own retrieval function
let relevant_facts = fetch_facts_by_relevance(current_topic, 5)?;
// Include them in the system prompt or as initial messages
let context = format!(
"Relevant information from previous interactions:\n{}",
relevant_facts.join("\n")
);
}
Caching
Generally speaking, effective memory systems don’t pull from just a single data source. The most production-capable systems often have multiple tiers of memory storage to accommodate different needs. Typically, one of these tiers should allow fast fetching of the most commonly requested items - a caching layer. Whether it’s in memory, through Redis or Memcached (or another service dedicated to caching), caching is highly versatile and agentic memory is no exception.
How to create caches and use them is already a very well-trodden topic, and as such a full implementation will not be provided here. However, there are some highly useful crates you can use to help create your own memory cache (a small sketch using lru follows the list below):
- lru - an implementation of a Least Recently Used cache
- slotmap - A data structure with low overhead for storing, inserting and deleting items
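For example, the lru crate gives you a fixed-capacity cache with least-recently-used eviction; the key and value types below (a user ID mapped to a stored summary) are just an illustration:
#![allow(unused)]
fn main() {
use std::num::NonZeroUsize;
use lru::LruCache;

// Keep the 100 most recently used summaries in memory; older entries are evicted
let mut summary_cache: LruCache<String, String> =
    LruCache::new(NonZeroUsize::new(100).unwrap());

summary_cache.put(
    "user-42".to_string(),
    "Prefers concise answers; currently building a Rust CLI.".to_string(),
);

// On a cache miss, fall back to your database or vector store
if let Some(summary) = summary_cache.get("user-42") {
    println!("Cached summary: {summary}");
}
}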
Combining Memory Strategies
The most effective AI applications combine multiple memory strategies:
- Ephemeral memory maintains immediate context through the conversation history and compaction
- Observations capture insights about the conversation and user over time
- Grounded facts preserve verified information that might be needed again
When building a new conversation context, you might:
- Load the user’s profile to personalize responses
- Retrieve relevant observations from past conversations
- Fetch grounded facts related to the current topic
- Maintain the current conversation in ephemeral memory
- Compact the conversation when it grows too large
This layered approach ensures your AI application has access to the right information at the right time, without overwhelming the model’s context window or your budget.
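Putting the pieces together, a context-assembly step might look something like the sketch below; the profile, observation, and fact values are placeholders for whatever your own stores return:
#![allow(unused)]
fn main() {
// Assemble a system prompt from the long-term memory tiers before starting a turn;
// the ephemeral conversation history is then sent alongside it as messages.
fn build_context(
    base_prompt: &str,
    profile: &str,
    observations: &[String],
    facts: &[String],
) -> String {
    format!(
        "{base_prompt}\n\nUser profile:\n{profile}\n\nRelevant observations:\n{}\n\nGrounded facts:\n{}",
        observations.join("\n"),
        facts.join("\n"),
    )
}

let system_prompt = build_context(
    "You are a helpful assistant.",
    "Prefers concise answers; based in Seattle.",
    &["User decided to store orders in Postgres.".to_string()],
    &["Order #1234 shipped on 2024-05-02.".to_string()],
);
}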
Model Routing
Model routing allows you to dynamically select different agents based on the incoming request. This enables you to build sophisticated systems where requests are intelligently routed to specialized agents. For example, if you run a sales team you might have a domain expert agent, an agent to assist with sales engineering and an agent to help you check in on your team’s performance and where they may need unblocking - all under one chat interface. By using model routing, we can build a system that accurately categorises which agent a query should go to.
The full example for this section can be found on the GitHub repo.
Use cases for model routing
Model routing is a simple concept that simultaneously solves a lot of issues in AI systems. You can use it as:
- A guardrail layer (to make sure the model only responds to certain topics)
- A way to route to other models depending on topic, creating a multi-layered system
- A way to route to a less or more expensive model (eg if you need a fallback model)
In terms of execution, there are two basic types of routers that can then be expanded on. You can use a basic LLM decision layer that non-deterministically does the routing for you with no initial infrastructure setup required, or you can use semantic routing with embeddings, which is more robust but requires more setup.
Simple LLM routing
Below is a diagram which depicts the data flow for an LLM-based router:
flowchart TD
User[User] -->|Query| Decision[LLM Decision Layer]
Decision -->|Category A| AgentA[Agent A]
Decision -->|Category B| AgentB[Agent B]
AgentB --> Response
AgentA --> Response
LLM-based routers are quite easy to set up and can be used to validate an MVP very easily, but often come with drawbacks that would not be present in a semantic router.
First, let’s set up a barebones routing system that basically asks the LLM to return a single word based on what we ask it. This code snippet will do the following:
- Initialise two different agents representing possible “routes” on an LLM router
- Initialise a third agent that basically acts as a decision layer
- Prompt the third agent, then prompt one of the two routes depending on what was returned (returning an error if no suitable topic found)
use rig::{completion::Prompt, providers::openai};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Initialize the OpenAI client
let openai_client = openai::Client::from_env();
// Create specialized agents
let coding_agent = openai_client
.agent("gpt-5")
.preamble("You are an expert coding assistant specializing in Rust programming.")
.build();
let math_agent = openai_client
.agent("gpt-5")
.preamble("You are a mathematics expert who excels at solving complex problems.")
.build();
let router = openai_client
.agent("gpt-5-mini") // we can afford to use a less expensive model here as the computation required is significantly less
.preamble("Please return a word from the allowed options list,
depending on which word the user's question is more closely related to. Skip all prose.
Options: ['rust', 'maths']
")
.build();
let prompt = "How do I use async with Rust?";
let topic = router.prompt(prompt).await?;
println!("Topic selected: {topic}");
let res = if topic.contains("math") {
math_agent.prompt(prompt).await?
} else if topic.contains("rust") {
coding_agent.prompt(prompt).await?
} else {
return Err(format!("No route found in text: {topic}").into());
};
println!("Response: {res}");
Ok(())
}
While this does work, in a production use case you may want to make the implementation more resilient by doing the following, which we’ll cover below:
- Using embeddings for more control over the router topic selection (no potentially wildly wrong text!)
- Using the Rust type system to properly type our agents, as well as covering dynamic dispatch for agent routing
Router Type-safety
Instead of just having all our agents in one function, let’s imagine that we have a router that can take any agent from a given provider.
#![allow(unused)]
fn main() {
type OpenAIAgent = Agent<rig::providers::openai::ResponsesCompletionModel>;
struct TypedRouter {
routes: HashMap<String, OpenAIAgent>,
}
impl TypedRouter {
pub fn new() -> Self {
Self {
routes: HashMap::new(),
}
}
pub fn add_route(mut self, route_loc: &str, agent: OpenAIAgent) -> Self {
self.routes.insert(route_loc.to_string(), agent);
self
}
pub fn fetch_agent(&self, route: &str) -> Option<&OpenAIAgent> {
self.routes.get(route)
}
}
}
At the cost of only being able to use agents from a single provider, this makes it quite easy to create simple routers for non-complex use cases. For example: tasks that require simple data classification and a canned answer vs. a full LLM response might use gpt-5-mini and GPT-5.2, respectively.
Advanced Router with Embeddings
For more sophisticated routing, use embeddings to semantically match queries to agents:
#![allow(unused)]
fn main() {
use rig::{
client::EmbeddingsClient,
completion::Prompt,
embeddings::EmbeddingsBuilder,
providers::openai,
vector_store::{VectorSearchRequest, VectorStoreIndex, in_memory_store::InMemoryVectorStore},
};
use serde::{Deserialize, Serialize};
#[derive(Clone, Default, Serialize, Deserialize, Eq, PartialEq)]
struct RouteDefinition {
name: String,
description: String,
examples: Vec<String>,
}
async fn create_semantic_router(
openai_client: &openai::Client,
) -> Result<InMemoryVectorStore<RouteDefinition>, Box<dyn std::error::Error>> {
let routes = vec![
RouteDefinition {
name: "coding".to_string(),
description: "Programming, code, and software development".to_string(),
examples: vec![
"How do I write a function?".to_string(),
"Debug this code".to_string(),
"Implement a sorting algorithm".to_string(),
],
},
RouteDefinition {
name: "math".to_string(),
description: "Mathematics, calculations, and equations".to_string(),
examples: vec![
"Solve this equation".to_string(),
"Calculate the derivative".to_string(),
"What is 15% of 200?".to_string(),
],
},
];
let mut vector_store = InMemoryVectorStore::default();
for route in routes {
let embedding_text = format!(
"{}: {}. Examples: {}",
route.name,
route.description,
route.examples.join(", ")
);
let embedding = openai_client
.embeddings("text-embedding-ada-002")
.simple_document(&embedding_text)
.await?;
vector_store.add_document(route, embedding);
}
Ok(vector_store)
}
async fn semantic_route_query(
query: &str,
router: &InMemoryVectorStore<RouteDefinition>,
openai_client: &openai::Client,
) -> Result<String, Box<dyn std::error::Error>> {
let embedding_model = openai_client.embedding_model("text-embedding-ada-002");
let index = router.clone().index(embedding_model);
let req = VectorSearchRequest::builder()
.query(query)
.samples(1)
.build()?;
let results = index.top_n::<RouteDefinition>(req).await.unwrap();
// Currently speaking there's no similarity score threshold
// so this should always return at least one sample/result
let route_name = results
.first()
.map(|(_, _, route_def)| route_def.name.as_str())
.unwrap();
Ok(route_name.to_string())
}
}
Complete Multi-Provider Example
Here’s a complete example combining the typed router with semantic routing (the same pattern extends to multiple providers):
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let openai_client = openai::Client::from_env();
let coding_agent = openai_client
.agent("gpt-5")
.preamble("You are an expert coding assistant specializing in Rust programming.")
.build();
let math_agent = openai_client
.agent("gpt-5")
.preamble("You are a mathematics expert who excels at solving complex problems.")
.build();
let rtr = TypedRouter::new()
.add_route("rust", coding_agent)
.add_route("maths", math_agent);
let semantic_router = create_semantic_router(&openai_client).await?;
let prompt = "How do I use async with Rust?";
println!("Prompt: {prompt}");
let route_name = semantic_route_query(prompt, &semantic_router, &openai_client).await?;
println!("Route name selected: {route_name}");
let response = rtr
.fetch_agent(&route_name)
.unwrap()
.prompt(prompt)
.await
.unwrap();
println!("Response: {response}");
Ok(())
}
Key Concepts
- Route Definition: Each route maps to a specialized agent with specific expertise
- Multi-Provider Support: Mix OpenAI, Anthropic, and other providers in the same router
- Type Erasure vs Explicit Types: Choose between trait objects for flexibility or enums for type safety
- Semantic Matching: Use embeddings to intelligently match queries to the most appropriate route
- Fallback Strategy: Always provide a default route for queries that don’t match any specific pattern (see the sketch at the end of this section)
- Agent Specialization: Configure each agent’s preamble to optimize for its routing domain
- Provider Selection: Route to the best provider for each task (e.g., OpenAI for code, Anthropic for analysis)
This pattern allows you to build scalable, intelligent routing systems that direct queries to the most appropriate specialized agent from any provider automatically.
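As a small sketch of the fallback idea using the TypedRouter from earlier: fetch_agent returns an Option, so you can fall back to a general-purpose agent when a route isn't registered. general_agent here is assumed to be built like the other agents above:
#![allow(unused)]
fn main() {
// Fall back to a general-purpose agent when the selected route isn't registered
let agent = rtr
    .fetch_agent(&route_name)
    .unwrap_or(&general_agent);

let response = agent.prompt(prompt).await?;
println!("Response: {response}");
}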
Dynamic model creation
This section will talk about some of the challenges around creating model provider clients dynamically and how we can make this as convenient as possible.
The full example for this section can be found in the GitHub repo.
Because of Rust’s type system, dynamic model client creation is a bit more difficult than in more dynamic programming languages like Python, since every client has a concrete type that must be specified. However, that doesn’t mean it is impossible: it simply comes with more tradeoffs.
Let’s have a look at some of our possible options for achieving such a feat.
Enum dispatch
The simplest (and easiest!) way to set up dynamic model client creation is typically to have a single enum with one variant for each provider you want to support:
#![allow(unused)]
fn main() {
use rig::agent::Agent;
use rig::client::{CompletionClient, ProviderClient};
use rig::completion::{Prompt, PromptError};
use rig::providers::{anthropic, openai};
// Each variant wraps a ready-built agent; the concrete model type depends on
// which provider (and which of its APIs) the agent was built over.
enum Agents {
OpenAI(Agent<openai::completion::CompletionModel>),
Anthropic(Agent<anthropic::completion::CompletionModel>),
}
impl Agents {
fn openai() -> Self {
let agent = openai::Client::from_env()
.completions_api()
.agent(openai::GPT_4O)
.build();
Self::OpenAI(agent)
}
fn anthropic() -> Self {
let agent = anthropic::Client::from_env()
.agent(anthropic::CLAUDE_3_7_SONNET)
.build();
Self::Anthropic(agent)
}
async fn prompt(&self, prompt: &str) -> Result<String, PromptError> {
match self {
Self::Anthropic(agent) => agent.prompt(prompt).await,
Self::OpenAI(agent) => agent.prompt(prompt).await,
}
}
}
}
While this is probably the most convenient method of model creation, you may find that matching on every enum variant every single time you use the client is quite messy and painful - especially if you aim to support every provider that Rig does.
To make our dynamic enum much easier to use, we’ll create a provider registry that stores a hashmap of strings tied to function pointers, each of which creates an instance of Agents. We can then write some functions, as below, that allow dynamic creation of agents based on the provided string:
#![allow(unused)]
fn main() {
struct AgentConfig<'a> {
name: &'a str,
preamble: &'a str,
}
struct ProviderRegistry(HashMap<&'static str, fn(&AgentConfig) -> Agents>);
impl ProviderRegistry {
pub fn new() -> Self {
Self(HashMap::from_iter([
("anthropic", anthropic_agent as fn(&AgentConfig) -> Agents),
("openai", openai_agent as fn(&AgentConfig) -> Agents),
]))
}
pub fn agent(&self, provider: &str, agent_config: &AgentConfig) -> Option<Agents> {
self.0.get(provider).map(|p| p(agent_config))
}
}
/// A function that creates an instance of `Agents` (using the Anthropic variant)
fn anthropic_agent(AgentConfig { name, preamble }: &AgentConfig) -> Agents {
let agent = anthropic::Client::from_env()
.agent(CLAUDE_3_7_SONNET)
.name(name)
.preamble(preamble)
.build();
Agents::Anthropic(agent)
}
/// A function that creates an instance of `Agents` (using the OpenAI variant)
fn openai_agent(AgentConfig { name, preamble }: &AgentConfig) -> Agents {
let agent = openai::Client::from_env()
.completions_api()
.agent(GPT_4O)
.name(name)
.preamble(preamble)
.build();
Agents::OpenAI(agent)
}
}
Once done, we can then use this in the example like below:
#[tokio::main]
async fn main() {
let registry = ProviderRegistry::new();
let prompt = "How much does 4oz of parmesan cheese weigh?";
println!("Prompt: {prompt}");
let helpful_cfg = AgentConfig {
name: "Assistant",
preamble: "You are a helpful assistant",
};
let openai_agent = registry.agent("openai", &helpful_cfg).unwrap();
let oai_response = openai_agent.prompt(prompt).await.unwrap();
println!("Helpful response (OpenAI): {oai_response}");
let unhelpful_cfg = AgentConfig {
name: "Assistant",
preamble: "You are an unhelpful assistant",
};
let anthropic_agent = registry.agent("anthropic", &unhelpful_cfg).unwrap();
let anthropic_response = anthropic_agent.prompt(prompt).await.unwrap();
println!("Unhelpful response (Anthropic): {anthropic_response}");
}
The tradeoffs of using dynamic model creation
Of course, when it comes to using dynamic model creation factory patterns like this, there are some non-trivial tradeoffs that need to be made:
- If not using typemap, this abstraction creates a pocket of type unsafety which generally makes it more difficult to maintain if you want to extend it
- Lack of concrete typing means you lose any and all model-specific methods
- You need to update the registry every now and then when new models come out
- Performance hit at runtime (although in this particular case, the performance hit should generally be quite minimal)
However, in some cases if you are running a service that (for example) provides multiple options to users for different providers, this may be a preferable alternative to enum dispatch.
Multi-agent systems
The full example for this section can be found on the GitHub repo.
The case for multi-agent systems
As your LLM workflows grow more sophisticated, you’ll eventually hit a wall with single-agent architectures. An agent that starts by handling customer support tickets might gradually take on order processing, then inventory checks, then fraud detection. Before long, you’re dealing with an agent that has access to 30+ tools, a bloated system prompt trying to juggle multiple responsibilities, and increasingly unreliable outputs.
Like when you hire a new employee and give them too many responsibilities, this can confuse an agent and cause the output quality to degrade - sometimes significantly. This can be a major failure mode in production: often, you may be making use of lots of different tools from MCP servers as well as internal tools, plus RAG context and memories - all of which can bloat your context window. This can cause your agent to do things such as attempt to call tools incorrectly or hallucinate.
The solution to this is to create multiple agents that specialise in one subject each, then either use a “manager” agent that manages their execution or have them co-ordinate between each other. A customer support agent might know how to talk to customers but needs help from a fraud detection agent. A research co-ordinator might delegate specific enquiries to domain experts.
Do I need multi-agent systems?
If your workflow operates in one domain with a focused set of tools (under 10-15), better prompting and context engineering will almost always beat the complexity of needing to manage multiple agents.
Try these first:
- Structured outputs for reducing ambiguity (see the extractor sketch after this list)
- Better retrieval (improved chunking, more relevant/complete datasets)
- Tighter constraints, clearer role definitions
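On the structured-output point: Rig ships an extractor that asks the model to return data matching a Rust type instead of free-form prose. The sketch below assumes the extractor builder takes a model name and a type deriving the serde and schemars traits; double-check the extractor module docs for the exact bounds in your version:
#![allow(unused)]
fn main() {
use rig::client::{CompletionClient, ProviderClient};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct TicketTriage {
    category: String,
    priority: String,
}

let openai_client = rig::providers::openai::Client::from_env();

// The model is constrained to return JSON matching TicketTriage
let extractor = openai_client.extractor::<TicketTriage>("gpt-5").build();
let triage = extractor
    .extract("Customer reports they cannot log in after the latest update.")
    .await?;
println!("{triage:?}");
}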
The following requirements benefit the most from multi-agent systems:
- Agents that need 20+ tools and start calling wrong/irrelevant tools
- Tasks that require cross-domain co-ordination (eg, documentation writing + product building)
- Context window exhaustion - you’ve optimised retrieval but still can’t fit in all the necessary context
- Clear role delegation boundaries
Without clear boundaries/requirements however, it’s better to use one agent with improved context engineering. If you can’t clearly articulate why you need a multi-agent system, it is often better to keep it simple.
If you need to measure the metrics for your system to help improve it, you may want to look at the observability section of this playbook.
Manager-worker pattern
Rig supports manager-worker patterns for multi-agent systems out of the box by allowing agents to be added to other agents as tools, making it easy to form a manager-worker architecture pattern. Below is a diagram that illustrates this concept:
graph TD
A[User Request] --> B[Manager Agent]
B --> C{Task Planning}
C --> D[Decompose into Subtasks]
D --> E[Worker 1<br/>Subtask A]
D --> F[Worker 2<br/>Subtask B]
D --> G[Worker 3<br/>Subtask C]
E --> H[Result A]
F --> I[Result B]
G --> J[Result C]
H --> K[Manager Agent<br/>Result Aggregation]
I --> K
J --> K
K --> L[Synthesize & Validate]
L --> M[Final Response]
style B fill:#ff9999,color:#000
style E fill:#99ccff,color:#000
style F fill:#99ccff,color:#000
style G fill:#99ccff,color:#000
style K fill:#ff9999,color:#000
In Rig, we can easily illustrate this by writing two agents called Alice and Bob. Alice is a manager at FooBar Inc., a fictitious company made up for the purposes of this example. Bob is employed by Alice to do some work. The hierarchy is pretty simple:
#![allow(unused)]
fn main() {
use rig::{
client::{CompletionClient, ProviderClient},
completion::Prompt,
};
let prompt = "Ask Bob to write an email for you and let me know what he has written.";
let bob = openai_client.agent("gpt-5")
.name("Bob")
.description("An employee who works in admin at FooBar Inc.")
.preamble("You are Bob, an employee working in admin at FooBar Inc. Alice, your manager, may ask you to do things. You need to do them.")
.build();
let alice = openai_client.agent("gpt-5")
.name("Alice")
.description("A manager at FooBar Inc.")
.preamble("You are a manager in the admin department at FooBar Inc. You manage Bob.")
.tool(bob)
.build();
let res = alice.prompt(prompt).await?;
println!("{res:?}");
}
Under the hood, OpenAI will initially return a tool call to prompt Bob (with a prompt provided by the LLM). Rig executes this by prompting Bob with the given prompt. Bob will then return a response back to Alice, then Alice will return the response back to us using the information provided by Bob.
If you do it this way, make sure you give your agents a name and description as both of these are used in the tool implementation. This can be done by using AgentBuilder::name() and AgentBuilder::description() respectively in the agent builder.
Distributed agent architecture (swarm behaviour)
Although swarm-style agent architecture is not supported out of the box with Rig, we can easily re-create it by using the actor pattern.
Generally there are a lot of ways you can do this, but typically a participant (ie an agent) in any given swarm may have the following:
- Some kind of external trigger
- A way to process messages from other swarm participants
Let’s get into some code snippets for creating agents that use an actor pattern to simulate swarm behaviour. First, let’s define some types for the possible messages and the agent itself, plus some methods for talking to other connected peers:
#![allow(unused)]
fn main() {
use rig::providers::openai;
use rig::completion::Prompt;
use tokio::sync::mpsc;
use std::sync::Arc;
use tokio::sync::RwLock;
/// Message types for inter-agent communication
#[derive(Debug, Clone)]
enum AgentMessage {
Task(String),
Response(String, String), // (from_agent_id, content)
Trigger(String),
Shutdown,
}
/// Agent state
struct AgentState {
id: String,
task_queue: Vec<String>,
conversation_history: Vec<String>,
}
/// Actor-based autonomous agent
struct AutonomousAgent {
id: String,
client: openai::Client,
state: Arc<RwLock<AgentState>>,
inbox: mpsc::Receiver<AgentMessage>,
outbox: mpsc::Sender<AgentMessage>,
peer_channels: Arc<RwLock<Vec<mpsc::Sender<AgentMessage>>>>,
}
impl AutonomousAgent {
fn new(
id: String,
api_key: String,
inbox: mpsc::Receiver<AgentMessage>,
outbox: mpsc::Sender<AgentMessage>,
) -> Self {
let client = openai::Client::new(&api_key);
let state = Arc::new(RwLock::new(AgentState {
id: id.clone(),
task_queue: Vec::new(),
conversation_history: Vec::new(),
}));
Self {
id,
client,
state,
inbox,
outbox,
peer_channels: Arc::new(RwLock::new(Vec::new())),
}
}
/// Register peer agents for communication
async fn register_peer(&self, peer_channel: mpsc::Sender<AgentMessage>) {
let mut peers = self.peer_channels.write().await;
peers.push(peer_channel);
}
/// Send message to all peer agents
async fn broadcast_to_peers(&self, message: AgentMessage) {
let peers = self.peer_channels.read().await;
for peer in peers.iter() {
let _ = peer.send(message.clone()).await;
}
}
/// Process autonomous task using LLM
/// This currently shows a simple LLM prompt, but if you wanted you could give your agent some tools!
async fn process_autonomous_task(&self, task: &str) -> Result<String, rig::completion::PromptError> {
let agent = self.client
.agent("gpt-5")
.preamble(&format!(
"Your name is {}. Process tasks autonomously and coordinate with other agents.",
self.id
))
.build();
let response = agent.prompt(task).await?;
Ok(response)
}
}
}
Next is the final piece we need to make our agent work: handling messages and the actual run loop.
Since we want to run tasks from both an autonomous trigger and received messages, we use the tokio::select! macro. This macro awaits several futures at once and proceeds with whichever branch completes first. In this case, we’ll use a tokio::time::interval timer to create a control flow that can be driven either by the timer ticking or by the agent receiving a message (whichever comes first!).
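Before wiring this into the agent, here is a minimal, standalone sketch of that control flow. The channel, timer and messages here are purely illustrative and are not part of the agent code in this chapter:

use tokio::sync::mpsc;
use tokio::time::{interval, sleep, Duration};

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<String>(8);

    // Simulate a peer sending us a message a little later.
    tokio::spawn(async move {
        sleep(Duration::from_millis(250)).await;
        let _ = tx.send("hello from a peer".to_string()).await;
    });

    // The "external trigger": a timer that ticks on a fixed schedule.
    let mut tick = interval(Duration::from_millis(100));

    loop {
        tokio::select! {
            // Branch 1: a message arrived from another participant.
            Some(msg) = rx.recv() => {
                println!("received: {msg}");
                break; // stop once we've handled a message
            }
            // Branch 2: the timer ticked first, so do some autonomous work.
            _ = tick.tick() => {
                println!("tick - no messages yet, doing autonomous work");
            }
        }
    }
}

With that shape in mind, here is the message handling and run loop for our agent: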
#![allow(unused)]
fn main() {
use tokio::time::{interval, Duration};
impl AutonomousAgent {
async fn handle_message(&self, message: AgentMessage) {
match message {
AgentMessage::Task(task) => {
println!("[{}] Received task: {}", self.id, task);
match self.process_autonomous_task(&task).await {
Ok(result) => {
println!("[{}] Completed task: {}", self.id, result);
// Store in history
let mut state = self.state.write().await;
state.conversation_history.push(format!("Task: {} | Result: {}", task, result));
// Broadcast result to peers
self.broadcast_to_peers(
AgentMessage::Response(self.id.clone(), result)
).await;
}
Err(e) => eprintln!("[{}] Error processing task: {}", self.id, e),
}
}
AgentMessage::Response(from_id, content) => {
println!("[{}] Received response from {}: {}", self.id, from_id, content);
let mut state = self.state.write().await;
state.conversation_history.push(format!("From {}: {}", from_id, content));
}
AgentMessage::Trigger(trigger_msg) => {
println!("[{}] External trigger: {}", self.id, trigger_msg);
// Process trigger autonomously
let _ = self.process_autonomous_task(&trigger_msg).await;
}
message => {
println!("Unsupported message variant received: {message:?}");
// This could theoretically return an error or panic instead.
// In practice we never see the Shutdown variant here, because the run
// loop below handles Shutdown before ever calling handle_message().
}
}
}
// Main actor loop
async fn run(mut self) {
println!("Agent '{}' started and running autonomously", self.id);
// External trigger: periodic self-check (runs every 10 seconds)
let mut tick_interval = interval(Duration::from_secs(10));
loop {
tokio::select! {
// Handle incoming messages from other agents
Some(msg) = self.inbox.recv() => {
match msg {
AgentMessage::Shutdown => {
println!("Shutting down...");
break
}
_ => {
self.handle_message(msg).await;
}
}
}
// Autonomous periodic task (external trigger)
_ = tick_interval.tick() => {
println!("[{}] Autonomous tick - checking for self-initiated tasks", self.id);
// Check if agent should create its own task
// Use scoped brackets here to avoid needing to manually drop lock
let needs_to_create_own_task = {
let state_rlock = self.state.read().await;
state_rlock.task_queue.is_empty() && !state_rlock.conversation_history.is_empty()
};
if needs_to_create_own_task {
let summary_task = "Summarize what you've accomplished so far in one sentence.";
match self.process_autonomous_task(summary_task).await {
Ok(summary) => {
println!("[{}] Self-initiated summary: {}", self.id, summary);
}
Err(e) => eprintln!("[{}] Error in autonomous task: {}", self.id, e),
}
}
}
}
}
}
}
}
Finally, below is a main function that wires three of these agents together and lets the swarm run for a short demonstration:
use tokio::sync::mpsc;
use tokio::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let api_key = std::env::var("OPENAI_API_KEY")
.expect("OPENAI_API_KEY must be set");
// Create channels for agent communication
let (tx1, rx1) = mpsc::channel(100);
let (tx2, rx2) = mpsc::channel(100);
let (tx3, rx3) = mpsc::channel(100);
// Create agents
let agent1 = AutonomousAgent::new("Tom".to_string(), api_key.clone(), rx1, tx1.clone());
let agent2 = AutonomousAgent::new("Richard".to_string(), api_key.clone(), rx2, tx2.clone());
let agent3 = AutonomousAgent::new("Harry".to_string(), api_key, rx3, tx3.clone());
// Register peers (each agent knows about the others)
agent1.register_peer(tx2.clone()).await;
agent1.register_peer(tx3.clone()).await;
agent2.register_peer(tx1.clone()).await;
agent2.register_peer(tx3.clone()).await;
agent3.register_peer(tx1.clone()).await;
agent3.register_peer(tx2.clone()).await;
// Spawn agent actors
let handle1 = tokio::spawn(agent1.run());
let handle2 = tokio::spawn(agent2.run());
let handle3 = tokio::spawn(agent3.run());
// Send the initial task to Tom (agent1)
tx1.send(AgentMessage::Task("Analyze the benefits of autonomous agent systems".to_string())).await?;
// External trigger example
tokio::time::sleep(Duration::from_secs(5)).await;
tx2.send(AgentMessage::Trigger("Check system status and report findings".to_string())).await?;
// Let agents run for demonstration
tokio::time::sleep(Duration::from_secs(30)).await;
// Shutdown
tx1.send(AgentMessage::Shutdown).await?;
tx2.send(AgentMessage::Shutdown).await?;
tx3.send(AgentMessage::Shutdown).await?;
handle1.await?;
handle2.await?;
handle3.await?;
Ok(())
}
If you are interested in taking AI agent swarms to production using actor patterns, you may want to look into ractor. Ractor is currently the most popular Rust actor framework; it is actively used at companies like Kraken (the crypto exchange) and WhatsApp, and additionally has cluster support.
Observability
Effective observability is crucial for understanding and debugging LLM applications in production. Rig provides built-in instrumentation using the tracing ecosystem, allowing you to track requests, measure performance, and diagnose issues across your AI workflows.
The complete codebase for this section is split into two separate binary examples, with a separate folder for the accompanying OpenTelemetry files.
Observability in LLM-assisted systems
Observability is a huge component of making an AI/LLM-assisted system reliable despite its non-determinism: metrics like model drift (and accuracy!), guardrail effectiveness, token usage and error rates all need to be easy to track. Even more so than in traditional systems, it’s important for LLM-assisted systems to be instrumented and observable, precisely because of that non-deterministic component.
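As a concrete illustration, one way to make a metric like token usage trackable is to emit it as structured tracing fields, which whatever subscriber or backend you choose can then aggregate. The Usage struct and field names below are purely hypothetical and not part of Rig’s API:

use tracing::info;

/// Hypothetical usage numbers returned alongside a completion response.
struct Usage {
    prompt_tokens: u32,
    completion_tokens: u32,
}

/// Record token usage as structured fields so a log/trace backend can aggregate it.
fn record_usage(model: &str, usage: &Usage) {
    info!(
        model,
        prompt_tokens = usage.prompt_tokens,
        completion_tokens = usage.completion_tokens,
        "completion finished"
    );
}

fn main() {
    tracing_subscriber::fmt().init();
    record_usage(
        "gpt-5",
        &Usage { prompt_tokens: 42, completion_tokens: 128 },
    );
}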
Overview
Rig’s observability approach is relatively minimal and unopinionated. Internally we use the tracing crate to emit logs and spans, which you can consume however you want: with any kind of tracing subscriber (via tracing-subscriber), a log facade (like env_logger), and so on.
Instrumentation Levels
Rig uses the following logging conventions:
- INFO level: spans marking the start and end of operations
- TRACE level: detailed request/response message logs for debugging
Basic Setup with tracing-subscriber
The simplest way to get started is with tracing-subscriber’s formatting layer:
#![allow(unused)]
fn main() {
tracing_subscriber::fmt().init();
}
However, this requires you to set the RUST_LOG environment variable every time you want to see the logs. For example:
RUST_LOG=trace cargo run
That’s fine if you only want logs once, but not so useful if you are running quick iterative loops. Let’s combine it with an EnvFilter, which lets us set a default logging level automatically (in this case we’ll use info level for general logging and trace level specifically for Rig):
#![allow(unused)]
fn main() {
tracing_subscriber::registry()
.with(
tracing_subscriber::EnvFilter::try_from_default_env()
.unwrap_or_else(|_| "info,rig=trace".into())
)
.with(tracing_subscriber::fmt::layer())
.init();
}
The next step is actually adding the logging and instrumentation. By using the #[tracing::instrument] macro, we can automatically enter a span (a monitored unit of work) and output some events, which will be added as logs under the span.
use rig::{
client::{CompletionClient, ProviderClient},
completion::Prompt,
providers::openai,
};
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
use tracing::{info, instrument};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
tracing_subscriber::registry()
.with(
tracing_subscriber::EnvFilter::try_from_default_env()
.unwrap_or_else(|_| "info,rig=trace".into())
)
.with(tracing_subscriber::fmt::layer())
.init();
let response = process_query("Hello world!").await.unwrap();
println!("Response: {response}");
Ok(())
}
#[instrument(name = "process_user_query")]
async fn process_query(user_input: &str) -> Result<String, Box<dyn std::error::Error>> {
info!("Processing user query");
let openai_client = openai::Client::from_env();
let agent = openai_client
.agent("gpt-4")
.preamble("You are a helpful assistant.")
.build();
// This completion call will emit spans automatically
let response = agent.prompt(user_input).await?;
info!("Query processed successfully");
Ok(response)
}
When running this example, you’ll see:
- INFO spans for the completion operation lifecycle
- TRACE logs showing the actual request/response payloads
- Custom spans from your application code (like process_user_query)
Customizing Output Format
For JSON-formatted logs (useful in production), tracing-subscriber provides a JSON formatting layer that will automatically output logs in JSON format. See below for a practical usage example:
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, fmt};
fn main() {
tracing_subscriber::registry()
.with(
tracing_subscriber::EnvFilter::try_from_default_env()
.unwrap_or_else(|_| "info,rig=trace".into())
)
.with(fmt::layer().json())
.init();
// Your application code
}
This is primarily useful when you have a local log drain or somewhere that logs (or traces!) are stored for debugging, as it allows you to query them easily using jq:
tail -f <logfile> | jq .
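For example, with tracing-subscriber’s default JSON layout (which includes level and target keys), you could filter down to just Rig’s trace-level logs with something like:
tail -f <logfile> | jq 'select(.level == "TRACE" and (.target | startswith("rig")))'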
OpenTelemetry Integration
For production deployments, you’ll typically want to export traces to an external observability platform. Rig integrates seamlessly with OpenTelemetry, allowing you to use your own Otel collector and route traces to any compatible backend.
This example will utilise the OTel collector as it can easily be used to send your traces/spans and logs anywhere you’d like.
Dependencies
Add the following dependencies to your Cargo.toml - besides rig-core itself, every dependency listed below is required to make this work:
[dependencies]
rig-core = "0.27.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
tracing-opentelemetry = "0.30"
opentelemetry = { version = "0.31", features = ["trace"] }
opentelemetry_sdk = { version = "0.31", features = ["rt-tokio"] }
opentelemetry-otlp = { version = "0.31", features = ["tonic", "trace"] }
Exporting to an OTEL Collector
Configure Rig to export traces to your OpenTelemetry collector:
use opentelemetry::trace::TracerProvider;
use opentelemetry_sdk::{trace::SdkTracerProvider, Resource};
use opentelemetry_otlp::WithExportConfig;
use tracing::Level;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let exporter = opentelemetry_otlp::SpanExporter::builder()
.with_http()
.with_protocol(opentelemetry_otlp::Protocol::HttpBinary)
.build()?;
// Create a tracer provider that batches spans and exports them over OTLP
let provider = SdkTracerProvider::builder()
.with_batch_exporter(exporter)
.with_resource(Resource::builder().with_service_name("rig-service").build())
.build();
let tracer = provider.tracer("example");
// Create a tracing layer with the configured tracer
let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);
let filter_layer = tracing_subscriber::filter::EnvFilter::builder()
.with_default_directive(Level::INFO.into())
.from_env_lossy();
// add a `fmt` layer that prettifies the logs/spans that get outputted to `stdout`
let fmt_layer = tracing_subscriber::fmt::layer().pretty();
// Create a multi-layer subscriber that filters for given traces,
// prettifies the logs/spans and then sends them to the OTel collector.
tracing_subscriber::registry()
.with(filter_layer)
.with(fmt_layer)
.with(otel_layer)
.init();
let response = process_query("Hello world!").await.unwrap();
println!("Response: {response}");
// Shut down the tracer provider on exit so any buffered spans are flushed
provider.shutdown()?;
Ok(())
}
This outputs all traces to your program’s stdout (via the fmt layer), as well as sending them to your OTel collector, which will then transform the spans as required and forward them to your chosen backend - Langfuse, in the example below.
OTEL Collector Configuration
Your OTel collector can then route traces to various backends. Below is an example YAML file you might use - for the purposes of this example we’ll be using Langfuse, as it is a relatively well-known provider of LLM-related observability tooling that can additionally be self-hosted:
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
processors:
transform:
trace_statements:
- context: span
statements:
# Rename the span if it's "invoke_agent" and has an agent name attribute.
# This can theoretically be left out, but renaming makes traces easier to read in your backend.
- set(name, attributes["gen_ai.agent.name"]) where name == "invoke_agent" and attributes["gen_ai.agent.name"] != nil
exporters:
debug:
verbosity: detailed
otlphttp/langfuse:
endpoint: "https://cloud.langfuse.com/api/public/otel"
headers:
# Langfuse uses basic auth, in the form of username:password.
# In this case your username is your Langfuse public key and the password is your Langfuse secret key.
Authorization: "Basic ${AUTH_STRING}"
service:
pipelines:
traces:
receivers: [otlp]
processors: [transform]
exporters: [otlphttp/langfuse, debug]
To actually use this file, you would want to write your own Dockerfile that pulls the OTel collector image:
# Start from the official OpenTelemetry Collector Contrib image
FROM otel/opentelemetry-collector-contrib:0.135.0
# Copy your local config into the container
# Replace `config.yaml` with your actual filename if different
COPY ./config.yaml /etc/otelcol-contrib/config.yaml
You can then build this image with docker build -t <some-tag-name> -f <dockerfile-filename> ., where <some-tag-name> is whatever name you want to give the image.
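Once built, you can run the collector locally with something along these lines (port 4318 is the OTLP/HTTP port from the config above, and AUTH_STRING is assumed to be your base64-encoded <public key>:<secret key> pair for Langfuse, which the ${AUTH_STRING} reference in the config reads from the environment):
docker run --rm -p 4318:4318 -e AUTH_STRING="<base64 of publicKey:secretKey>" <some-tag-name>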
Integration with Specific Providers
While Rig doesn’t include pre-built subscriber layers for specific vendors, the OTel collector approach allows you to send traces to any observability platform. This is arguably a much more flexible way to set up telemetry, although it does require some initial setup.
If you’re interested in tracing subscriber layer integrations for specific vendors, please open a feature request issue on our GitHub repo!