ollama-rs
An async Rust client library for the Ollama API. Provides a streaming-first interface for text generation, multi-turn chat, model management, and advanced features like structured output and tool calling.
Features
- Fully async with tokio and streaming responses via futures::Stream
- Text generation and multi-turn chat conversations
- Structured JSON output with schema validation
- Tool calling / function calling support
- Model management (list, pull, inspect running models)
- Builder pattern for constructing requests
- Configurable generation parameters (temperature, top-k, top-p, and more)
- Thinking / reasoning mode support
Installation
Add ollama-rs to your Cargo.toml:
    [dependencies]
    ollama-rs = { git = "https://github.com/andreban/ollama-rs.git" }
    tokio = { version = "1", features = ["full"] }
    futures-util = "0.3"
Prerequisites
A running Ollama server. By default, Ollama listens on http://localhost:11434.
Quick Start
Text Generation
    use std::io::Write;

    use futures_util::StreamExt;
    use ollama_rs::{OllamaClient, types::generate::GenerateRequest};

    #[tokio::main]
    async fn main() {
        let client = OllamaClient::new("http://localhost:11434");

        let request = GenerateRequest::builder("llama3:8b")
            .prompt("Why is the sky blue?")
            .build();

        // `generate` returns a stream of partial responses.
        let mut stream = client.generate(request);
        while let Some(response) = stream.next().await {
            match response {
                Ok(token) => {
                    // Print each fragment as it arrives and flush so it shows up immediately.
                    print!("{}", token.response);
                    std::io::stdout().flush().unwrap();
                    if token.done {
                        break;
                    }
                }
                Err(e) => eprintln!("Error: {}", e),
            }
        }
    }
Chat
    use std::io::Write;

    use futures_util::StreamExt;
    use ollama_rs::{OllamaClient, types::chat::{ChatRequest, Message}};

    #[tokio::main]
    async fn main() {
        let client = OllamaClient::new("http://localhost:11434");

        // The conversation so far: a system prompt followed by the user's question.
        let messages = vec![
            Message::system("You are a helpful assistant."),
            Message::user("What is the capital of France?"),
        ];

        let request = ChatRequest::builder("llama3:8b")
            .messages(messages)
            .build();

        let mut stream = client.chat(request);
        while let Some(response) = stream.next().await {
            // For brevity this example panics on a stream error.
            let response = response.unwrap();
            print!("{}", response.message.content);
            std::io::stdout().flush().unwrap();
            if response.done {
                break;
            }
        }
    }
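A multi-turn conversation usually needs the complete reply, not only the printed fragments, so that it can be appended to the message history. The sketch below shows one way to accumulate it. To stay self-contained it uses a hypothetical `Chunk` struct and a plain iterator as stand-ins for the library's streamed chat responses; `collect_reply` is not part of ollama-rs.

```rust
/// Hypothetical stand-in for one streamed chat chunk (text fragment plus completion flag).
#[derive(Debug, PartialEq)]
struct Chunk {
    content: String,
    done: bool,
}

/// Concatenates chunk contents until a chunk reports `done`,
/// returning early with the first error encountered.
fn collect_reply<I>(chunks: I) -> Result<String, String>
where
    I: IntoIterator<Item = Result<Chunk, String>>,
{
    let mut reply = String::new();
    for chunk in chunks {
        let chunk = chunk?;
        reply.push_str(&chunk.content);
        if chunk.done {
            break;
        }
    }
    Ok(reply)
}
```

With the real stream, the same loop body would sit inside `while let Some(response) = stream.next().await`, pushing `response.message.content` into the buffer.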
Structured Output
Force the model to respond with JSON matching a specific schema:
    use ollama_rs::{OllamaClient, types::generate::GenerateRequest};
    use serde_json::json;

    let schema = json!({
        "type": "object",
        "properties": {
            "answer": { "type": "string" },
            "confidence": { "type": "number" }
        }
    });

    let request = GenerateRequest::builder("llama3:8b")
        .prompt("What is 2 + 2?")
        .stream(false)
        .format(schema)
        .build();
Tool Calling
Define tools the model can invoke during a chat conversation:
    use ollama_rs::types::chat::{ChatRequest, Function, Message, Tool, ToolType};
    use serde_json::json;

    let tools = vec![Tool {
        tool_type: ToolType::Function,
        function: Function {
            name: "get_weather".to_string(),
            description: "Get the current weather for a city.".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "city": { "type": "string", "description": "The name of the city" }
                },
                "required": ["city"]
            }),
        },
    }];

    let request = ChatRequest::builder("llama3:8b")
        .messages(vec![Message::user("What is the weather in Paris?")])
        .stream(false)
        .tools(tools)
        .build();
When the model decides to call a tool, the message.tool_calls field of the response contains the tool name and arguments. You can then execute the matching function yourself and send its result back to the model via Message::tool_response(...).
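The "execute the function" step is left to the caller. A minimal sketch of one way to route a tool call by name is shown below. It is illustrative only: `dispatch_tool` and the stubbed `get_weather` are hypothetical helpers, and a `HashMap<String, String>` stands in for the JSON arguments object so that the example needs nothing beyond the standard library.

```rust
use std::collections::HashMap;

/// Stub for the `get_weather` tool; a real implementation would query a weather service.
fn get_weather(city: &str) -> String {
    format!("No live data available for {city} (stub result)")
}

/// Routes a tool call to a local function by name.
/// `args` is an assumed, simplified stand-in for the JSON arguments sent by the model.
fn dispatch_tool(name: &str, args: &HashMap<String, String>) -> Result<String, String> {
    match name {
        "get_weather" => {
            // The tool schema marks `city` as required, so reject calls without it.
            let city = args
                .get("city")
                .ok_or_else(|| "missing required argument: city".to_string())?;
            Ok(get_weather(city))
        }
        other => Err(format!("unknown tool: {other}")),
    }
}
```

Returning a `Result` lets the caller decide whether to report a failed or unknown tool call back to the model or to abort the conversation.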
API Reference
OllamaClient
| Method | Description |
|---|---|
| new(server_address) | Create a new client pointing at an Ollama server |
| version() | Get the Ollama server version |
| tags() | List all available models |
| ps() | List currently running/loaded models |
| generate(request) | Generate text (streaming) |
| chat(request) | Chat conversation (streaming) |
| pull(request) | Pull/download a model (streaming) |
Request Builders
GenerateRequest::builder(model) -- .prompt(), .system_prompt(), .format(), .options(), .stream(), .think(), .images(), .suffix()
ChatRequest::builder(model) -- .messages(), .tools(), .format(), .options(), .stream()
PullRequest::builder(model) -- .stream()
Generation Options
Configure sampling parameters via Options::builder():
| Option | Description |
|---|---|
| temperature(f32) | Controls randomness (0.0 - 2.0) |
| top_k(u32) | Top-K sampling |
| top_p(f32) | Nucleus sampling threshold |
| min_p(f32) | Minimum probability filter |
| seed(u64) | Random seed for reproducibility |
| num_ctx(u32) | Context window size |
| num_predict(u32) | Maximum tokens to generate |
| stop(Stop) | Stop sequences |
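To give a feel for what temperature, top_k, and top_p do, here is a simplified sketch of temperature scaling followed by top-k and nucleus (top-p) filtering. It is a textbook illustration, not the code Ollama runs: the function names are made up for this example, the filter order is one common choice, and `top_k` is a `usize` here for indexing convenience.

```rust
/// Turns raw logits into probabilities after dividing by `temperature`.
/// Lower temperatures sharpen the distribution, higher ones flatten it.
/// Assumes `temperature > 0.0`; a temperature of zero is usually handled as greedy decoding.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    // Subtract the maximum logit first for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Returns the indices of tokens that survive top-k and then top-p filtering,
/// ordered from most to least probable.
fn filter_candidates(probs: &[f32], top_k: usize, top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    // Top-k: keep only the k most probable tokens (at least one).
    order.truncate(top_k.max(1));

    // Top-p: keep the smallest prefix whose cumulative probability reaches `top_p`.
    let mut kept = Vec::new();
    let mut cumulative = 0.0;
    for idx in order {
        kept.push(idx);
        cumulative += probs[idx];
        if cumulative >= top_p {
            break;
        }
    }
    kept
}
```

The next token would then be drawn at random from the surviving candidates, which is where a fixed seed makes the output reproducible.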
Examples
The examples/ directory contains runnable programs:
| Example | Description |
|---|---|
| generate | Basic text generation |
| chat | Interactive multi-turn chat |
| structured_output | JSON structured output with schema |
| tool_call | Function calling / tool use |
| pull | Download a model |
| tags | List available models |
| ps | List running models |
| version | Query server version |
Run an example:
    OLLAMA_SERVER=http://localhost:11434 cargo run --example chat
Configuration
| Environment Variable | Description |
|---|---|
| OLLAMA_SERVER | Ollama server address (e.g., http://localhost:11434) |
| RUST_LOG | Log level filter (e.g., ollama_rs=debug) |
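If you want your own program to honor OLLAMA_SERVER, a small helper like the following could resolve the address. This is a sketch under assumptions: `resolve_server` and `server_from_env` are hypothetical helpers, not part of ollama-rs, and falling back to the default address when the variable is unset or blank is a choice made for this example.

```rust
use std::env;

/// Default Ollama address, as listed under Prerequisites.
const DEFAULT_SERVER: &str = "http://localhost:11434";

/// Picks the server address: a non-blank override wins, otherwise the default is used.
fn resolve_server(override_addr: Option<&str>) -> String {
    match override_addr.map(str::trim) {
        Some(addr) if !addr.is_empty() => addr.to_string(),
        _ => DEFAULT_SERVER.to_string(),
    }
}

/// Reads `OLLAMA_SERVER` from the environment and resolves it to an address.
fn server_from_env() -> String {
    resolve_server(env::var("OLLAMA_SERVER").ok().as_deref())
}
```

The resulting string could then be passed to OllamaClient::new in place of the hard-coded address used in the Quick Start examples.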