Add embed endpoint (POST /api/embed)
ollama-rs
An async Rust client library for the Ollama API. Provides a streaming-first interface for text generation, multi-turn chat, model management, and advanced features like structured output and tool calling.
Features
- Fully async with tokio and streaming responses via
futures::Stream - Text generation and multi-turn chat conversations
- Structured JSON output with schema validation
- Tool calling / function calling support
- Model management (list, pull, delete, inspect running models)
- Text embeddings generation
- Builder pattern for constructing requests
- Configurable generation parameters (temperature, top-k, top-p, and more)
- Thinking / reasoning mode support
Installation
Add ollama-rs to your Cargo.toml:
[dependencies]
ollama-rs = { git = "https://github.com/andreban/ollama-rs.git" }
tokio = { version = "1", features = ["full"] }
futures-util = "0.3"
Prerequisites
A running Ollama server. By default, Ollama listens on http://localhost:11434.
Quick Start
Text Generation
use std::io::Write;
use futures_util::StreamExt;
use ollama_rs::{OllamaClient, types::generate::GenerateRequest};
#[tokio::main]
async fn main() {
let client = OllamaClient::new("http://localhost:11434");
let request = GenerateRequest::builder("llama3:8b")
.prompt("Why is the sky blue?")
.build();
let mut stream = client.generate(request);
while let Some(response) = stream.next().await {
match response {
Ok(token) => {
print!("{}", token.response);
std::io::stdout().flush().unwrap();
if token.done {
break;
}
}
Err(e) => eprintln!("Error: {}", e),
}
}
}
Chat
use std::io::Write;
use futures_util::StreamExt;
use ollama_rs::{OllamaClient, types::chat::{ChatRequest, Message}};
#[tokio::main]
async fn main() {
let client = OllamaClient::new("http://localhost:11434");
let messages = vec![
Message::system("You are a helpful assistant."),
Message::user("What is the capital of France?"),
];
let request = ChatRequest::builder("llama3:8b")
.messages(messages)
.build();
let mut stream = client.chat(request);
while let Some(response) = stream.next().await {
let response = response.unwrap();
print!("{}", response.message.content);
std::io::stdout().flush().unwrap();
if response.done {
break;
}
}
}
Structured Output
Force the model to respond with JSON matching a specific schema:
use ollama_rs::{OllamaClient, types::generate::GenerateRequest};
use serde_json::json;
let schema = json!({
"type": "object",
"properties": {
"answer": { "type": "string" },
"confidence": { "type": "number" }
}
});
let request = GenerateRequest::builder("llama3:8b")
.prompt("What is 2 + 2?")
.stream(false)
.format(schema)
.build();
Tool Calling
Define tools the model can invoke during a chat conversation:
use ollama_rs::types::chat::{ChatRequest, Function, Message, Tool, ToolType};
use serde_json::json;
let tools = vec![Tool {
tool_type: ToolType::Function,
function: Function {
name: "get_weather".to_string(),
description: "Get the current weather for a city.".to_string(),
parameters: json!({
"type": "object",
"properties": {
"city": { "type": "string", "description": "The name of the city" }
},
"required": ["city"]
}),
},
}];
let request = ChatRequest::builder("llama3:8b")
.messages(vec![Message::user("What is the weather in Paris?")])
.stream(false)
.tools(tools)
.build();
When the model decides to call a tool, the response message.tool_calls field will contain the tool name and arguments. You can then execute the function and send the result back via Message::tool_response(...) which returns an OllamaResult<Message>.
API Reference
OllamaClient
| Method | Description |
|---|---|
new(server_address) |
Create a new client with a 30-second connection timeout |
default() |
Create a client connecting to http://localhost:11434 |
builder(server_address) |
Create a client with custom configuration (see below) |
version() |
Get the Ollama server version |
tags() |
List all available models |
ps() |
List currently running/loaded models |
generate(request) |
Generate text (streaming) |
chat(request) |
Chat conversation (streaming) |
pull(request) |
Pull/download a model (streaming) |
delete(request) |
Delete a model from the server |
embed(request) |
Generate vector embeddings |
OllamaClient::builder(server_address) -- .connection_timeout(Duration), .build()
use std::time::Duration;
use ollama_rs::OllamaClient;
let client = OllamaClient::builder("http://localhost:11434")
.connection_timeout(Duration::from_secs(60))
.build();
Request Builders
GenerateRequest::builder(model) -- .prompt(), .system_prompt(), .format(), .options(), .stream(), .think(), .images(), .suffix()
ChatRequest::builder(model) -- .messages(), .tools(), .format(), .options(), .stream(), .think()
PullRequest::builder(model) -- .stream()
EmbedRequest::builder(model) -- .input(), .inputs(), .truncate(), .dimensions(), .keep_alive(), .options()
Generation Options
Configure sampling parameters via Options::builder():
| Option | Description |
|---|---|
temperature(f32) |
Controls randomness (0.0 - 2.0) |
top_k(u32) |
Top-K sampling |
top_p(f32) |
Nucleus sampling threshold |
min_p(f32) |
Minimum probability filter |
seed(u64) |
Random seed for reproducibility |
num_ctx(u32) |
Context window size |
num_predict(u32) |
Maximum tokens to generate |
stop(Stop) |
Stop sequences |
Examples
The examples/ directory contains runnable programs:
| Example | Description |
|---|---|
generate |
Basic text generation |
chat |
Interactive multi-turn chat |
structured_output |
JSON structured output with schema |
tool_call |
Function calling / tool use |
pull |
Download a model |
delete |
Delete a model |
embed |
Generate text embeddings |
tags |
List available models |
ps |
List running models |
version |
Query server version |
Run an example:
OLLAMA_SERVER=http://localhost:11434 cargo run --example chat
Configuration
| Environment Variable | Description |
|---|---|
OLLAMA_SERVER |
Ollama server address (e.g., http://localhost:11434) |
RUST_LOG |
Log level filter (e.g., ollama_rs=debug) |