# ollama-rs

An async Rust client library for the [Ollama](https://ollama.com/) API. Provides a streaming-first interface for text generation, multi-turn chat, model management, and advanced features like structured output and tool calling.

## Features

- Fully async with [tokio](https://tokio.rs/) and streaming responses via `futures::Stream`
- Text generation and multi-turn chat conversations
- Structured JSON output with schema validation
- Tool calling / function calling support
- Model management (list, pull, inspect running models)
- Builder pattern for constructing requests
- Configurable generation parameters (temperature, top-k, top-p, and more)
- Thinking / reasoning mode support

## Installation

Add `ollama-rs` to your `Cargo.toml`:

```toml
[dependencies]
ollama-rs = { git = "https://github.com/andreban/ollama-rs.git" }
tokio = { version = "1", features = ["full"] }
futures-util = "0.3"
```

## Prerequisites

A running [Ollama](https://ollama.com/) server. By default, Ollama listens on `http://localhost:11434`.
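
Before running any of the code in this README, it can help to confirm that something is actually listening on the default port. The following is a minimal sketch using only the Rust standard library; the `is_reachable` helper is hypothetical and not part of `ollama-rs`. It only checks that a TCP connection can be opened, not that the listener is really Ollama.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` (e.g. "localhost:11434")
/// succeeds within `timeout`. Hypothetical helper, not part of ollama-rs.
fn is_reachable(addr: &str, timeout: Duration) -> bool {
    // Resolve the host name; a failed lookup counts as unreachable.
    let Ok(addrs) = addr.to_socket_addrs() else {
        return false;
    };
    // Try every resolved address (localhost may map to IPv4 and IPv6).
    addrs
        .into_iter()
        .any(|a| TcpStream::connect_timeout(&a, timeout).is_ok())
}

fn main() {
    // Host and port taken from the default address mentioned above.
    let addr = "localhost:11434";
    if is_reachable(addr, Duration::from_secs(2)) {
        println!("A server is listening on {addr}");
    } else {
        eprintln!("Nothing is listening on {addr}; is Ollama running?");
    }
}
```

Once the client is set up, the `version()` method listed in the API reference is probably the more direct check, since it talks to the Ollama API itself.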
## Quick Start

### Text Generation

```rust
use std::io::Write;

use futures_util::StreamExt;
use ollama_rs::{OllamaClient, types::generate::GenerateRequest};

#[tokio::main]
async fn main() {
    let client = OllamaClient::new("http://localhost:11434");

    let request = GenerateRequest::builder("llama3:8b")
        .prompt("Why is the sky blue?")
        .build();

    let mut stream = client.generate(request);
    while let Some(response) = stream.next().await {
        match response {
            Ok(token) => {
                print!("{}", token.response);
                std::io::stdout().flush().unwrap();
                if token.done {
                    break;
                }
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }
}
```

### Chat

```rust
use std::io::Write;

use futures_util::StreamExt;
use ollama_rs::{OllamaClient, types::chat::{ChatRequest, Message}};

#[tokio::main]
async fn main() {
    let client = OllamaClient::new("http://localhost:11434");

    let messages = vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the capital of France?"),
    ];

    let request = ChatRequest::builder("llama3:8b")
        .messages(messages)
        .build();

    let mut stream = client.chat(request);
    while let Some(response) = stream.next().await {
        let response = response.unwrap();
        print!("{}", response.message.content);
        std::io::stdout().flush().unwrap();
        if response.done {
            break;
        }
    }
}
```

### Structured Output

Force the model to respond with JSON matching a specific schema:

```rust
use ollama_rs::{OllamaClient, types::generate::GenerateRequest};
use serde_json::json;

let schema = json!({
    "type": "object",
    "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number" }
    }
});

let request = GenerateRequest::builder("llama3:8b")
    .prompt("What is 2 + 2?")
    .stream(false)
    .format(schema)
    .build();
```

### Tool Calling

Define tools the model can invoke during a chat conversation:

```rust
use ollama_rs::types::chat::{ChatRequest, Function, Message, Tool, ToolType};
use serde_json::json;

let tools = vec![Tool {
    tool_type: ToolType::Function,
    function: Function {
        name: "get_weather".to_string(),
        description: "Get the current weather for a city.".to_string(),
        parameters: json!({
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The name of the city"
                }
            },
            "required": ["city"]
        }),
    },
}];

let request = ChatRequest::builder("llama3:8b")
    .messages(vec![Message::user("What is the weather in Paris?")])
    .stream(false)
    .tools(tools)
    .build();
```

When the model decides to call a tool, the response `message.tool_calls` field will contain the tool name and arguments. You can then execute the function and send the result back via `Message::tool_response(...)` which returns an `OllamaResult`.

## API Reference

### `OllamaClient`

| Method | Description |
|--------|-------------|
| `new(server_address)` | Create a new client with a 30-second connection timeout |
| `default()` | Create a client connecting to `http://localhost:11434` |
| `builder(server_address)` | Create a client with custom configuration (see below) |
| `version()` | Get the Ollama server version |
| `tags()` | List all available models |
| `ps()` | List currently running/loaded models |
| `generate(request)` | Generate text (streaming) |
| `chat(request)` | Chat conversation (streaming) |
| `pull(request)` | Pull/download a model (streaming) |

**`OllamaClient::builder(server_address)`** -- `.connection_timeout(Duration)`, `.build()`

```rust
use std::time::Duration;
use ollama_rs::OllamaClient;

let client = OllamaClient::builder("http://localhost:11434")
    .connection_timeout(Duration::from_secs(60))
    .build();
```

### Request Builders

**`GenerateRequest::builder(model)`** -- `.prompt()`, `.system_prompt()`, `.format()`, `.options()`, `.stream()`, `.think()`, `.images()`, `.suffix()`

**`ChatRequest::builder(model)`** -- `.messages()`, `.tools()`, `.format()`, `.options()`, `.stream()`, `.think()`

**`PullRequest::builder(model)`** -- `.stream()`

### Generation Options

Configure sampling parameters via `Options::builder()`:

| Option | Description |
|--------|-------------|
| `temperature(f32)` | Controls randomness (0.0 - 2.0) |
| `top_k(u32)` | Top-K sampling |
| `top_p(f32)` | Nucleus sampling threshold |
| `min_p(f32)` | Minimum probability filter |
| `seed(u64)` | Random seed for reproducibility |
| `num_ctx(u32)` | Context window size |
| `num_predict(u32)` | Maximum tokens to generate |
| `stop(Stop)` | Stop sequences |

## Examples

The `examples/` directory contains runnable programs:

| Example | Description |
|---------|-------------|
| `generate` | Basic text generation |
| `chat` | Interactive multi-turn chat |
| `structured_output` | JSON structured output with schema |
| `tool_call` | Function calling / tool use |
| `pull` | Download a model |
| `tags` | List available models |
| `ps` | List running models |
| `version` | Query server version |

Run an example:

```sh
OLLAMA_SERVER=http://localhost:11434 cargo run --example chat
```

## Configuration

| Environment Variable | Description |
|----------------------|-------------|
| `OLLAMA_SERVER` | Ollama server address (e.g., `http://localhost:11434`) |
| `RUST_LOG` | Log level filter (e.g., `ollama_rs=debug`) |
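
In your own programs you may want to honor `OLLAMA_SERVER` in the same spirit. The sketch below shows one way to do that with only the standard library. The `resolve_server` helper and the fallback to the default address are assumptions for illustration; this README does not specify how the bundled examples read the variable.

```rust
use std::env;

/// Default address Ollama listens on, per the Prerequisites section.
const DEFAULT_SERVER: &str = "http://localhost:11434";

/// Picks the server address from an optional override.
/// Hypothetical helper: empty or whitespace-only values fall back
/// to the default address.
fn resolve_server(override_value: Option<&str>) -> String {
    match override_value.map(str::trim) {
        Some(v) if !v.is_empty() => v.to_string(),
        _ => DEFAULT_SERVER.to_string(),
    }
}

fn main() {
    // `env::var` returns Err when the variable is unset or not valid Unicode;
    // `.ok()` turns both cases into None so the default is used.
    let from_env = env::var("OLLAMA_SERVER").ok();
    let server = resolve_server(from_env.as_deref());
    println!("Using Ollama server at {server}");
}
```

The resulting string could then be passed to `OllamaClient::new(...)`. Keeping the lookup in a small pure function makes the fallback behavior easy to unit test without touching the process environment.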