André Cipriani Bandarra 35a7fd13f6 Add think field to ChatRequest

The Ollama API supports the think parameter for both generate and
chat endpoints. ChatRequest was missing it while GenerateRequest
already had it. Added the field, builder method, and doc comment
to bring chat to parity with generate.

2026-01-30 19:41:41 +00:00

ollama-rs

An async Rust client library for the Ollama API. Provides a streaming-first interface for text generation, multi-turn chat, model management, and advanced features like structured output and tool calling.

Features

Fully async with tokio and streaming responses via futures::Stream
Text generation and multi-turn chat conversations
Structured JSON output with schema validation
Tool calling / function calling support
Model management (list, pull, inspect running models)
Builder pattern for constructing requests
Configurable generation parameters (temperature, top-k, top-p, and more)
Thinking / reasoning mode support

Installation

Add ollama-rs to your Cargo.toml:

[dependencies]
ollama-rs = { git = "https://github.com/andreban/ollama-rs.git" }
tokio = { version = "1", features = ["full"] }
futures-util = "0.3"

Prerequisites

A running Ollama server. By default, Ollama listens on http://localhost:11434.

Quick Start

Text Generation

use std::io::Write;
use futures_util::StreamExt;
use ollama_rs::{OllamaClient, types::generate::GenerateRequest};

#[tokio::main]
async fn main() {
    let client = OllamaClient::new("http://localhost:11434");
    let request = GenerateRequest::builder("llama3:8b")
        .prompt("Why is the sky blue?")
        .build();

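    // `generate` yields partial responses; print each one as it arrives.
    let mut stream = client.generate(request);
    while let Some(response) = stream.next().await {
        match response {
            Ok(token) => {
                print!("{}", token.response);
                // Flush so the text appears immediately instead of when the buffer fills.
                std::io::stdout().flush().unwrap();
                if token.done {
                    break;
                }
            }
            Err(e) => {
                // Stop at the first error rather than polling a failed stream again.
                eprintln!("Error: {}", e);
                break;
            }
        }
    }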
}

Chat

use std::io::Write;
use futures_util::StreamExt;
use ollama_rs::{OllamaClient, types::chat::{ChatRequest, Message}};

#[tokio::main]
async fn main() {
    let client = OllamaClient::new("http://localhost:11434");
    let messages = vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the capital of France?"),
    ];
    let request = ChatRequest::builder("llama3:8b")
        .messages(messages)
        .build();

    let mut stream = client.chat(request);
    while let Some(response) = stream.next().await {
        let response = response.unwrap();
        print!("{}", response.message.content);
        std::io::stdout().flush().unwrap();
        if response.done {
            break;
        }
    }
}
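
For a multi-turn conversation, the streamed pieces usually need to be joined into one assistant reply and appended to the message list before the next request, since the Ollama chat endpoint expects the client to resend the history. The following is a minimal sketch of that bookkeeping using only the standard library. `Chunk` and `Turn` are hypothetical stand-ins for the crate's streamed response and message types, whose exact shape is not shown in this README.

```rust
/// Hypothetical stand-in for one streamed chat chunk; the real response type
/// in ollama-rs may have a different shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub content: String,
    pub done: bool,
}

/// Hypothetical stand-in for a chat message (role plus text).
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub role: String,
    pub content: String,
}

/// Concatenates chunk contents until a chunk reports `done`, mirroring the
/// `break` in the streaming loop above.
pub fn collect_reply<I: IntoIterator<Item = Chunk>>(chunks: I) -> String {
    let mut reply = String::new();
    for chunk in chunks {
        reply.push_str(&chunk.content);
        if chunk.done {
            break;
        }
    }
    reply
}

/// Appends the assistant's full reply and the next user message to the history.
pub fn extend_history(history: &mut Vec<Turn>, reply: String, next_user: &str) {
    history.push(Turn {
        role: "assistant".to_string(),
        content: reply,
    });
    history.push(Turn {
        role: "user".to_string(),
        content: next_user.to_string(),
    });
}
```

In a real program the chunks would come from the stream returned by client.chat(request), and the extended history would be converted back into the crate's own message type before building the next ChatRequest.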

Structured Output

Force the model to respond with JSON matching a specific schema:

use ollama_rs::{OllamaClient, types::generate::GenerateRequest};
use serde_json::json;

let schema = json!({
    "type": "object",
    "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number" }
    }
});

let request = GenerateRequest::builder("llama3:8b")
    .prompt("What is 2 + 2?")
    .stream(false)
    .format(schema)
    .build();

Tool Calling

Define tools the model can invoke during a chat conversation:

use ollama_rs::types::chat::{ChatRequest, Function, Message, Tool, ToolType};
use serde_json::json;

let tools = vec![Tool {
    tool_type: ToolType::Function,
    function: Function {
        name: "get_weather".to_string(),
        description: "Get the current weather for a city.".to_string(),
        parameters: json!({
            "type": "object",
            "properties": {
                "city": { "type": "string", "description": "The name of the city" }
            },
            "required": ["city"]
        }),
    },
}];

let request = ChatRequest::builder("llama3:8b")
    .messages(vec![Message::user("What is the weather in Paris?")])
    .stream(false)
    .tools(tools)
    .build();

When the model decides to call a tool, the message.tool_calls field of the response contains the tool name and arguments. Your code is responsible for executing the matching function and sending its result back to the model via Message::tool_response(...).
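
The "execute the function" step can be sketched as a small dispatcher that maps a tool name to a local Rust function. This is an illustration under stated assumptions, not the crate's API: `dispatch_tool` and `get_weather` are hypothetical names, and the arguments are simplified to a string map instead of the JSON value the model returns.

```rust
use std::collections::HashMap;

/// Hypothetical local tool: returns a canned report instead of calling a
/// real weather service.
fn get_weather(city: &str) -> String {
    format!("Sunny, 22 C in {city}")
}

/// Routes a tool call to a local function by name.
/// `args` is a simplified stand-in for the JSON arguments found in
/// `message.tool_calls`.
pub fn dispatch_tool(name: &str, args: &HashMap<String, String>) -> Result<String, String> {
    match name {
        "get_weather" => {
            // The schema above marks `city` as required, so reject calls without it.
            let city = args
                .get("city")
                .ok_or_else(|| "missing required argument: city".to_string())?;
            Ok(get_weather(city))
        }
        other => Err(format!("unknown tool: {other}")),
    }
}
```

Returning an error string for unknown tools or missing arguments, rather than panicking, lets the caller pass the failure back to the model as the tool result so the conversation can continue.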

API Reference

`OllamaClient`

Method	Description
`new(server_address)`	Create a new client pointing at an Ollama server
`version()`	Get the Ollama server version
`tags()`	List all available models
`ps()`	List currently running/loaded models
`generate(request)`	Generate text (streaming)
`chat(request)`	Chat conversation (streaming)
`pull(request)`	Pull/download a model (streaming)

Request Builders

GenerateRequest::builder(model) -- .prompt(), .system_prompt(), .format(), .options(), .stream(), .think(), .images(), .suffix()

ChatRequest::builder(model) -- .messages(), .tools(), .format(), .options(), .stream(), .think()

PullRequest::builder(model) -- .stream()

Generation Options

Configure sampling parameters via Options::builder():

Option	Description
`temperature(f32)`	Controls randomness (0.0 - 2.0)
`top_k(u32)`	Top-K sampling
`top_p(f32)`	Nucleus sampling threshold
`min_p(f32)`	Minimum probability filter
`seed(u64)`	Random seed for reproducibility
`num_ctx(u32)`	Context window size
`num_predict(u32)`	Maximum tokens to generate
`stop(Stop)`	Stop sequences
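
To give a feel for what temperature, top_k, and top_p control, here is a conceptual sketch of the usual textbook definitions using only the standard library. It is not how Ollama or this crate implements sampling; the function names are hypothetical, and `temperature` is assumed to be greater than zero.

```rust
/// Converts raw scores (logits) into probabilities after temperature scaling.
/// Lower temperatures sharpen the distribution; higher ones flatten it.
/// Assumes `temperature > 0.0`.
pub fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    // Subtract the maximum before exponentiating for numerical stability.
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Returns the indices of candidate tokens that survive top-k and then top-p
/// filtering, ordered from most to least probable.
pub fn filter_candidates(probs: &[f32], top_k: usize, top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    // Top-k: keep only the k most probable candidates.
    order.truncate(top_k);

    // Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    let mut kept = Vec::new();
    let mut cumulative = 0.0;
    for idx in order {
        kept.push(idx);
        cumulative += probs[idx];
        if cumulative >= top_p {
            break;
        }
    }
    kept
}
```

A sampler would then draw the next token from the surviving candidates only, which is why smaller top_k or top_p values tend to make output more predictable.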

Examples

The examples/ directory contains runnable programs:

Example	Description
`generate`	Basic text generation
`chat`	Interactive multi-turn chat
`structured_output`	JSON structured output with schema
`tool_call`	Function calling / tool use
`pull`	Download a model
`tags`	List available models
`ps`	List running models
`version`	Query server version

Run an example:

OLLAMA_SERVER=http://localhost:11434 cargo run --example chat

Configuration

Environment Variable	Description
`OLLAMA_SERVER`	Ollama server address (e.g., `http://localhost:11434`)
`RUST_LOG`	Log level filter (e.g., `ollama_rs=debug`)
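
A program that wants to honor OLLAMA_SERVER while still working out of the box could resolve the address as sketched below. This is an assumption about how one might wire it up, not a description of what the bundled examples do; `resolve_server` and `server_from_env` are hypothetical helper names.

```rust
use std::env;

/// Default address an Ollama server listens on, per the Prerequisites section.
pub const DEFAULT_SERVER: &str = "http://localhost:11434";

/// Picks the configured address, falling back to the default when the value
/// is missing or blank. Surrounding whitespace is trimmed.
pub fn resolve_server(configured: Option<&str>) -> String {
    match configured.map(str::trim) {
        Some(addr) if !addr.is_empty() => addr.to_string(),
        _ => DEFAULT_SERVER.to_string(),
    }
}

/// Reads `OLLAMA_SERVER` from the environment and resolves it.
pub fn server_from_env() -> String {
    let value = env::var("OLLAMA_SERVER").ok();
    resolve_server(value.as_deref())
}
```

The resulting string can then be passed to OllamaClient::new(...), as in the Quick Start examples. Keeping the fallback logic in a function that takes an Option makes it testable without touching the process environment.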