diff --git a/README.md b/README.md
new file mode 100644
index 0000000..d7d965f
--- /dev/null
+++ b/README.md
@@ -0,0 +1,211 @@
+# ollama-rs
+
+An async Rust client library for the [Ollama](https://ollama.com/) API. Provides a streaming-first interface for text generation, multi-turn chat, model management, and advanced features like structured output and tool calling.
+
+## Features
+
+- Fully async with [tokio](https://tokio.rs/) and streaming responses via `futures::Stream`
+- Text generation and multi-turn chat conversations
+- Structured JSON output with schema validation
+- Tool calling / function calling support
+- Model management (list, pull, inspect running models)
+- Builder pattern for constructing requests
+- Configurable generation parameters (temperature, top-k, top-p, and more)
+- Thinking / reasoning mode support
+
+## Installation
+
+Add `ollama-rs` to your `Cargo.toml`:
+
+```toml
+[dependencies]
+ollama-rs = { git = "https://github.com/andreban/ollama-rs.git" }
+tokio = { version = "1", features = ["full"] }
+futures-util = "0.3"
+```
+
+## Prerequisites
+
+A running [Ollama](https://ollama.com/) server. By default, Ollama listens on `http://localhost:11434`.
+
+## Quick Start
+
+### Text Generation
+
+```rust
+use std::io::Write;
+use futures_util::StreamExt;
+use ollama_rs::{OllamaClient, types::generate::GenerateRequest};
+
+#[tokio::main]
+async fn main() {
+    let client = OllamaClient::new("http://localhost:11434");
+    let request = GenerateRequest::builder("llama3:8b")
+        .prompt("Why is the sky blue?")
+        .build();
+
+    let mut stream = client.generate(request);
+    while let Some(response) = stream.next().await {
+        match response {
+            Ok(token) => {
+                print!("{}", token.response);
+                std::io::stdout().flush().unwrap();
+                if token.done {
+                    break;
+                }
+            }
+            Err(e) => eprintln!("Error: {}", e),
+        }
+    }
+}
+```
+
+### Chat
+
+```rust
+use std::io::Write;
+use futures_util::StreamExt;
+use ollama_rs::{OllamaClient, types::chat::{ChatRequest, Message}};
+
+#[tokio::main]
+async fn main() {
+    let client = OllamaClient::new("http://localhost:11434");
+    let messages = vec![
+        Message::system("You are a helpful assistant."),
+        Message::user("What is the capital of France?"),
+    ];
+    let request = ChatRequest::builder("llama3:8b")
+        .messages(messages)
+        .build();
+
+    let mut stream = client.chat(request);
+    while let Some(response) = stream.next().await {
+        let response = response.unwrap();
+        print!("{}", response.message.content);
+        std::io::stdout().flush().unwrap();
+        if response.done {
+            break;
+        }
+    }
+}
+```
+
+### Structured Output
+
+Force the model to respond with JSON matching a specific schema:
+
+```rust
+use ollama_rs::{OllamaClient, types::generate::GenerateRequest};
+use serde_json::json;
+
+let schema = json!({
+    "type": "object",
+    "properties": {
+        "answer": { "type": "string" },
+        "confidence": { "type": "number" }
+    }
+});
+
+let request = GenerateRequest::builder("llama3:8b")
+    .prompt("What is 2 + 2?")
+    .stream(false)
+    .format(schema)
+    .build();
+```
+
+### Tool Calling
+
+Define tools the model can invoke during a chat conversation:
+
+```rust
+use ollama_rs::types::chat::{ChatRequest, Function, Message, Tool, ToolType};
+use serde_json::json;
+
+let tools = vec![Tool {
+    tool_type: ToolType::Function,
+    function: Function {
+        name: "get_weather".to_string(),
+        description: "Get the current weather for a city.".to_string(),
+        parameters: json!({
+            "type": "object",
+            "properties": {
+                "city": { "type": "string", "description": "The name of the city" }
+            },
+            "required": ["city"]
+        }),
+    },
+}];
+
+let request = ChatRequest::builder("llama3:8b")
+    .messages(vec![Message::user("What is the weather in Paris?")])
+    .stream(false)
+    .tools(tools)
+    .build();
+```
+
+When the model decides to call a tool, the response `message.tool_calls` field will contain the tool name and arguments. You can then execute the function and send the result back via `Message::tool_response(...)`.
+
+## API Reference
+
+### `OllamaClient`
+
+| Method | Description |
+|--------|-------------|
+| `new(server_address)` | Create a new client pointing at an Ollama server |
+| `version()` | Get the Ollama server version |
+| `tags()` | List all available models |
+| `ps()` | List currently running/loaded models |
+| `generate(request)` | Generate text (streaming) |
+| `chat(request)` | Chat conversation (streaming) |
+| `pull(request)` | Pull/download a model (streaming) |
+
+### Request Builders
+
+**`GenerateRequest::builder(model)`** -- `.prompt()`, `.system_prompt()`, `.format()`, `.options()`, `.stream()`, `.think()`, `.images()`, `.suffix()`
+
+**`ChatRequest::builder(model)`** -- `.messages()`, `.tools()`, `.format()`, `.options()`, `.stream()`
+
+**`PullRequest::builder(model)`** -- `.stream()`
+
+### Generation Options
+
+Configure sampling parameters via `Options::builder()`:
+
+| Option | Description |
+|--------|-------------|
+| `temperature(f32)` | Controls randomness (0.0 - 2.0) |
+| `top_k(u32)` | Top-K sampling |
+| `top_p(f32)` | Nucleus sampling threshold |
+| `min_p(f32)` | Minimum probability filter |
+| `seed(u64)` | Random seed for reproducibility |
+| `num_ctx(u32)` | Context window size |
+| `num_predict(u32)` | Maximum tokens to generate |
+| `stop(Stop)` | Stop sequences |
+
+## Examples
+
+The `examples/` directory contains runnable programs:
+
+| Example | Description |
+|---------|-------------|
+| `generate` | Basic text generation |
+| `chat` | Interactive multi-turn chat |
+| `structured_output` | JSON structured output with schema |
+| `tool_call` | Function calling / tool use |
+| `pull` | Download a model |
+| `tags` | List available models |
+| `ps` | List running models |
+| `version` | Query server version |
+
+Run an example:
+
+```sh
+OLLAMA_SERVER=http://localhost:11434 cargo run --example chat
+```
+
+## Configuration
+
+| Environment Variable | Description |
+|----------------------|-------------|
+| `OLLAMA_SERVER` | Ollama server address (e.g., `http://localhost:11434`) |
+| `RUST_LOG` | Log level filter (e.g., `ollama_rs=debug`) |