Runs four parallel jobs on every push or pull request to main:
- check: cargo check + examples
- test: cargo test
- clippy: cargo clippy with warnings as errors
- fmt: cargo fmt --check
Uses dtolnay/rust-toolchain for Rust setup and Swatinem/rust-cache
for dependency caching.
OllamaClient now applies a 30-second connection timeout by default,
so a down server fails fast instead of blocking indefinitely. No
request timeout is set since LLM responses can legitimately run for
minutes during model loading or long generations.
Added OllamaClient::builder() for custom configuration:
    OllamaClient::builder("http://localhost:11434")
        .connection_timeout(Duration::from_secs(60))
        .build();
Also updated README.md to document the builder API, default()
constructor, tool_response return type change, and think support
in ChatRequest.
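A minimal sketch of how such a builder could be laid out, using only the standard library. ClientConfig, ClientBuilder, and the request_timeout field are illustrative assumptions, not the crate's actual definitions; the real OllamaClient wraps an HTTP client rather than a plain config struct.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the client's configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientConfig {
    pub address: String,
    /// Upper bound on establishing the connection.
    pub connection_timeout: Duration,
    /// `None` means no overall request timeout, so long generations are not cut off.
    pub request_timeout: Option<Duration>,
}

/// Hypothetical builder mirroring the shape of `OllamaClient::builder()`.
pub struct ClientBuilder {
    config: ClientConfig,
}

impl ClientBuilder {
    pub fn new(address: &str) -> Self {
        Self {
            config: ClientConfig {
                address: address.to_string(),
                // 30-second default, as described in the message above.
                connection_timeout: Duration::from_secs(30),
                request_timeout: None,
            },
        }
    }

    /// Overrides the default connection timeout.
    pub fn connection_timeout(mut self, timeout: Duration) -> Self {
        self.config.connection_timeout = timeout;
        self
    }

    pub fn build(self) -> ClientConfig {
        self.config
    }
}
```

Taking self by value in each setter keeps the call chain free of intermediate bindings, which matches the usage shown in the message.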
The Ollama API supports the think parameter for both generate and
chat endpoints. ChatRequest was missing it while GenerateRequest
already had it. Added the field, builder method, and doc comment
to bring chat to parity with generate.
OllamaClient::default() connects to http://localhost:11434, the
standard local Ollama address. Users can now call it for the common
case instead of always providing the server address explicitly.
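A rough sketch of what a Default implementation along these lines might look like. The struct here is a simplified stand-in with a single address field, and the new(), address(), and DEFAULT_ADDRESS names are assumptions made for illustration.

```rust
/// Hypothetical, simplified stand-in for the real client type.
#[derive(Debug, Clone, PartialEq)]
pub struct OllamaClient {
    address: String,
}

impl OllamaClient {
    /// Standard local Ollama address, taken from the message above.
    pub const DEFAULT_ADDRESS: &'static str = "http://localhost:11434";

    pub fn new(address: &str) -> Self {
        Self {
            address: address.to_string(),
        }
    }

    pub fn address(&self) -> &str {
        &self.address
    }
}

impl Default for OllamaClient {
    /// Delegates to `new` so both constructors share one code path.
    fn default() -> Self {
        Self::new(Self::DEFAULT_ADDRESS)
    }
}
```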
Moved serde attributes on prompt and suffix to the conventional
position after doc comments. Changed // to /// on images and format
fields so they appear in generated documentation.
Replaced unwrap() with the ? operator so serialization errors propagate
as OllamaError::ResponseParseError instead of panicking. This is
safer for a library API where callers should decide how to handle
errors.
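The pattern can be sketched with the standard library alone. Integer parsing stands in for JSON (de)serialization here, and the single-variant enum and parse_count function are simplified assumptions rather than the crate's real code.

```rust
use std::num::ParseIntError;

/// Simplified stand-in for the crate's error type; only one variant is modeled.
#[derive(Debug, PartialEq)]
pub enum OllamaError {
    ResponseParseError(String),
}

/// Lets `?` convert the underlying error into the library's error type.
impl From<ParseIntError> for OllamaError {
    fn from(err: ParseIntError) -> Self {
        OllamaError::ResponseParseError(err.to_string())
    }
}

/// Before: `raw.trim().parse::<u64>().unwrap()` would panic on bad input.
/// After: `?` returns the converted error to the caller instead.
pub fn parse_count(raw: &str) -> Result<u64, OllamaError> {
    let value = raw.trim().parse::<u64>()?;
    Ok(value)
}
```

A caller can then match on the Result and decide whether to retry, log, or abort.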
The Ollama API expects the messages field to be present in chat
requests. Removed skip_serializing_if on messages so it serializes
as an empty array rather than being omitted entirely.
PullRequest now omits insecure and stream from serialized JSON when
unset, consistent with all other request types in the codebase.
Previously these fields serialized as null.
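To illustrate the difference on the wire, here is a hand-written serializer that leaves unset options out of the JSON. It is only a sketch: the crate relies on serde's skip_serializing_if attribute rather than code like this, and the model field name and to_json method are assumptions.

```rust
/// Hypothetical mirror of the pull request body.
pub struct PullRequest {
    pub model: String,
    pub insecure: Option<bool>,
    pub stream: Option<bool>,
}

impl PullRequest {
    /// Hand-written JSON for illustration; assumes `model` needs no escaping.
    pub fn to_json(&self) -> String {
        let mut fields = vec![format!("\"model\":\"{}\"", self.model)];
        // Unset options are omitted entirely instead of being written as null.
        if let Some(insecure) = self.insecure {
            fields.push(format!("\"insecure\":{}", insecure));
        }
        if let Some(stream) = self.stream {
            fields.push(format!("\"stream\":{}", stream));
        }
        format!("{{{}}}", fields.join(","))
    }
}
```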
usize is platform-dependent (32 bits wide on 32-bit targets). A
duration in nanoseconds exceeds u32::MAX after about 4.3 seconds, and
model sizes in bytes can easily exceed 4 GiB. Using u64 keeps these
values correct on all platforms.
Changed fields:
- GenerateResponse: total_duration, load_duration, prompt_eval_count,
  prompt_eval_duration, eval_count, eval_duration
- Model: size
- RunningModel: size, size_vram
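The arithmetic behind the 4.3-second figure can be checked with a short standard-library sketch. The two helper functions are hypothetical and exist only to make the limits concrete.

```rust
use std::time::Duration;

/// Longest duration, in nanoseconds, that a 32-bit counter can hold.
pub fn max_u32_nanos() -> Duration {
    Duration::from_nanos(u64::from(u32::MAX))
}

/// Returns true if `value` would fit in a 32-bit `usize`.
pub fn fits_in_32_bits(value: u64) -> bool {
    value <= u64::from(u32::MAX)
}
```

u32::MAX nanoseconds is 4.294967295 seconds, so a five-second generation, or a 5 GiB model size in bytes, is already out of range for a 32-bit usize.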
Display implementations should not append trailing newlines, since
callers such as println! and error-chaining libraries (anyhow, eyre)
add their own. Also cleaned up the display strings to use readable
names instead of raw variant names.
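A small sketch of a Display implementation that follows this convention. The enum is reduced to one variant and the message wording is an assumption, not the crate's actual display string.

```rust
use std::fmt;

/// Simplified stand-in for the crate's error type.
#[derive(Debug)]
pub enum OllamaError {
    ResponseParseError(String),
}

impl fmt::Display for OllamaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Readable wording, and `write!` rather than `writeln!`,
            // so no trailing newline is emitted.
            OllamaError::ResponseParseError(detail) => {
                write!(f, "failed to parse response: {}", detail)
            }
        }
    }
}
```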
Previously, if serde_json::from_str failed on a streamed line, the
line was silently discarded and the caller had no indication that data
was lost. Now parse errors are yielded as OllamaError::ResponseParseError
so consumers can detect and handle unexpected API responses.
Empty lines from LinesCodec are skipped to avoid spurious parse errors
on blank input between chunks.
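The behavior can be approximated without any async machinery. In this sketch integer parsing stands in for serde_json::from_str, a Vec stands in for the lazily yielded stream, and parse_stream is a hypothetical name.

```rust
/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum OllamaError {
    ResponseParseError(String),
}

/// Parses each line of `input`, skipping blank lines and reporting
/// every malformed line as an error instead of dropping it silently.
pub fn parse_stream(input: &str) -> Vec<Result<u64, OllamaError>> {
    input
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            line.trim()
                .parse::<u64>()
                .map_err(|err| OllamaError::ResponseParseError(format!("{:?}: {}", line, err)))
        })
        .collect()
}
```

Because errors are items in the output rather than a reason to stop, the consumer can choose to skip a bad chunk or abort the whole response.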
Store a shared reqwest::Client on OllamaClient instead of creating
a new one per request. Previously, version(), tags(), and ps() each
used reqwest::get(), which allocates a one-shot client, and
stream_response() called reqwest::Client::new() on every invocation.
Since reqwest::Client manages an internal connection pool, reusing it
enables TCP and TLS connection reuse across calls. Cloning the client
is cheap because it is Arc-backed internally.
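The sharing pattern can be sketched with a plain Arc. PoolState and HttpHandle are hypothetical stand-ins for the HTTP client, not reqwest's internals; the point is only that cloning a handle copies a pointer while the pooled state stays shared.

```rust
use std::sync::Arc;

/// Stand-in for a client's pooled connection state (hypothetical).
#[derive(Debug)]
struct PoolState {
    label: String,
}

/// Cheap-to-clone handle: cloning copies the `Arc` pointer, not the pool.
#[derive(Debug, Clone)]
pub struct HttpHandle {
    inner: Arc<PoolState>,
}

impl HttpHandle {
    pub fn new(label: &str) -> Self {
        Self {
            inner: Arc::new(PoolState {
                label: label.to_string(),
            }),
        }
    }

    pub fn label(&self) -> &str {
        &self.inner.label
    }

    /// True when both handles point at the same pooled state.
    pub fn shares_pool_with(&self, other: &HttpHandle) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}

/// Simplified client that creates its handle once and reuses it.
pub struct OllamaClient {
    http: HttpHandle,
}

impl OllamaClient {
    pub fn new() -> Self {
        Self {
            http: HttpHandle::new("shared"),
        }
    }

    /// Each request method would clone this handle instead of building a new client.
    pub fn http(&self) -> HttpHandle {
        self.http.clone()
    }
}
```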