PullRequest now omits insecure and stream from its serialized JSON when
they are unset (None), consistent with all other request types in the
codebase. Previously these fields were serialized as null.
Change these fields from usize to u64. usize is platform-dependent
(32 bits wide on 32-bit targets). Nanosecond durations exceed u32::MAX
after about 4.3 seconds, and model sizes in bytes can easily exceed
4 GiB. Using u64 keeps these values representable on all platforms.
Changed fields:
- GenerateResponse: total_duration, load_duration, prompt_eval_count,
  prompt_eval_duration, eval_count, eval_duration
- Model: size
- RunningModel: size, size_vram
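The overflow arithmetic can be checked with std alone. This sketch uses hypothetical structs holding a subset of the listed fields (not the crate's real types), and asks whether each value would fit in a 32-bit usize by attempting a u32 conversion.

```rust
use std::convert::TryFrom;
use std::time::Duration;

// Hypothetical subset of the GenerateResponse fields, typed as u64.
struct GenerateTimings {
    total_duration: u64, // nanoseconds
    eval_count: u64,     // tokens generated
}

// Hypothetical subset of the Model fields, typed as u64.
struct ModelSize {
    size: u64, // bytes
}

/// Returns true if `value` would fit in a usize on a 32-bit target.
fn fits_in_32_bit_usize(value: u64) -> bool {
    u32::try_from(value).is_ok()
}

fn main() {
    // u32::MAX nanoseconds is just under 4.3 seconds.
    let limit = Duration::from_nanos(u64::from(u32::MAX));
    println!("u32::MAX ns = {:.2} s", limit.as_secs_f64());

    let timings = GenerateTimings { total_duration: 5_000_000_000, eval_count: 128 };
    let model = ModelSize { size: 8 * 1024 * 1024 * 1024 }; // 8 GiB

    // A 5 s duration and an 8 GiB model both overflow a 32-bit usize.
    println!("5 s fits: {}", fits_in_32_bit_usize(timings.total_duration));
    println!("8 GiB fits: {}", fits_in_32_bit_usize(model.size));
    // Small token counts fit either way; u64 just removes the edge case.
    println!("eval_count fits: {}", fits_in_32_bit_usize(timings.eval_count));
}
```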
Display implementations should not append a trailing newline, since
callers such as println! and error-chaining libraries (anyhow, eyre)
add their own. The display strings now also use readable names instead
of raw variant names.
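A minimal sketch of a Display implementation that follows this rule. Only ResponseParseError is named elsewhere in these changes; the RequestFailed variant, the String payloads, and the message wording are hypothetical.

```rust
use std::fmt;

// Hypothetical error enum for illustration; the real OllamaError's
// variants and payloads may differ.
#[derive(Debug)]
enum OllamaError {
    RequestFailed(String),
    ResponseParseError(String),
}

impl fmt::Display for OllamaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // write!, not writeln!: the caller decides whether a newline follows.
        // Messages use readable wording rather than the variant names.
        match self {
            OllamaError::RequestFailed(msg) => write!(f, "request failed: {}", msg),
            OllamaError::ResponseParseError(msg) => {
                write!(f, "failed to parse response: {}", msg)
            }
        }
    }
}

impl std::error::Error for OllamaError {}

fn main() {
    let err = OllamaError::ResponseParseError("expected value".to_string());
    // println! supplies the only newline.
    println!("error: {}", err);
}
```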
Previously, if serde_json::from_str failed on a streamed line, the
line was silently discarded and the caller had no indication that data
was lost. Now parse errors are yielded as OllamaError::ResponseParseError
so consumers can detect and handle unexpected API responses.
Empty lines from LinesCodec are skipped to avoid spurious parse errors
on blank input between chunks.
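The error-propagation pattern can be sketched with a synchronous iterator instead of the async stream built on LinesCodec. This is an assumption-heavy stand-in: str::parse::<u64> takes the place of serde_json::from_str, and the OllamaError shown here (with a String payload) is hypothetical.

```rust
use std::num::ParseIntError;

// Hypothetical stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum OllamaError {
    ResponseParseError(String),
}

/// Parses each line and yields failures as errors instead of dropping them.
/// `str::parse::<u64>` stands in for `serde_json::from_str` here.
fn parse_lines<'a, I>(lines: I) -> impl Iterator<Item = Result<u64, OllamaError>> + 'a
where
    I: Iterator<Item = &'a str> + 'a,
{
    lines
        // Skip empty lines so they do not produce spurious parse errors.
        .filter(|line| !line.is_empty())
        // Surface parse failures to the caller rather than discarding the line.
        .map(|line| {
            line.parse::<u64>().map_err(|e: ParseIntError| {
                OllamaError::ResponseParseError(format!("{}: {:?}", e, line))
            })
        })
}

fn main() {
    let raw = "1\n\n2\nnot-a-number\n3";
    for item in parse_lines(raw.lines()) {
        match item {
            Ok(value) => println!("chunk: {}", value),
            Err(err) => println!("error: {:?}", err),
        }
    }
}
```

The consumer sees four items here: three values and one error, with the blank line filtered out before parsing.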
Store a shared reqwest::Client on OllamaClient instead of creating
a new one per request. Previously, version(), tags(), and ps() each
used reqwest::get(), which creates a new one-shot client on every call,
and stream_response() called reqwest::Client::new() on every invocation.
Because reqwest::Client manages an internal connection pool, reusing it
enables TCP and TLS connection reuse across calls. Cloning the client
is cheap because it is Arc-backed internally.
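The ownership pattern can be illustrated without network I/O. In this std-only sketch, HttpClient and Pool are hypothetical stand-ins for reqwest::Client and its internal pool: the client keeps its state behind an Arc, so clones share one pool. The OllamaClient fields and method bodies are likewise hypothetical, and the endpoint paths are illustrative.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Stand-in for the shared connection pool; counts requests for illustration.
#[derive(Default)]
struct Pool {
    requests: AtomicUsize,
}

// Stand-in for reqwest::Client: cloning copies the Arc, not the pool.
#[derive(Clone, Default)]
struct HttpClient {
    pool: Arc<Pool>,
}

impl HttpClient {
    fn get(&self, url: &str) -> String {
        self.pool.requests.fetch_add(1, Ordering::SeqCst);
        format!("GET {}", url)
    }
}

// One HttpClient is created in new() and reused by every method.
#[derive(Clone)]
struct OllamaClient {
    base_url: String,
    http: HttpClient,
}

impl OllamaClient {
    fn new(base_url: &str) -> Self {
        OllamaClient { base_url: base_url.to_string(), http: HttpClient::default() }
    }

    fn version(&self) -> String {
        self.http.get(&format!("{}/api/version", self.base_url))
    }

    fn tags(&self) -> String {
        self.http.get(&format!("{}/api/tags", self.base_url))
    }

    fn ps(&self) -> String {
        self.http.get(&format!("{}/api/ps", self.base_url))
    }

    fn requests_on_pool(&self) -> usize {
        self.http.pool.requests.load(Ordering::SeqCst)
    }
}

fn main() {
    let client = OllamaClient::new("http://localhost:11434");
    let cloned = client.clone();
    println!("{}", client.version());
    println!("{}", client.tags());
    println!("{}", cloned.ps());
    // All three calls, including the one on the clone, hit the same pool.
    println!("requests on shared pool: {}", client.requests_on_pool());
    println!("same pool: {}", Arc::ptr_eq(&client.http.pool, &cloned.http.pool));
}
```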