A critical remote code execution (RCE) vulnerability, tracked as CVE-2026-5760 with a CVSS score of 9.8, has been disclosed in SGLang, a widely used open-source AI model serving framework.
The flaw allows attackers to weaponize a malicious GGUF model file to execute arbitrary Python code on the target server through the framework’s reranking endpoint, with no patch currently available.
SGLang is a high-performance, open-source framework designed for serving large language models (LLMs) and multimodal AI models, with official support for prominent models including Qwen, DeepSeek, Mistral, and Skywork, all via OpenAI-compatible APIs.
The project has accumulated over 26,100 GitHub stars and more than 5,500 forks, reflecting widespread adoption across global AI infrastructure deployments.
Its integration into enterprise-scale AI pipelines makes this critical vulnerability especially alarming, as a single compromised SGLang deployment could serve as a gateway to the entire AI-serving infrastructure.
The vulnerability was discovered and reported by security researcher Stuart Beck, and the CERT Coordination Center (CERT/CC) published a formal advisory on April 20, 2026, assigning it CERT Vulnerability Note VU#915947.
Despite coordinated disclosure efforts, the SGLang maintainers did not respond, and no official patch has been released as of publication.
CVE-2026-5760: Critical SGLang RCE Flaw
The root cause of CVE-2026-5760 lies in the use of jinja2.Environment() SGLang’s function without sandboxing, an internal function responsible for rendering Jinja2 chat templates embedded within GGUF model files.
Because jinja2.Environment() permits unrestricted execution of arbitrary Python code; any malicious Jinja2 payload embedded in a model’s tokenizer.chat_template metadata field is executed directly on the inference server without restriction.
The attack chain is precise and reproducible. An attacker first crafts a malicious GGUF model file embedding a Jinja2 Server-Side Template Injection (SSTI) payload, along with the Qwen3 reranker trigger phrase needed to activate the vulnerable code path inside entrypoints/openai/serving_rerank.py.
When a victim downloads and loads this poisoned model, for example, from a model repository like Hugging Face, and subsequently sends any HTTP request to the /v1/rerank endpoint, SGLang renders the malicious template using the unsandboxed jinja2.Environment(), causing the attacker’s Python code to execute on the server.
Security Research has previously documented this attack class, noting that Jinja2 templates embedded in GGUF model metadata can contain payloads such as class traversal chains (e.g., ().__class__.__base__.__subclasses__()) that call OS-level commands when rendered in an unsandboxed context.
The technique requires no authentication and no special privileges beyond the ability to supply the malicious model to the victim.
Exploit Impact and Risk Scope
The team assessed the vulnerability at the maximum severity tier, with a CVSS v3 base score of 9.8 (Critical), reflecting the combination of network exploitability, no authentication requirement, and high impact on confidentiality, integrity, and availability.
Successful exploitation could enable full host compromise, lateral movement across internal networks, sensitive data exfiltration, and denial-of-service (DoS) conditions against the SGLang service.
The highest-risk deployments are those that expose the /v1/rerank endpoint directly to untrusted or public networks. Organizations using SGLang for document retrieval pipelines, AI-assisted search, or cross-encoder reranking in RAG (Retrieval-Augmented Generation) architectures are particularly exposed.
The vulnerability is analogous to two earlier critical flaws in the same vulnerability class: CVE-2024-34359 (aka “Llama Drama,” CVSS 9.7) affecting llama_cpp_python, and CVE-2025-61620 (CVSS 6.5) affecting vLLM, both of which involved unsandboxed Jinja2 rendering in AI model serving environments.
CVE-2026-5760
- Affected Component:
/v1/rerankendpoint in SGLang’s reranking module - Vulnerability Class: Jinja2 Server-Side Template Injection (SSTI) → Remote Code Execution (RCE)
- Attack Vector: Malicious GGUF model file with crafted
tokenizer.chat_templatemetadata - Trigger Mechanism: Qwen3 reranker trigger phrase in
entrypoints/openai/serving_rerank.py - Root Cause Function:
get_jinja_env()using unsandboxedjinja2.Environment() - CVSS Score: 9.8 (Critical)
- Authentication Required: None
- Patch Status: No patch available; no maintainer response
Mitigation Strategies
With no official patch available, CERT/CC recommends replacing jinja2.Environment() with ImmutableSandboxedEnvironment when rendering chat templates, which prevents the execution of arbitrary Python code by enforcing strict template sandboxing.
Until a patch is issued, organizations should immediately restrict network access to the /v1/rerank endpoint, blocking all requests from untrusted networks or external-facing interfaces.
Security teams should also audit their model ingestion pipelines and only load GGUF models sourced from verified, trusted repositories. JFrog recommends inspecting the tokenizer.chat_template metadata field in all GGUF models for suspicious strings, including __class__, os, subprocess, eval, and exec before loading them into any serving framework.
Organizations can leverage JFrog’s GGUF metadata scanning capabilities or implement custom pre-load inspection scripts to detect malicious templates before they reach the inference server.
Additionally, applying network segmentation, enforcing strict input validation on endpoint access, and implementing runtime application self-protection (RASP) monitoring on SGLang deployments can reduce exposure and improve detection of active exploitation attempts.
Given that SGLang maintainers have not responded to the disclosure, enterprise teams should treat this vulnerability as unpatched indefinitely until an official update is confirmed.
Frequently Asked Questions
Q1: What is CVE-2026-5760 in SGLang?
CVE-2026-5760 is a critical (CVSS 9.8) RCE vulnerability in SGLang’s /v1/rerank endpoint caused by unsandboxed Jinja2 template rendering in malicious GGUF model files.
Q2: How does the CVE-2026-5760 attack work?
An attacker embeds a Jinja2 SSTI payload in a GGUF model’s tokenizer.chat_template metadata field; when the victim loads the model and hits /v1/rerank, the payload executes arbitrary Python code on the server.
Q3: Is there a patch available for CVE-2026-5760?
No, as of April 20, 2026, no patch exists; SGLang maintainers did not respond to coordinated disclosure, so the recommended fix is replacing jinja2.Environment() with ImmutableSandboxedEnvironment.
Q4: Which SGLang deployments are most at risk from CVE-2026-5760?
Deployments exposing the /v1/rerank Endpoints to untrusted or public networks without authentication controls are at the highest risk, especially those used in RAG pipelines or AI document reranking workflows.
Site: http://thecybrdef.com