Slashing agent token costs by 98% with RFC 9457-compliant error responses
2026-03-11, Sam Marsh

AI agents are no longer experiments. They are production infrastructure, making billions of HTTP requests per day, navigating the web, calling APIs, and orchestrating complex workflows.

But when these agents hit an error, they still receive the same HTML error pages we built for browsers: hundreds of lines of markup, CSS, and copy designed for human eyes. Those pages give agents clues, not instructions, and waste time and tokens. That gap is the opportunity to give agents instructions, not obstacles.

Starting today, Cloudflare returns RFC 9457-compliant structured Markdown and JSON error payloads to AI agents, replacing heavyweight HTML pages with machine-readable instructions.

That means when an agent sends Accept: text/markdown, Accept: application/json, or Accept: application/problem+json and encounters a Cloudflare error, we return one semantic contract in a structured format instead of HTML. And it comes complete with actionable guidance. (This builds on our recent Markdown for Agents release.)

So instead of being told only "You were blocked," the agent will read: "You were rate-limited: wait 30 seconds and retry with exponential backoff." Instead of just "Access denied," the agent will be instructed: "This block is intentional: do not retry, contact the site owner."

These responses are not just clearer; they are dramatically more efficient. Structured error responses cut payload size and token usage by roughly 98% versus HTML, measured against a live 1015 ('rate-limit') error response. For agents that hit multiple errors in a workflow, the savings compound quickly.

This is live across the Cloudflare network, automatically. Site owners do not need to configure anything. Browsers keep getting the same HTML experience as before.

These are not just error pages.
They are instructions for the agentic web.

What agents see today

When an agent receives a Cloudflare-generated error, it usually means Cloudflare is enforcing customer policy or returning a platform response on the customer's behalf, not that Cloudflare is down. These responses are triggered when a request cannot be served as-is, such as invalid host or DNS routing, customer-defined access controls (WAF, geo, ASN, or bot rules), or edge-enforced limits like rate limiting. In short, Cloudflare is acting as the customer's routing and security layer, and the response explains why the request was blocked or could not proceed.

Today, those responses are rendered as HTML designed for humans:

    Access denied | example.com used Cloudflare to restrict access

To an agent, this is garbage. It cannot determine what error occurred, why it was blocked, or whether retrying will help. Even if it parses the HTML, the content describes the error but doesn't tell the agent (or the human, for that matter) what to do next.

If you were an agent developer who wanted to handle Cloudflare errors gracefully, your options were limited. For Cloudflare-generated errors, structured responses existed only in configuration-dependent paths, not as a consistent default for agents.

Custom Error Rules can customize many Cloudflare errors, including some 1xxx cases. But they depend on per-site configuration, so they cannot serve as a universal agent contract across the web. Cloudflare sits in front of the request path. That means we can define a default machine response: retry or stop, wait and back off, escalate or reroute. Error pages stop being decoration and become execution instructions.

What we did

Cloudflare now returns RFC 9457-compliant structured responses for all 1xxx-class error paths: Cloudflare's platform error codes for edge-side failures like DNS resolution issues, access denials, and rate limits.
Both formats are live: Accept: text/markdown returns Markdown, Accept: application/json returns JSON, and Accept: application/problem+json returns JSON with the application/problem+json content type. This covers all 1xxx-class errors today. The same contract will extend to Cloudflare-generated 4xx and 5xx errors next.

Markdown responses have two parts:

- YAML frontmatter for machine-readable fields
- prose sections for explicit guidance (What happened and What you should do)

JSON responses carry the same fields as a flat object.

The YAML frontmatter is the critical layer for automation. It lets an agent extract stable keys without scraping HTML or guessing intent from copy. Fields like error_code, error_name, and error_category let the agent classify the failure. retryable and retry_after drive backoff logic. owner_action_required tells the agent whether to keep trying or escalate. ray_id, timestamp, and zone make logs and support handoffs deterministic.

The schema is stable by design, so agents can implement durable control flow without chasing presentation changes.

That stability is not a Cloudflare invention. RFC 9457 (Problem Details for HTTP APIs) defines a standard JSON shape for reporting errors over HTTP, so clients can parse error responses without knowing the specific API in advance. Our JSON responses follow this shape, which means any HTTP client that understands Problem Details can parse the base members without Cloudflare-specific code:

| RFC 9457 member | What it contains |
|---|---|
| type | A URI pointing to Cloudflare's documentation for the specific error code |
| status | The HTTP status code (matching the actual response status) |
| title | A short, human-readable summary of the problem |
| detail | A human-readable explanation specific to this occurrence |
| instance | The Ray ID identifying this specific error occurrence |

The operational fields (error_code, error_category, retryable, retry_after, owner_action_required, and more) are RFC 9457 extension members.
Clients that don't recognize them simply ignore them.

This is network-wide and additive. Site owners do not need to configure anything. Browsers keep receiving HTML unless clients explicitly ask for Markdown or JSON.

What the response looks like

Here is what a rate-limit error (1015) looks like in JSON:

    {
      "type": "https://developers.cloudflare.com/support/troubleshooting/http-status-codes/cloudflare-1xxx-errors/error-1015/",
      "title": "Error 1015: You are being rate limited",
      "status": 429,
      "detail": "You are being rate-limited by the website owner's configuration.",
      "instance": "9d99a4434fz2d168",
      "error_code": 1015,
      "error_name": "rate_limited",
      "error_category": "rate_limit",
      "ray_id": "9d99a4434fz2d168",
      "timestamp": "2026-03-09T11:11:55Z",
      "zone": "",
      "cloudflare_error": true,
      "retryable": true,
      "retry_after": 30,
      "owner_action_required": false,
      "what_you_should_do": "**Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.\n\nRecommended approach:\n1. Wait 30 seconds before your next request\n2. If rate-limited again, double the wait time (60s, 120s, etc.)\n3. If rate-limiting persists after 5 retries, stop and reassess your request pattern",
      "footer": "This error was generated by Cloudflare on behalf of the website owner."
    }

The same error in Markdown, optimized for model-first workflows:

    ---
    error_code: 1015
    error_name: rate_limited
    error_category: rate_limit
    status: 429
    ray_id: 9d99a39dc992d168
    timestamp: 2026-03-09T11:11:28Z
    zone:
    cloudflare_error: true
    retryable: true
    retry_after: 30
    owner_action_required: false
    ---

    # Error 1015: You are being rate limited

    ## What Happened

    You are being rate-limited by the website owner's configuration.

    ## What You Should Do

    **Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.

    Recommended approach:

    1. Wait 30 seconds before your next request
    2. If rate-limited again, double the wait time (60s, 120s, etc.)
    3. If rate-limiting persists after 5 retries, stop and reassess your request pattern

    ---
    This error was generated by Cloudflare on behalf of the website owner.

Both formats give an agent everything it needs to decide and act: classify the error, choose retry behavior, and determine whether escalation is required. This is what a default machine contract looks like: not per-site configuration, but network-wide behavior. The contrast is explicit across error families: a transient error like 1015 says wait and retry, while intentional blocks like 1020 or geographic restrictions like 1009 tell the agent not to retry and to escalate instead.

One contract, two formats

The core value is not format choice. It is semantic stability. Agents need deterministic answers to operational questions: retry or not, how long to wait, and whether to escalate. Cloudflare exposes one policy contract across two wire formats. Whether a client consumes Markdown or JSON, the operational meaning is identical: same error identity, same retry/backoff signals, same escalation guidance.

Clients that send Accept: application/problem+json get application/problem+json; charset=utf-8 back, which is useful for HTTP client libraries that dispatch on media type. Clients that send Accept: application/json get application/json; charset=utf-8: the same body, and a safe default for existing consumers.

Size reduction and token efficiency

That contract is also dramatically smaller than what it replaces. Cloudflare HTML error pages are browser-oriented and heavy, while structured responses are compact by design.

Measured comparison for 1015:

| Payload | Bytes | Tokens (cl100k_base) | Size vs HTML | Tokens vs HTML |
|---|---|---|---|---|
| HTML response | 46,645 | 14,252 | n/a | n/a |
| Markdown response | 798 | 221 | 58.5x less | 64.5x less |
| JSON response | 970 | 256 | 48.1x less | 55.7x less |

Both structured formats deliver a ~98% reduction in size and tokens versus HTML.
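The ratios in the table can be re-derived from the byte and token counts alone. A short Python sketch, using only the measured figures quoted above (the `MEASURED` and `savings` names are illustrative, not part of any Cloudflare tooling):

```python
# Measured figures for error 1015, copied from the comparison table.
MEASURED = {
    "html": {"bytes": 46_645, "tokens": 14_252},
    "markdown": {"bytes": 798, "tokens": 221},
    "json": {"bytes": 970, "tokens": 256},
}


def savings(fmt: str, metric: str) -> tuple[float, float]:
    """Return (ratio vs HTML, percent reduction vs HTML) for one format and metric."""
    baseline = MEASURED["html"][metric]
    value = MEASURED[fmt][metric]
    ratio = baseline / value
    reduction = 100 * (1 - value / baseline)
    return round(ratio, 1), round(reduction, 1)


for fmt in ("markdown", "json"):
    for metric in ("bytes", "tokens"):
        ratio, reduction = savings(fmt, metric)
        print(f"{fmt:8} {metric:6} {ratio}x smaller, {reduction}% reduction")
```

The reductions land between 97.9% (JSON bytes) and 98.4% (Markdown tokens), which is where the ~98% figure comes from.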
For agents, size translates directly into token cost: when an agent hits multiple errors in one run, these savings compound into lower model spend and faster recovery loops.

Ten categories, clear actions

Every 1xxx error is mapped to an error_category. That turns error handling into routing logic instead of brittle per-page parsing.

| Category | What it means | What the agent should do |
|---|---|---|
| access_denied | Intentional block: IP, ASN, geo, firewall rule | Do not retry. Contact site owner if unexpected. |
| rate_limit | Request rate exceeded | Back off. Retry after retry_after seconds. |
| dns | DNS resolution failure at the origin | Do not retry. Report to site owner. |
| config | Configuration error: CNAME, tunnel, host routing | Do not retry (usually). Report to site owner. |
| tls | TLS version or cipher mismatch | Fix TLS client settings. Do not retry as-is. |
| legal | DMCA or regulatory block | Do not retry. This is a legal restriction. |
| worker | Cloudflare Workers runtime error | Do not retry. Site owner must fix the script. |
| rewrite | Invalid URL rewrite output | Do not retry. Site owner must fix the rule. |
| snippet | Cloudflare Snippets error | Do not retry. Site owner must fix Snippets config. |
| unsupported | Unsupported method or deprecated feature | Change the request. Do not retry as-is. |

Two fields make this operationally useful for agents:

- retryable answers whether a retry can succeed
- owner_action_required answers whether the problem must be escalated

You can replace brittle "if status == 429 then maybe retry" heuristics with explicit control flow. Parse the frontmatter once, then branch on stable fields.
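The category table lends itself to a lookup. The sketch below is one way to express it in Python. The action labels (`stop`, `backoff`, `fix_request`) and the `route_error` helper are hypothetical names chosen for illustration, not part of the Cloudflare response schema; only the category strings and the `retryable`, `retry_after`, and `owner_action_required` fields come from the text above.

```python
# Category strings come from the table above; the action labels are
# hypothetical names chosen for this sketch.
CATEGORY_ACTIONS = {
    "access_denied": "stop",
    "rate_limit": "backoff",
    "dns": "stop",
    "config": "stop",
    "tls": "fix_request",
    "legal": "stop",
    "worker": "stop",
    "rewrite": "stop",
    "snippet": "stop",
    "unsupported": "fix_request",
}


def route_error(meta: dict) -> dict:
    """Turn parsed error fields into a small routing decision."""
    category = meta.get("error_category")
    # Unknown categories fall back to the explicit retryable flag.
    default = "backoff" if meta.get("retryable") else "stop"
    action = CATEGORY_ACTIONS.get(category, default)
    return {
        "action": action,
        "wait_seconds": meta.get("retry_after") if action == "backoff" else None,
        "escalate": bool(meta.get("owner_action_required")),
    }


print(route_error({
    "error_category": "rate_limit",
    "retryable": True,
    "retry_after": 30,
    "owner_action_required": False,
}))
```

Keeping the mapping in data rather than in nested conditionals means a new category only needs one new dictionary entry, and unknown categories still degrade to the `retryable` flag.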
A simple pattern is:

- if retryable is true, wait retry_after and retry
- if owner_action_required is true, stop and escalate
- otherwise, fail fast without hammering the site

Here is a minimal Python example using that pattern:

    import time

    import yaml  # PyYAML


    def parse_frontmatter(markdown_text: str) -> dict:
        # Expects YAML frontmatter between two "---" lines, followed by the body.
        if not markdown_text.startswith("---\n"):
            return {}
        parts = markdown_text.split("---\n", 2)
        if len(parts) < 3:
            # No closing "---" delimiter, so there is no frontmatter to parse.
            return {}
        return yaml.safe_load(parts[1]) or {}


    def handle_cloudflare_error(markdown_text: str) -> str:
        meta = parse_frontmatter(markdown_text)

        if not meta.get("cloudflare_error"):
            return "not_cloudflare_error"

        if meta.get("retryable"):
            # Fall back to 30 seconds when retry_after is missing or empty.
            wait_seconds = int(meta.get("retry_after") or 30)
            time.sleep(wait_seconds)
            return f"retry_after_{wait_seconds}s"

        if meta.get("owner_action_required"):
            return f"escalate_owner_error_{meta.get('error_code')}"

        return "do_not_retry"

This is the key shift: agents are no longer inferring intent from HTML copy. They are executing explicit policy from structured fields.

How to use it

Send Accept: text/markdown, Accept: application/json, or Accept: application/problem+json.

For quick testing, you can hit any Cloudflare-proxied domain directly at /cdn-cgi/error/1015 (or replace 1015 with another 1xxx code).

    curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015"

Example with another error code:

    curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1020"

JSON example:

    curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .

RFC 9457 Problem Details example:

    curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .
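For agents written in Python, the same request can be built with the standard library. This is a sketch under assumptions: `base_url` must be a Cloudflare-proxied origin you are allowed to test against (`https://example.com` below is only a placeholder), and `build_error_request` and `fetch_error_body` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def build_error_request(base_url: str, code: int, accept: str) -> urllib.request.Request:
    """Build a GET request for the /cdn-cgi/error/<code> test path."""
    url = f"{base_url.rstrip('/')}/cdn-cgi/error/{code}"
    headers = {"Accept": accept, "User-Agent": "TestAgent/1.0"}
    return urllib.request.Request(url, headers=headers)


def fetch_error_body(request: urllib.request.Request) -> tuple[int, str]:
    """Send the request and return (status, body), including error responses."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the structured body is still readable.
        return err.code, err.read().decode("utf-8")


# Placeholder origin; replace with a Cloudflare-proxied domain before sending.
request = build_error_request("https://example.com", 1015, "text/markdown")
print(request.full_url, request.get_header("Accept"))
```

Note that `urllib` raises `HTTPError` for 4xx and 5xx statuses, so the structured error body has to be read from the exception rather than from a normal response object.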
The behavior is deterministic. The first explicit structured type wins:

| Accept header | Response |
|---|---|
| application/json | JSON |
| application/json; charset=utf-8 | JSON |
| application/problem+json | JSON (application/problem+json content type) |
| application/json, text/markdown;q=0.9 | JSON |
| application/json, text/markdown | JSON (equal q, first-listed wins) |
| text/markdown | Markdown |
| text/markdown, application/json | Markdown (equal q, first-listed wins) |
| text/markdown, */* | Markdown |
| text/* | Markdown |
| */* | HTML (default) |

Wildcard-only requests (*/*) do not signal a structured preference; clients must explicitly request Markdown or JSON.

If the request succeeds, you get normal origin content. The header only affects Cloudflare-generated error responses.

Real-world use cases

There are a number of situations where structured error responses help immediately:

- Agent blocked by WAF rule (1020). The agent parses error_code, records ray_id, and stops retrying. It can escalate with useful context instead of looping.
- MCP (Model Context Protocol) tool hitting geo restriction (1009). The tool gets a clear, machine-readable reason, returns it to the orchestrator, and the workflow can choose an alternate path or notify the user.
- Rate-limited crawler (1015). The agent reads retryable: true and retry_after, applies backoff, and retries predictably instead of hammering the endpoint.
- Developer debugging with curl. The developer can reproduce exactly what the agent sees, including frontmatter and guidance, without reverse-engineering HTML.
- HTTP client libraries that understand RFC 9457. Any client that dispatches on application/problem+json or parses Problem Details objects can handle Cloudflare errors without Cloudflare-specific code.

In each case, the outcome is the same: less guessing, fewer wasted retries, lower model cost, and faster recovery.
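The same decisions can be made from the JSON form. A minimal sketch, assuming the response body follows the 1015 example shown earlier in this post; `decide` and its return strings are hypothetical names, the 30-second fallback is an assumption carried over from that example, and the sample payload is trimmed from it rather than captured live.

```python
import json


def decide(content_type: str, body: str) -> str:
    """Map a Problem Details style body to a coarse next step."""
    # Dispatch on the media type, ignoring parameters such as charset.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ("application/json", "application/problem+json"):
        return "not_structured"

    problem = json.loads(body)
    if not problem.get("cloudflare_error"):
        return "not_cloudflare_error"
    if problem.get("retryable"):
        return f"retry_after_{int(problem.get('retry_after') or 30)}s"
    if problem.get("owner_action_required"):
        # "instance" carries the Ray ID, which is useful in a support handoff.
        return f"escalate_ray_{problem.get('instance')}"
    return "do_not_retry"


sample = json.dumps({
    "status": 429,
    "instance": "9d99a4434fz2d168",
    "error_code": 1015,
    "cloudflare_error": True,
    "retryable": True,
    "retry_after": 30,
    "owner_action_required": False,
})
print(decide("application/problem+json; charset=utf-8", sample))
```

Checking the media type before parsing keeps the handler from trying to decode an HTML page that arrived because the Accept header was missing or wildcard-only.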
Try it now

Send a structured Accept header and test against any Cloudflare-proxied domain:

    curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015"

    curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .

    curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .

Error pages are the first conversation between Cloudflare and an agent. This launch makes that conversation structured, standards-compliant, and cheap to process.

To make this work across the web, agent runtimes should default to explicit structured Accept headers, not bare */*. Use Accept: text/markdown, */* for model-first workflows and Accept: application/json, */* for typed control flow. If you maintain an agent framework, SDK, or browser automation stack, ship this default and treat bare */* as legacy fallback.

And it is only the first layer. We are building the rest of the agent stack on top of it: AI Gateway for routing, controls, and observability; Workers AI for inference; and the identity, security, and access primitives agents will need to operate safely at Internet scale.

Cloudflare is helping our customers deliver content in agent-friendly ways, and this is just the start. If you're building or operating agents, start at agents.cloudflare.com.