
Why AI Agents Need Your Site to Speak Markdown

Most websites dump 500KB of HTML on AI agents when 2KB of Markdown would do. Content negotiation fixes this. Here's how it works and why you should care.

Tags: agents, content-negotiation, markdown, llm

You optimize for Google. You optimize for mobile. But when an AI agent visits your site, what does it actually get back?

For most websites: a wall of HTML. Nav bars, cookie banners, script tags, tracking pixels. The actual content the agent needs is buried somewhere in the middle, maybe 5-10% of the total payload.

That's a problem. And it's getting worse.

Agents don't render your site. They read it.

When a human visits a webpage, the browser absorbs all the noise. It applies the CSS, runs the JavaScript, and presents a clean visual layout. The <nav> goes at the top. The footer stays at the bottom. You never think about the 200 lines of <head> content.

Agents don't get that luxury. They receive the raw response and have to figure out what matters. Every token counts against their context window, and most of what you're sending them is useless.

The Vercel team put some numbers on this: a typical blog post weighs around 500KB with all the HTML, CSS, and JavaScript. The same content as Markdown? 2KB. That's a 99.6% reduction in payload size.

Warning

Think about what that means at scale. An agent researching a topic might hit 20 pages. At 500KB each, that's 10MB of context burned on navigation markup and bundle scripts. The same research in Markdown: 40KB.

Content negotiation is not new

HTTP has supported content negotiation since the early days. The Accept header lets a client tell the server what format it wants. Browsers send Accept: text/html. Image tags send Accept: image/*. This is how the web has always worked.

The new part is agents using it. Claude Code, for example, sends:

Accept: text/markdown, text/html, */*

It's asking for Markdown first, HTML as a fallback, and anything else as a last resort. If your server respects this header, it can return clean, structured Markdown instead of a full HTML document.

HTTP/1.1 200 OK
Content-Type: text/markdown; charset=utf-8
Vary: Accept

---
title: My Blog Post
date: 2026-03-15
author: Jane Smith
---

# My Blog Post

The actual content, with no wrapper markup.

No navigation. No scripts. No tracking. Just content with metadata in frontmatter. Same information, fraction of the tokens.

Note

The Vary: Accept header tells caches that the response changes based on the Accept header. Without it, a CDN might cache the Markdown version and serve it to browsers, or vice versa.
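Accept headers can also carry quality values (q-values) that rank the client's preferences, so a robust server shouldn't just substring-match. Here's a minimal sketch of that negotiation logic in TypeScript; the function name is illustrative, and production servers often lean on a negotiation library instead:

```typescript
// Decide whether a request prefers Markdown over HTML, honoring q-values.
// Illustrative helper, not a full RFC 9110 implementation: wildcards and
// q=0 exclusions are ignored, and ties go to whichever type appears first.
function prefersMarkdown(accept: string): boolean {
  const prefs = accept
    .split(",")
    .map((part) => {
      const [type, ...params] = part.trim().split(";");
      const qParam = params.find((p) => p.trim().startsWith("q="));
      const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1.0;
      return { type: type.trim().toLowerCase(), q };
    })
    .sort((a, b) => b.q - a.q); // stable sort: equal q keeps header order

  for (const { type } of prefs) {
    if (type === "text/markdown") return true;
    if (type === "text/html") return false;
  }
  return false; // neither listed: fall back to HTML
}

console.log(prefersMarkdown("text/markdown, text/html, */*")); // true
console.log(prefersMarkdown("text/html, application/xhtml+xml")); // false
```

With this logic, Claude Code's `Accept: text/markdown, text/html, */*` header selects Markdown, while an ordinary browser's header selects HTML.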

What good content negotiation looks like

David Cramer wrote about how Sentry approaches this. A few things stood out.

Strip the browser chrome. Navigation, sidebars, JavaScript, cookie banners. None of it helps an agent. When you detect a Markdown request, serve only the content.

Restructure your link hierarchy. Index pages can become sitemaps. Instead of rendering a grid of cards with thumbnails, return a structured list of links with descriptions. Agents read partial responses (the first N lines) to decide if they need the rest. Put the most important information first.

Think about what agents do next. If a page requires authentication, don't just show a login form. Tell the agent that auth is required and point it to the API or CLI alternative. Agents behave differently when you tell them information exists versus making them discover it on their own.

The Vercel team takes a similar approach. Their implementation uses Next.js middleware to check the Accept header, then routes Markdown requests to handlers that convert their CMS content into clean Markdown. The key detail: code blocks keep their syntax highlighting markers, headings maintain hierarchy, and links stay functional. Structure matters, not just raw text.
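In sketch form, the conversion step might look like the following. This is not Vercel's actual code: the `Post` shape and `toMarkdownResponse` helper are assumptions, standing in for whatever the CMS actually returns.

```typescript
// Hypothetical shape of a CMS entry; real CMS types will differ.
interface Post {
  title: string;
  date: string;
  author: string;
  bodyMarkdown: string; // content already converted to Markdown upstream
}

// Build the full text/markdown response body: frontmatter, then content.
function toMarkdownResponse(post: Post): string {
  return [
    "---",
    `title: ${post.title}`,
    `date: ${post.date}`,
    `author: ${post.author}`,
    "---",
    "",
    post.bodyMarkdown,
  ].join("\n");
}

const body = toMarkdownResponse({
  title: "My Blog Post",
  date: "2026-03-15",
  author: "Jane Smith",
  bodyMarkdown: "# My Blog Post\n\nThe actual content, with no wrapper markup.",
});
```

The output matches the response body shown earlier: metadata in frontmatter, headings and links intact, no wrapper markup.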

This is not llms.txt

You might have seen the llms.txt proposal floating around. It's a static file (like robots.txt) that describes your site for agents. It solves a different problem.

  • robots.txt controls access. Which pages can be crawled.
  • llms.txt provides a site overview. What your product does, where the docs live.
  • Content negotiation controls format. What agents receive when they fetch a specific page.

They complement each other. llms.txt helps agents understand your site. Content negotiation helps them consume your pages efficiently.

Why this matters right now

Agent traffic is real and growing. ChatGPT browsing, Perplexity, Claude, and dozens of custom agent frameworks are hitting websites every day. If your server ignores the Accept: text/markdown header and dumps raw HTML, agents are forced to parse DOM soup, bloating their context and burning tokens on noise.

The sites that serve agents well will be the ones agents recommend, cite, and return to. This isn't theoretical. It's the same dynamic as SEO, just for a different kind of reader.

And the implementation is straightforward. Check the Accept header. If it includes text/markdown, strip the page down to its content, add frontmatter metadata, and return it with the right Content-Type. That's it.
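Those three steps fit in a handful of lines. Here's a framework-agnostic sketch; the `Res` shape and `handle` function are illustrative (map them onto your framework's request/response objects), and the Accept check is deliberately simplified (no q-value handling):

```typescript
// Simplified response shape for illustration.
interface Res {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// page holds both renderings; in practice the Markdown side would be
// generated from your CMS content rather than pre-stored.
function handle(accept: string, page: { md: string; html: string }): Res {
  // Simplified check: a real server should also weigh q-values.
  const wantsMd = accept
    .split(",")
    .some((t) => t.trim().toLowerCase().startsWith("text/markdown"));

  if (wantsMd) {
    return {
      status: 200,
      headers: {
        "Content-Type": "text/markdown; charset=utf-8",
        "Vary": "Accept", // keep caches from mixing the two variants
      },
      body: page.md,
    };
  }
  return {
    status: 200,
    headers: {
      "Content-Type": "text/html; charset=utf-8",
      "Vary": "Accept",
    },
    body: page.html,
  };
}
```

Note that both branches set `Vary: Accept`: the cache needs to know the resource varies by Accept header regardless of which variant it is currently storing.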

Test where you stand

This is what we built Agent Audit to do. Enter a URL, and we send the same Accept: text/markdown header that AI agents use. We compare the HTML and Markdown responses, measure the context bloat, check for agent discovery files, and score your site on how well it serves agents.

It's free. No signup. Takes about 30 seconds. Run your first audit and see what agents actually see when they visit your site.