
Serving agents clean content is an environmental issue, not just a performance one

Every unnecessary token your site forces an AI agent to process burns real energy. A 400KB HTML page versus 15KB of Markdown is not just a latency difference. It's a carbon one.

Tags: agents, content-negotiation, markdown, llm

AI inference, not training, now accounts for more than 90% of the energy consumed by LLM services. Every query. Every page fetch. Every token processed through a model costs electricity, and that electricity is mostly still coming from fossil fuels.

When an AI agent visits your website and gets back a wall of HTML, it doesn't just burn more of its context window. It burns more compute. The tokens it spends parsing <nav> blocks and cookie consent scripts are not free. They have a measurable energy cost, and they add up.

The math on wasted tokens

A typical blog post weighs around 500KB as HTML. The same content as Markdown comes in around 2KB. That 99%+ reduction isn't just a size difference. It's a token difference.

Converting HTML to Markdown cuts token consumption by 60–88% on average, depending on the page type. For tables, HTML can be 3–5x more token-heavy than Markdown. For full pages with nav, scripts, and footers, the ratio is often worse.
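
To see the gap on a real page, here's a minimal sketch that fetches a URL, converts the HTML to Markdown, and compares token counts. It assumes the requests, markdownify, and tiktoken packages; the URL and the cl100k_base encoding are placeholder choices, and a naive markdownify pass won't strip navigation or scripts the way a purpose-built extractor would.

```python
# Rough comparison: token count of a page as served (HTML) vs. a naive
# Markdown conversion of the same response. Packages assumed: requests,
# markdownify, tiktoken. The URL is a placeholder.
import requests
import tiktoken
from markdownify import markdownify

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models

html = requests.get("https://example.com/blog/some-post", timeout=10).text
markdown = markdownify(html)  # naive conversion; doesn't strip nav, scripts, or footers

html_tokens = len(enc.encode(html, disallowed_special=()))
md_tokens = len(enc.encode(markdown, disallowed_special=()))

print(f"HTML:     {html_tokens:>8,} tokens")
print(f"Markdown: {md_tokens:>8,} tokens")
print(f"Reduction: {1 - md_tokens / html_tokens:.0%}")
```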

Tokens per joule is becoming a real metric. Research from EuroMLSys benchmarks model efficiency in terms of how much useful output you get per joule of energy consumed. The implication is the inverse: every token you consume without producing useful output is pure waste. When an agent processes 40,000 tokens of HTML to extract 2,000 tokens of content, 95% of that energy went nowhere.
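
Applying that to the example above is simple arithmetic:

```python
# Waste estimate for the example above: 40,000 tokens of HTML processed
# to recover 2,000 tokens of actual content.
total_tokens = 40_000
useful_tokens = 2_000
wasted = 1 - useful_tokens / total_tokens
print(f"{wasted:.0%} of the tokens carried no signal")  # 95%
```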

Note

Inference costs have dropped dramatically: what cost $60 per million tokens in 2021 costs around $0.06 today. But the scale has grown proportionally. Agentic workflows that chain multiple page fetches, research tasks, and tool calls can consume hundreds of thousands of tokens per session.

Data centers are not a rounding error

U.S. data centers consumed 183 terawatt-hours of electricity in 2024, more than 4% of total U.S. electricity consumption. That figure is projected to grow to 426 TWh by 2030. The IEA estimates AI-related data center emissions will reach 1% of global CO₂ emissions by 2030 in its central scenario.

To put the water footprint in context: by 2030, data centers are projected to drain 731 to 1,125 million cubic meters of water per year for cooling. That's the annual household water use of 6 to 10 million Americans.

None of that is caused solely by token inefficiency. Hardware choice, grid carbon intensity, and cooling infrastructure all matter more. But token efficiency is the one lever your website controls directly.

Agentic workflows multiply the effect

A single agent research task doesn't hit one page. It hits 10, 20, sometimes more. For each page, it fetches content, extracts signal, and potentially follows links to additional pages. The wasted tokens aren't static. They compound across the task.
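
Reusing the per-page numbers from the earlier example, a 20-page research task looks roughly like this (the page count and per-page figures are illustrative assumptions, not measurements):

```python
# Multiplying the earlier per-page example across a 20-page research task.
# Page count and per-page token figures are illustrative assumptions.
pages = 20
html_tokens_per_page = 40_000      # full page: nav, scripts, footer, content
markdown_tokens_per_page = 2_000   # the content alone

html_total = pages * html_tokens_per_page          # 800,000 tokens
markdown_total = pages * markdown_tokens_per_page  #  40,000 tokens
print(f"{html_total:,} tokens of HTML vs {markdown_total:,} of Markdown "
      f"({html_total // markdown_total}x more) for the same task")
```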

Our audit crawler visits up to 10 pages per site. When a site serves HTML in response to Markdown requests, we process roughly 10x more tokens than we would for a site that implements content negotiation correctly. That difference is visible in our own infrastructure costs, but the same math applies to every agent that visits those sites across the open web.
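
A quick way to check whether a site negotiates at all is to ask for Markdown and look at what comes back. A minimal sketch: the URL is a placeholder, and text/markdown in the Accept header is one common convention rather than a settled standard.

```python
# Request a page as Markdown and check whether the server honors it.
# The URL is a placeholder; "text/markdown" is one common Accept value,
# not a settled standard for agent content negotiation.
import requests

resp = requests.get(
    "https://example.com/blog/some-post",
    headers={"Accept": "text/markdown, text/html;q=0.5"},
    timeout=10,
)
content_type = resp.headers.get("Content-Type", "")
if "markdown" in content_type:
    print(f"Negotiated: {content_type}, {len(resp.text):,} characters")
else:
    print(f"Got {content_type or 'unknown'}; the agent pays the full HTML token cost")
```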

At scale, the aggregate effect is real. The agent ecosystem (Claude, ChatGPT browsing, Perplexity, custom research agents) processes billions of page fetches. If those pages are serving 50x more tokens than necessary, the embedded energy cost is proportionally higher.

Warning

This isn't hypothetical. ChatGPT had over 400 million weekly active users as of early 2025. If even a fraction of those sessions involve web browsing, and the pages visited serve HTML instead of Markdown, the aggregate token waste is enormous.

Efficiency and sustainability are the same argument

There's a tendency to frame this as a cost and performance conversation: fewer tokens, faster responses, lower API bills. All of that is true. But efficiency at infrastructure scale is also an environmental argument.

The web has spent 30 years optimizing for human browsers. Lazy-loading images, minifying scripts, caching static assets: these practices exist because sending unnecessary bytes has a cost. Content negotiation for AI agents is the same principle applied to a new kind of reader.
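
On the server side, the mechanics can be as small as checking the Accept header before rendering. Here's a minimal sketch using Flask; the route, the pre-generated .md files, and the content/ directory are illustrative assumptions, not a prescribed setup.

```python
# Minimal content negotiation sketch with Flask: serve a pre-generated
# Markdown file when the client prefers it, the normal HTML page otherwise.
# Route name and file layout are illustrative assumptions.
from pathlib import Path
from flask import Flask, Response, request, render_template

app = Flask(__name__)

@app.route("/blog/<slug>")
def blog_post(slug: str):
    accept = request.headers.get("Accept", "")
    if "text/markdown" in accept:
        md = Path(f"content/{slug}.md").read_text(encoding="utf-8")
        return Response(md, mimetype="text/markdown")
    # Human browsers keep getting the full HTML page.
    return render_template("post.html", slug=slug)
```

A real implementation would parse quality values and fall back gracefully, but the principle is the same: one extra branch, far fewer tokens for every agent that asks.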

Agents are reading the web at scale. The format you serve them determines how much compute gets spent extracting signal from your pages. Serving clean Markdown isn't just a technical best practice. It's the lower-carbon option.