
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 16:13:05 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Extract audio from your videos with Cloudflare Stream]]></title>
            <link>https://blog.cloudflare.com/extract-audio-from-your-videos-with-cloudflare-stream/</link>
            <pubDate>Thu, 06 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Stream provides a unified platform for video storage, encoding, and delivery. We are now enabling developers to seamlessly extract audio from videos. ]]></description>
<content:encoded><![CDATA[ <p>Cloudflare Stream loves video. But we know not every workflow needs the full picture, and the popularity of podcasts highlights how compelling stand-alone audio can be. For developers, processing an entire video just to get at its audio is slow, costly, and complex.</p><p>What makes video so expensive? A video file is a dense stack of high-resolution images, stitched together over time. It is not just “one file”: it is a container of high-dimensional data, with its own frame rate, resolution, and codecs. Analyzing video means traversing time × resolution × frame rate.</p>
    <div>
      <h2>Why audio extraction</h2>
      <a href="#why-audio-extraction">
        
      </a>
    </div>
    <p>By comparison, an audio file is far simpler. An audio file with a single channel is just one waveform, whose technical characteristics are defined by the sample rate (the number of audio samples taken per second) and the bit depth (the precision of each sample).</p><p>With the rise of computationally intensive AI inference pipelines, many of our customers want to perform downstream workflows that only require analyzing the audio. For example:</p><ul><li><p><b>Power AI and Machine Learning:</b> In addition to translation and transcription, you can feed the audio into Voice-to-Text models for speech recognition, analysis, or AI-powered summaries.</p></li><li><p><b>Improve content moderation:</b> Analyze the audio within your videos to ensure the content is safe and compliant.</p></li></ul><p>Using full video data in such cases is expensive and unnecessary.</p><p>That’s why we’re introducing audio extraction: with a single API call or a click in the dashboard, you can now extract a lightweight M4A audio track from any video.</p><p>We offer two flexible methods to extract audio from your videos.</p>
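    <p>As a rough illustration of why a lone waveform is so much lighter than video, here is a back-of-the-envelope calculation. The 48&nbsp;kHz, 16-bit mono figures below are illustrative assumptions, not Stream defaults:</p>
    <pre><code>// Raw (uncompressed) size of an audio track.
// bytes = samples per second * bytes per sample * channels * seconds
function rawAudioBytes(sampleRate, bitDepth, channels, seconds) {
  return sampleRate * (bitDepth / 8) * channels * seconds;
}

// One minute of 48 kHz, 16-bit mono audio:
rawAudioBytes(48000, 16, 1, 60); // 5760000 bytes, i.e. about 5.8 MB</code></pre>
    <p>Compare that with a single minute of 1080p video, which is typically an order of magnitude larger even after aggressive compression.</p>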
    <div>
      <h3>1. On-the-Fly audio extraction through Media Transformations</h3>
      <a href="#1-on-the-fly-audio-extraction-through-media-transformations">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/stream/transform-videos/"><u>Media Transformations</u></a> is perfect for processing and transforming short-form videos, like social media clips, that you store anywhere you'd like. It works by fetching your media directly from its source, optimizing it at our edge, and delivering it efficiently. </p><p>We extended this workflow to include audio. By simply adding <code>mode=audio</code> to the transformation URL, you can now extract audio on-the-fly from a video file stored anywhere.</p><p>Once <a href="https://developers.cloudflare.com/stream/transform-videos/#getting-started"><u>Media Transformations is enabled for your domain</u></a>, you can extract audio from any source video. You can even clip specific sections by specifying <code>time</code> and <code>duration</code>.</p><p>For example:</p>
            <pre><code>https://example.com/cdn-cgi/media/mode=audio,time=5s,duration=10s/&lt;SOURCE-VIDEO&gt;</code></pre>
            <p>The above request generates a 10-second M4A audio clip from the source video, beginning at the 5-second mark. You can learn more about setup and other options in the <a href="https://developers.cloudflare.com/stream/transform-videos/"><u>Media Transformations documentation</u></a>.</p>
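            <p>If you are generating these URLs in code, a tiny helper keeps the parameters tidy. This is only a sketch: the helper name is ours, and the zone and source URL below are placeholders.</p>
            <pre><code>// Hypothetical helper that assembles a Media Transformations audio URL.
// The zone must have Media Transformations enabled.
function audioClipUrl(zone, sourceUrl, time, duration) {
  const params = ["mode=audio"];
  if (time) params.push("time=" + time);
  if (duration) params.push("duration=" + duration);
  return "https://" + zone + "/cdn-cgi/media/" + params.join(",") + "/" + sourceUrl;
}

audioClipUrl("example.com", "https://example.com/video.mp4", "5s", "10s");
// "https://example.com/cdn-cgi/media/mode=audio,time=5s,duration=10s/https://example.com/video.mp4"</code></pre>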
    <div>
      <h3>2. Audio downloads</h3>
      <a href="#2-audio-downloads">
        
      </a>
    </div>
    <p>You can now download the audio track directly for any content that you manage within Stream. Alongside the ability to generate a downloadable MP4 for offline viewing, you can also now create and store a persistent M4A audio file.</p>
    <div>
      <h2>Workers AI demo</h2>
      <a href="#workers-ai-demo">
        
      </a>
    </div>
    <p>The following sample code demonstrates how to use Media Transformations with one of Cloudflare’s own products, Workers AI. It performs a two-step process: first transcribing the video’s audio to English text, then translating that text into Spanish.</p>
            <pre><code>export default {
 async fetch(request, env, ctx) {


   // 1. Use Media Transformations to fetch only the audio track
   const res = await fetch( "https://blog.cloudflare.com/cdn-cgi/media/mode=audio/https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/announcing-audio-mode.mp4" );
   const blob = await res.arrayBuffer();


   // 2. Transcribe the audio to text using Whisper
   const transcript_response = await env.AI.run(
     "@cf/openai/whisper-large-v3-turbo",
     {
       audio: base64Encode(blob), // A base64 encoded string is required by @cf/openai/whisper-large-v3-turbo
     }
   );


   // Check if transcription was successful and text exists
   if (!transcript_response.text) {
       return Response.json({ error: "Failed to transcribe audio." }, { status: 500 });
   }


   // 3. Translate the transcribed text using the M2M100 model
   const translation_response = await env.AI.run(
     '@cf/meta/m2m100-1.2b',
     {
       text: transcript_response.text,
       source_lang: 'en', // The source language (English)
       target_lang: 'es'  // The target language (Spanish)
     }
   );


   // 4. Return both the original transcription and the translation
    return Response.json({
        transcription: transcript_response.text,
        translation: translation_response.translated_text
    });


 }
};


export function base64Encode(buf) {
 let string = '';
 (new Uint8Array(buf)).forEach(
   (byte) =&gt; { string += String.fromCharCode(byte) }
 )
 return btoa(string)
}</code></pre>
            <p>After running, the Worker returns a clean JSON response. Below is a snippet of the transcription and translation it returned:</p>
            <pre><code>{
  "transcription": "I'm excited to announce that Media Transformations from Cloudflare has added audio-only mode. Now you can quickly extract and deliver just the audio from your short form video. And from there, you can transcribe it or summarize it on Worker's AI or run moderation or inference tasks easily.",
  "translation": "Estoy encantado de anunciar que Media Transformations de Cloudflare ha añadido el modo solo de audio. Ahora puede extraer y entregar rápidamente sólo el audio de su vídeo de forma corta. Y desde allí, puede transcribirlo o resumirlo en la IA de Worker o ejecutar tareas de moderación o inferencia fácilmente."
}</code></pre>
            
    <div>
      <h2>Technical details</h2>
      <a href="#technical-details">
        
      </a>
    </div>
    <p>As a summer intern on the Stream team, I worked on shipping this long-requested feature. My first step was to understand the complex architecture of Stream’s media pipelines.</p><p>When a video is processed by Stream, it can follow one of two paths. The first is our video-on-demand (VOD) pipeline, which handles videos uploaded directly to Stream. It generates and stores a set of encoded video segments for adaptive bitrate streaming over HLS/DASH. The other path is our on-the-fly encoding (OTFE) pipeline, which drives the Stream Live and Media Transformations services. Instead of pre-processing and storing files, OTFE fetches media from a customer’s own website and performs transformations at the edge.</p><p>My project involved extending both of these pipelines to support audio extraction.</p>
    <div>
      <h3>OTFE pipeline</h3>
      <a href="#otfe-pipeline">
        
      </a>
    </div>
    <p>The OTFE pipeline is designed for real-time operations, and its existing flow was engineered for visual tasks. When a customer with Media Transformations enabled makes a request on their own website, it’s routed to our edge servers, which act as the entry point. The request is then validated, and depending on the parameters, OTFE fetches the video and generates a resized version or a still-frame thumbnail.</p><p>To support audio-only extraction, I built a new mode on top of this existing workflow. This involved:</p><ol><li><p>Extending the validation logic: A crucial audio-specific validation step was to verify that the source video contained an audio track before attempting extraction. This was in addition to pre-existing validation steps that ensure the requested URL is correctly formatted.</p></li><li><p>Building a new transformation handler: This was the core of my project. I built a new handler within the OTFE platform that discards the visual tracks in order to deliver a high-quality M4A file.</p></li></ol>
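    <p>The audio-track check can be illustrated with a small sketch. The shape of the probe data here is an assumption, modeled on ffprobe-style output; the real OTFE validation is internal to our pipeline.</p>
    <pre><code>// Returns true if the probed source contains at least one audio stream.
// "probe" is assumed to look like { streams: [{ codec_type: "video" }, ...] }.
function hasAudioTrack(probe) {
  const streams = probe.streams || [];
  return streams.some(function (stream) {
    return stream.codec_type === "audio";
  });
}

hasAudioTrack({ streams: [{ codec_type: "video" }] });                          // false
hasAudioTrack({ streams: [{ codec_type: "video" }, { codec_type: "audio" }] }); // true</code></pre>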
    <div>
      <h3>VOD pipeline</h3>
      <a href="#vod-pipeline">
        
      </a>
    </div>
    <p>Similar to my work on OTFE, this project involved extending our existing MP4 downloads workflow to support audio-only M4A downloads, which presented a series of interesting technical decisions.</p><p>The typical flow for creating a video download begins with a POST request to our main API layer, which handles authentication and validation and creates a corresponding database record. It then enqueues a job in our asynchronous queue, where workers perform the processing task. To enable audio downloads for VOD, I introduced new, type-specific API endpoints (<code>POST /downloads/{type}</code>) while preserving the legacy <code>POST /downloads</code> route as an alias for creating downloads of the default (video) type. This ensured full backward compatibility.</p><p>The core work of creating a download is performed by our asynchronous queue. This included:</p><ul><li><p>Adding logic to the consumer to detect the new audio download type</p></li><li><p>Pulling the ffmpeg template we define in our API layer to properly encode the audio stream into a high-quality M4A container</p></li></ul><p>By extending each component of this pipeline, from the API routes to the media processing commands, I was able to deliver a new, highly requested feature that unlocks audio-centric workflows for our customers!</p>
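    <p>As a rough sketch, creating an audio download through a type-specific route might look like the following. The account ID, video UID, and token are placeholders, and the exact request shape may differ; consult the Stream API documentation for the authoritative reference.</p>
    <pre><code>// Hypothetical sketch: build the request for creating an M4A download.
function buildAudioDownloadRequest(accountId, videoUid, apiToken) {
  return {
    method: "POST",
    url: "https://api.cloudflare.com/client/v4/accounts/" + accountId +
      "/stream/" + videoUid + "/downloads/audio",
    headers: { Authorization: "Bearer " + apiToken },
  };
}

const req = buildAudioDownloadRequest("ACCOUNT_ID", "VIDEO_UID", "API_TOKEN");
// Then dispatch it: fetch(req.url, { method: req.method, headers: req.headers })</code></pre>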
    <div>
      <h2>Dash screenshots</h2>
      <a href="#dash-screenshots">
        
      </a>
    </div>
    <p>We’re excited to announce that this feature is also available in the Stream dashboard. Simply navigate to any of your videos, and you’ll find the option to download the video or just the audio.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1DFBl200Npv24UfTbOBs5T/36229af550bb43305b0775a13e21e232/image5.png" />
          </figure><p>Once the download is ready, you will see the URL for the file, along with the option to disable it.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vyRvNfraFUHyZs0vz75uC/55471ea8700cec1b44932e81d89dd47d/image4.png" />
          </figure>
    <div>
      <h2>That’s a wrap</h2>
      <a href="#thats-a-wrap">
        
      </a>
    </div>
    <p>This project addressed a long-standing customer need, providing a simpler way to work with audio from video. I’m truly grateful for this entire journey, from understanding the problem to shipping the solution, and especially for the mentorship and guidance I received from my team along the way. We are excited to see how developers use this new capability to build more efficient and exciting applications on Cloudflare Stream.</p><p>You can try the audio extraction feature by <a href="https://dash.cloudflare.com"><u>uploading a video to Stream</u></a> or using the API! If you're interested in tackling these kinds of technical challenges yourself, explore our <a href="https://www.cloudflare.com/careers/early-talent/?cf_target_id=97C555BF0DA9E4427F470C0134F7E2C9"><u>internship and early talent programs</u></a> to start your own journey.</p> ]]></content:encoded>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Stream]]></category>
            <guid isPermaLink="false">6OFTw2NxvBqw0G8cr7jogY</guid>
            <dc:creator>Pakhi Sinha</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building a better testing experience for Workflows, our durable execution engine for multi-step applications]]></title>
            <link>https://blog.cloudflare.com/better-testing-for-workflows/</link>
            <pubDate>Tue, 04 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ End-to-end testing for Cloudflare Workflows was challenging. We're introducing first-class support for Workflows in cloudflare:test, enabling full introspection, mocking, and isolated, reliable tests for your most complex applications. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://www.cloudflare.com/developer-platform/products/workflows/"><u>Cloudflare Workflows</u></a> is our take on "Durable Execution": a serverless engine, powered by the <a href="https://www.cloudflare.com/developer-platform/"><u>Cloudflare Developer Platform</u></a>, for building long-running, multi-step applications that persist through failures. When Workflows became <a href="https://blog.cloudflare.com/workflows-ga-production-ready-durable-execution/"><u>generally available</u></a> earlier this year, they allowed developers to orchestrate complex processes that would be difficult or impossible to manage with traditional stateless functions. Workflows handle state, retries, and long waits, allowing you to focus on your business logic.</p><p>However, complex orchestrations require robust testing to be reliable. To date, testing Workflows has been a black-box process. Although you could test whether a Workflow instance reached completion through an <code>await</code> on its status, there was no visibility into the intermediate steps. This made debugging really difficult. Did the payment processing step succeed? Did the confirmation email step receive the correct data? You couldn't be sure without inspecting external systems or logs.</p>
    <div>
      <h3>Why was this necessary?</h3>
      <a href="#why-was-this-necessary">
        
      </a>
    </div>
    <p>As developers ourselves, we understand the need to ensure our code is reliable, and we heard your feedback loud and clear: the developer experience for testing Workflows needed to be better.</p><p>The black-box nature of testing was one part of the problem. Beyond that, though, the limited testing that was possible came at a high cost. If you added a Workflow to your project, even if you weren't testing it directly, you were required to disable isolated storage, because we couldn't guarantee isolation between tests. Isolated storage is a vitest-pool-workers feature that guarantees each test runs in a clean, predictable environment, free from the side effects of other tests. Being forced to disable it meant that state could leak between tests, leading to flaky, unpredictable, and hard-to-debug failures.</p><p>This created a difficult choice for developers building complex applications. If your project used <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Workers</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a>, and <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a> alongside Workflows, you had to either abandon isolated testing for your <i>entire project</i> or skip testing Workflows. This friction resulted in a poor testing experience, which in turn discouraged the adoption of Workflows. Solving this wasn't just an improvement; it was a critical <i>step</i> in making Workflows part of any well-tested Cloudflare application.</p>
    <div>
      <h3>Introducing isolated testing for Workflows</h3>
      <a href="#introducing-isolated-testing-for-workflows">
        
      </a>
    </div>
    <p>We're introducing a new set of APIs that enable comprehensive, granular, and isolated testing for your Workflows, all running locally and offline with <code>vitest-pool-workers</code>, our testing framework that supports running tests in the Workers runtime <code>workerd</code>. This enables fast, reliable, and cheap test runs that don't depend on a network connection.</p><p>They are available through the <code>cloudflare:test</code> module, with <code>@cloudflare/vitest-pool-workers</code> version <b>0.9.0</b> and above. The new test module provides two primary functions to introspect your Workflows:</p><ul><li><p><code>introspectWorkflowInstance</code>: useful for unit tests with known instance IDs</p></li><li><p><code>introspectWorkflow</code>: useful for integration tests where IDs are typically generated dynamically.</p></li></ul><p>Let's walk through a practical example.</p>
    <div>
      <h3>A practical example: testing a blog moderation workflow</h3>
      <a href="#a-practical-example-testing-a-blog-moderation-workflow">
        
      </a>
    </div>
    <p>Imagine a simple Workflow for moderating a blog. When a user submits a comment, the Workflow requests a review from Workers AI. Based on the violation score returned, it then waits for a moderator to approve or deny the comment. If approved, it calls a <code>step.do</code> to publish the comment via an external API.</p><p>Testing this without our new APIs would be impossible: you’d have no direct way to mock the steps’ outcomes or to simulate the moderator’s approval. Now, you can mock everything.</p><p>Here’s the test code using <code>introspectWorkflowInstance</code> with a known instance ID:</p>
            <pre><code>import { env, introspectWorkflowInstance } from "cloudflare:test";

it("should mock an ambiguous score, approve the comment, and complete", async () =&gt; {
   // CONFIG
   await using instance = await introspectWorkflowInstance(
       env.MODERATOR,
       "my-workflow-instance-id-123"
   );
   await instance.modify(async (m) =&gt; {
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 50 });
       await m.mockEvent({ 
           type: "moderation-approval", 
           payload: { action: "approved" },
       });
       await m.mockStepResult({ name: "publish comment" }, { status: "published" });
   });

   await env.MODERATOR.create({ id: "my-workflow-instance-id-123" });
   
   // ASSERTIONS
   expect(await instance.waitForStepResult({ name: "AI content scan" })).toEqual(
       { violationScore: 50 }
   );
   expect(
       await instance.waitForStepResult({ name: "publish comment" })
   ).toEqual({ status: "published" });

   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});</code></pre>
            <p>This test mocks the outcomes of steps that require external API calls, such as the 'AI content scan' step, which calls <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a>, and the 'publish comment' step, which calls an external blog API.</p><p>If the instance ID is not known in advance, for example because a Worker request starts one or more Workflow instances with randomly generated IDs, you can call <code>introspectWorkflow(env.MY_WORKFLOW)</code>. Here’s the test code for that scenario, where only one Workflow instance is created:</p>
            <pre><code>it("should mock a non-violation score and complete successfully", async () =&gt; {
   // CONFIG
   await using introspector = await introspectWorkflow(env.MODERATOR);
   await introspector.modifyAll(async (m) =&gt; {
       await m.disableSleeps();
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 0 });
   });

   await SELF.fetch(`https://mock-worker.local/moderate`);

   const instances = introspector.get();
   expect(instances.length).toBe(1);

   // ASSERTIONS
   const instance = instances[0];
   expect(await instance.waitForStepResult({ name: "AI content scan"  })).toEqual({ violationScore: 0 });
   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});</code></pre>
            <p>Notice how in both examples we’re calling the introspectors with <code>await using</code>: this is the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Resource_management#the_using_and_await_using_declarations"><u>Explicit Resource Management</u></a> syntax from modern JavaScript. It is crucial here because when an introspector object goes out of scope at the end of a test, its disposal method is automatically called. This is how we ensure each test works with its own isolated storage.</p><p>The <code>modify</code> and <code>modifyAll</code> functions are the gateway to controlling instances. Inside their callbacks, you get access to a modifier object with methods to inject behavior, such as mocking step outcomes, mocking events, and disabling sleeps.</p><p>You can find detailed documentation in the <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/test-apis/#workflows"><u>Cloudflare Workers docs</u></a>.</p><p><b>How we connected Vitest to the Workflows Engine</b></p><p>To understand the solution, you first need to understand the local architecture. When you run <code>wrangler dev</code>, your Workflows are powered by Miniflare, a simulator for testing Cloudflare Workers, and <code>workerd</code>. Each running Workflow instance is backed by its own SQLite Durable Object, which we call the "Engine DO". This Engine DO is responsible for executing steps, persisting state, and managing the instance's lifecycle. It lives inside the local isolated Workers runtime.</p><p>Meanwhile, the Vitest test runner is a separate Node.js process living outside of <code>workerd</code>. This is why we built vitest-pool-workers, a custom Vitest pool that allows tests to run inside <code>workerd</code>. It runs your tests in a Runner Worker, which has bindings to everything specified in the user's wrangler.json file, as well as access to the APIs under the <code>cloudflare:test</code> module. The Runner Worker communicates with Node.js through a special DO, the Runner Object, via WebSocket/RPC.</p><p>The first approach we considered was to use this Runner Worker. In its current state, the Runner Worker has access to bindings for the Workflows defined in the Wrangler file. We considered also binding each Workflow's Engine DO namespace to the Runner Worker. This would give vitest-pool-workers direct access to the Engine DOs, making it possible to call Engine methods directly.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ptKRqwpfvK1dxY6T5Kuin/fbf92915b2d2a95542bf6bec8addd5ad/image1.png" />
          </figure><p>While promising, this approach would have required undesirable changes to the core of Miniflare and vitest-pool-workers, making it too invasive for a single feature.</p><p>First, we would have needed to add a new <i>unsafe</i> field to Miniflare's Durable Objects. Its sole purpose would be to specify the service name of our Engines, preventing Miniflare from applying its default user prefix, which would otherwise prevent the Durable Objects from being found.</p><p>Second, vitest-pool-workers would have been forced to bind every Engine DO from the Workflows in the project to its runner, even those not being tested. This would introduce unwanted bindings into the test environment, requiring additional cleanup to ensure they were not exposed to the user's tests.</p><p><b>The breakthrough</b></p><p>The solution is a combination of privileged local-only APIs and Remote Procedure Calls (RPC).</p><p>First, we added a set of <code>unsafe</code> functions to the <i>local</i> implementation of the Workflows binding, functions that are not available in the production environment. They act as a controlled access point, accessible from the test environment, allowing the test runner to get a stub to a specific Engine DO by providing its instance ID.</p><p>Once the test runner has this stub, it uses RPC to call specific, trusted methods on the Engine DO via a special <code>RpcTarget</code> called <code>WorkflowInstanceModifier</code>. When an instance of a class that extends <code>RpcTarget</code> is sent over RPC, it is replaced by a stub; calling a method on that stub, in turn, makes an RPC back to the original object.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AObAsJuBplii3aeqMw2bn/74b21880b09a293fef6f84de1ae1318e/image2.png" />
          </figure><p>This simpler approach is far less invasive because it's confined to the Workflows environment, which also ensures any future feature changes are safely isolated.</p><p><b>Introspecting Workflows with unknown IDs</b></p><p>When creating Workflows instances (either by <code>create()</code> or <code>createBatch()</code>) developers can provide a specific ID or have it automatically generated for them. This ID identifies the Workflow instance and is then used to create the associated Engine DO ID.</p><p>The logical starting point for implementation was <code>introspectWorkflowInstance(binding, instanceID)</code>, as the instance ID is known in advance. This allows us to generate the Engine DO ID required to identify the engine associated with that Workflow instance.</p><p>But often, one part of your application (like an HTTP endpoint) will create a Workflow instance with a randomly generated ID. How can we introspect an instance when we don't know its ID until after it's created?</p><p>The answer was to use a powerful feature of JavaScript: <code>Proxy</code> objects.</p><p>When you use <code>introspectWorkflow(binding)</code>, we wrap the Workflow binding in a Proxy. This proxy non-destructively intercepts all calls to the binding, specifically looking for <code>.create()</code> and <code>.createBatch()</code>. When your test triggers a workflow creation, the proxy inspects the call. It captures the instance ID — either one you provided or the random one generated — and immediately sets up the introspection on that ID, applying all the modifications you defined in the <code>modifyAll</code> call. The original creation call then proceeds as normal.</p>
            <pre><code>env[workflow] = new Proxy(env[workflow], {
  get(target, prop) {
    if (prop === "create") {
      return new Proxy(target.create, {
        async apply(_fn, _this, [opts = {}]) {

          // 1. Ensure an ID exists 
          const optsWithId = "id" in opts ? opts : { id: crypto.randomUUID(), ...opts };

          // 2. Apply test modifications before creation
          await introspectAndModifyInstance(optsWithId.id);

          // 3. Call the original 'create' method 
          return target.create(optsWithId);
        },
      });
    }

    // Same logic for createBatch()

    // All other properties pass through to the original binding
    return Reflect.get(target, prop);
  }
});</code></pre>
            <p>When the <code>await using</code> block from <code>introspectWorkflow()</code> finishes, or the <code>dispose()</code> method is called at the end of the test, the introspector is disposed of, and the proxy is removed, leaving the binding in its original state. It’s a low-impact approach that prioritizes developer experience and long-term maintainability.</p>
    <div>
      <h3>Get started with testing Workflows</h3>
      <a href="#get-started-with-testing-workflows">
        
      </a>
    </div>
    <p>Ready to add tests to your Workflows? Here’s how to get started:</p><ol><li><p><b>Update your dependencies:</b> Make sure you are using <code>@cloudflare/vitest-pool-workers</code> version <b>0.9.0 </b>or newer. Run the following command in your project: <code>npm install @cloudflare/vitest-pool-workers@latest</code></p></li><li><p><b>Configure your test environment:</b> If you're new to testing on Workers, follow our <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/write-your-first-test/"><u>guide to write your first test</u></a>.</p></li></ol><p><b>Start writing tests</b>: Import <code>introspectWorkflowInstance</code> or <code>introspectWorkflow</code> from <code>cloudflare:test</code> in your test files and use the patterns shown in this post to mock, control, and assert on your Workflow's behavior. Also check out the official <a href="https://developers.cloudflare.com/workers/testing/vitest-integration/test-apis/#workflows"><u>API reference</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workflows]]></category>
            <guid isPermaLink="false">5Kq3w0WQ8bFIvLmxsDpIjO</guid>
            <dc:creator>Olga Silva</dc:creator>
            <dc:creator>Mia Malden</dc:creator>
        </item>
        <item>
            <title><![CDATA[Help build the future: announcing Cloudflare’s goal to hire 1,111 interns in 2026]]></title>
            <link>https://blog.cloudflare.com/cloudflare-1111-intern-program/</link>
            <pubDate>Mon, 22 Sep 2025 13:10:00 GMT</pubDate>
            <description><![CDATA[ We are incredibly excited to announce our most ambitious intern program yet: Cloudflare aims to hire as many as 1,111 interns over the course of 2026. ]]></description>
            <content:encoded><![CDATA[ <p>At Cloudflare, our mission is to help build a better Internet. That mission is ambitious, long-term, and requires constant innovation. But building for the future isn’t just about the technology we create — it’s also about investing in the people who will create it. That’s why today, we are incredibly excited to announce our most ambitious intern program yet: <b>Cloudflare aims to hire as many as 1,111 interns over the course of 2026.</b> This expanded class of interns will be hired across our hub locations around the world.</p>
    <div>
      <h2>Why is Cloudflare doing this? </h2>
      <a href="#why-is-cloudflare-doing-this">
        
      </a>
    </div>
    <p>We view internships as a vital pipeline for talent and a source of new energy and ideas. The number itself, a nod to our 1.1.1.1 public DNS resolver, is intentional. It represents our deep technical roots and our focus on building foundational infrastructure for the Internet. Now, we stand at the cusp of a new technological revolution: the age of AI. </p><p>To win in this new era, we can’t just rely on established methods. We need new ways of thinking, unconstrained by the "way things have always been done." That’s why this significantly increased class of interns will have a special focus: <b>to ramp up the creative and widespread application of AI with a fresh approach.</b></p><p>We want this group to challenge our assumptions. They will be tasked with looking at our customers’ needs, our products and features, our network, and our operations to find novel ways to utilize AI. How can AI make our network even smarter? How can it help our customers be more secure and efficient? How can it transform our own business processes? We believe that by empowering a large, diverse cohort of bright minds who have grown up as digital and now AI natives, we will unlock innovations we haven’t even imagined yet.</p>
    <div>
      <h2>This is the exact right time to expand our intern program </h2>
      <a href="#this-is-the-exact-right-time-to-expand-our-intern-program">
        
      </a>
    </div>
    <p>Like you, we have seen numerous <a href="https://digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine/"><u>reports</u></a> that more and more firms are capping their total headcount in favor of leaning on more AI tools, leading to <a href="https://www.nytimes.com/2025/08/10/technology/coding-ai-jobs-students.html"><u>downsizing their intern and new-graduate hiring</u></a>. This is resulting in increased <a href="https://www.theatlantic.com/economy/archive/2025/04/job-market-youth/682641/"><u>sidelining of new college graduates</u></a>. But we think this misreads the moment completely, so we’re heading in the opposite direction. </p><p>While we are excited about what AI tools can do, we have a different philosophy about their role. AI tools make great team members even better, and allow firms to set more ambitious goals. They are not replacements for new hires — but ways to multiply how new hires can contribute to a team.</p><p>The next phase of Cloudflare’s success will be driven by considerable change in almost everything we do. And although we have an amazing team, we are humble enough to realize that we don’t possess everything we need to envision and implement that radical change. We need the innovation and fresh approach of a talented new generation of leaders. And we can’t press “pause” on bringing aboard that talent. </p><p>This isn’t the first time we’ve made a counter-cultural commitment to interns. Back in 2020, as the world faced unprecedented uncertainty, many companies made the difficult decision to scale back or cancel their internship programs. We went in the opposite direction. 
Believing that investing in talent was more critical than ever, <b>we </b><a href="https://blog.cloudflare.com/cloudflare-doubling-size-of-2020-summer-intern-class/"><b><u>doubled</u></b></a><b> the </b><a href="https://techcrunch.com/2020/04/02/cloudflare-ceo-pledges-to-double-2020-internship-class/"><b><u>size</u></b></a><b> of our intern class.</b> We knew that these students represented the future, and abandoning them was not an option. That decision reinforced our culture of long-term thinking and our responsibility to foster emerging talent, especially during the toughest of times. And we’ve benefitted from it — some of our most promising young employees emerged from this batch.</p>
    <div>
      <h2>Interns ship at Cloudflare</h2>
      <a href="#interns-ship-at-cloudflare">
        
      </a>
    </div>
    <p>Interns at Cloudflare do real, meaningful work — they ship. They join active teams, and are expected to contribute to the problems that we solve every day. Our interns don’t merely get a feel for the place and fetch coffee. This isn’t a “test drive.” We want every member of our intern program to take ownership of their work and conclude their time able to point to a concrete deliverable that solved a real customer or internal problem at Cloudflare. </p><p>From day one, interns are embedded in teams across the company — from engineering and product to marketing, legal, and finance. They work alongside seasoned experts on critical projects, contributing code that ships to millions, launching marketing campaigns, and helping to shape the policies that govern the Internet. Our goal is not just to provide an internship experience; it's to provide the foundation for a career. We are committed to training the people who will one day lead our company, our industry, and the future of the Internet.</p><p>The challenges we address will vary by intern and by team. You can review examples of intern projects from last year <a href="https://blog.cloudflare.com/2024-interns/"><u>here</u></a>, and read dedicated announcements from interns who launched new technologies <a href="https://blog.cloudflare.com/introducing-high-definition-portrait-video-support-for-cloudflare-stream/"><u>here</u></a> and <a href="https://blog.cloudflare.com/embedded-function-calling/"><u>here</u></a>. Some of our interns operate as if they were just one more engineer or staff member on an existing team, helping contribute to its mission. Others are tasked with more exploratory projects where we ask them to research and prototype new ideas.</p><p>Aside from impactful project work, our internship program offers a deep dive into our culture, while providing interns with practical experience and leadership skills. 
They'll build a valuable professional network, from engaging in social events and coffee chats to gaining direct access to executives through exclusive Q&amp;A sessions. Every intern is paired with a dedicated mentor, and they'll get the chance to present their final work to the entire company. By the end of the program, interns will not only have enhanced their skills but also built lasting relationships to benefit their future careers.</p>
    <div>
      <h2>What do we look for in an intern?</h2>
      <a href="#what-do-we-look-for-in-an-intern">
        
      </a>
    </div>
    <p>We are looking for talented, curious, empathetic, and hardworking team members who are inspired by our mission to help build a better Internet. Come with the attitude to learn, and we will handle the rest. We do not expect interns to be immediate experts in the fields they are joining. The Internet is full of enough jokes about companies posting a job for an internship and asking for ten years of work experience.</p><p>We do try to match opportunities with the applicant’s study areas and relevant skills. We want to equip our interns for success and prefer, for example, finding software engineering opportunities for computer science students or accounting opportunities for finance majors. Each internship role posted will specify any specific preferences we have for areas of study. We recognize that many students have robust portfolios, GitHub projects, or open-source contributions. We'll optimize our matching process to connect you with a relevant team where you can immediately apply your skills and elevate your work.</p><p>Thousands of candidates apply for our internships each year. We expect this expansion to increase that level of interest significantly. To help identify the kinds of builders we want to recruit, we are going to fast-track engineering and other candidates who complete an assignment to build an AI-powered application on Cloudflare (more details on that below).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Se0OPghkFMeVJM3MsCY5a/3a9630ad3845236229dcfb6d2d17715f/BLOG-2947_2.png" />
          </figure>
    <div>
      <h2>How does the internship program work?</h2>
      <a href="#how-does-the-internship-program-work">
        
      </a>
    </div>
    
    <div>
      <h3>Working in Hub Offices</h3>
      <a href="#working-in-hub-offices">
        
      </a>
    </div>
    <p>As part of this program, we will only be hiring interns who can be present multiple days each week in one of our hub offices (generally 3-4 days depending on the team). Cloudflare has adopted a hybrid approach to work centered in “hub” locations around the world. The various hybrid approaches adopted by different teams are based on experimentation and their unique functions. Because interns are new, early-career team members here for a brief tenure, we think it is important that they connect with each other as well as with more senior leaders in our organization. We believe that mentorship and coaching are best done in person. </p><p>We expect to post internship opportunities in the following Cloudflare office locations:</p><ul><li><p>Austin, USA</p></li><li><p>New York City, USA</p></li><li><p>San Francisco, USA</p></li><li><p>Bengaluru, India</p></li><li><p>Lisbon, Portugal</p></li><li><p>London, UK</p></li></ul>
    <div>
      <h3>Year Round</h3>
      <a href="#year-round">
        
      </a>
    </div>
    <p>Our internships generally last for 12 weeks. While we plan to prioritize summer internships, we expect to hire significant numbers of interns in the spring and fall of 2026 as well.</p><p>Summer internships give students an opportunity to get experience without interrupting a school semester. The seasonal approach also makes it possible for us to create cohorts of interns who support each other on projects. That said, we know that education has changed a bit since we were in school. An increasing number of universities have developed programs for students to work with companies as part of a normal school semester, and others are more flexible in their approach to letting students choose to reduce hours or take a semester away from classwork to support an internship. </p>
    <div>
      <h3>Real pay for real work</h3>
      <a href="#real-pay-for-real-work">
        
      </a>
    </div>
    <p>We pay our interns. This means a competitive rate that is generally akin to the prorated salary of an entry-level position. And if you have to relocate temporarily to a city where we have an office, we will give you a stipend to support your travel and housing needs. Since we expect interns at Cloudflare to contribute immediately to real problem solving, it’s only fair to pay them accordingly. </p><p>And we believe it's incredibly important to pay interns. Many long-term employment opportunities arise through internship programs, so it's unfair to limit those programs to students who can afford to relocate and work full time for little or no pay. </p>
    <div>
      <h2>How to apply</h2>
      <a href="#how-to-apply">
        
      </a>
    </div>
    <ol><li><p>Keep an eye on our <a href="https://www.cloudflare.com/careers/early-talent/"><u>career site</u></a>, and specifically our internship opportunities listed <a href="https://www.cloudflare.com/careers/jobs/?department=Early+Talent"><u>here</u></a>. We will start posting more internship opportunities for 2026 on October 15th. </p></li><li><p>The intern opportunities page will link to our internship application portal that will streamline the application process. We plan to review applications in batches until all positions are filled. Our interview process will take 3-4 weeks.</p></li><li><p>Want a leg up? For the Software Engineering internship, we plan to fast-track review of candidates who complete an assignment to build an <a href="https://agents.cloudflare.com/"><u>AI-powered application</u></a> on Cloudflare. Submit it directly with your application.</p></li></ol><p>We look forward to hearing from you.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Careers]]></category>
            <category><![CDATA[Life at Cloudflare]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">GMhGwtoDEE7S4PlPutkUn</guid>
            <dc:creator>Kelly Russell</dc:creator>
            <dc:creator>Dane Knecht</dc:creator>
            <dc:creator>Judy Cheong</dc:creator>
        </item>
        <item>
            <title><![CDATA[Summer 2024 weather report: Cloudflare with a chance of Intern-ets]]></title>
            <link>https://blog.cloudflare.com/2024-interns/</link>
            <pubDate>Mon, 19 Aug 2024 21:00:00 GMT</pubDate>
            <description><![CDATA[ This summer, Cloudflare welcomed approximately 60 interns from all around the globe, on a mission to #HelpBuildABetterInternet. Join us as we dive into what we accomplished and our experiences! ]]></description>
            <content:encoded><![CDATA[ 
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3BKvJNEqZLFh3G7P6inOch/43ab1980a0f4e6390d3b4819b1d49454/2494-Hero.png" />
          </figure><p>During the summer of 2024, Cloudflare welcomed approximately 60 Intern-ets from all around the globe on a mission to #HelpBuildABetterInternet. Over the course of their internships, our wonderful interns tackled real-world challenges from different teams all over the company and contributed to cutting-edge projects. As returning interns, we – Shaheen, Aaron, and Jada – would like to show off the great work our cohort has done and experiences we’ve had throughout our time here.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3jVnbXtD1UjhQx2TBhyujn/1466ea1f607cf45c2dc531259de57391/2494-2.png" />
          </figure><p><sub>Austin Interns after volunteering at the Central Texas Food Bank.</sub></p>
    <div>
      <h2>Putting the SHIP in internSHIP</h2>
      <a href="#putting-the-ship-in-internship">
        
      </a>
    </div>
    <p>Cloudflare interns take pride in driving high-impact initiatives, playing a vital role in advancing Cloudflare's mission. With our diverse roles and projects this summer, we'd love to highlight some of the exciting work we've been involved in:</p><p><b>Rahul, </b>a Software Engineer intern<b>, </b>built a system to autograde intern application assignments for future students looking to join Cloudflare. It was built entirely on the <a href="https://www.cloudflare.com/developer-platform/"><u>Cloudflare Developer Platform</u></a>, using <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Cloudflare Access</u></a>, <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a>, <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>, <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>, and <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>!</p><p><b>Jessica</b>, a Software Engineer intern, created a new threads API for the <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> team that automatically recalls past messages when running inference, helping developers to generate chat sessions when building personalized chatbots.</p><p><b>Anshika, </b>an Internal Audit intern<b>,</b> worked on <a href="https://www.auditboard.com/blog/sox-testing/"><u>SOX</u></a> &amp; <a href="https://www.auditboard.com/blog/what-is-iso-audit/"><u>ISO</u></a> testing for the first quarter of 2024, a data center audit, and is helping to roll out the automation of <a href="https://www.auditboard.com/blog/soc-2-framework-guide-the-complete-introduction/"><u>SOC 2 testing</u></a>.</p><p><b>Jake, </b>a Business Development Relations intern<b>,</b> led the creation and launch of an outbound BDR campaign that generated new pipeline for Cloudflare involving tailored messaging, account criteria, and 
enablement materials.</p><p><b>Utkarsh,</b> a Software Engineer intern, built an internal tool for the Capacity Planning team to simulate unforeseen scenarios and changes within the Cloudflare network infrastructure, to help them provision new servers more efficiently.</p><p><b>Shaheen</b>, a Product Management intern, and<b> Anantharaman</b>, a Software Engineer Intern, collaboratively enhanced <a href="https://www.cloudflare.com/developer-platform/d1/"><u>Cloudflare D1</u></a> with improvements in billing observability, the dashboard, and <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a> commands while also launching billing alerts and audit logs.</p><p><b>Jada, </b>a Software Engineer intern, developed a Policy Tester feature for the <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/"><u>Zero Trust Access Policies</u></a> page to enable Cloudflare customers to view policy update statistics, using <a href="https://www.cloudflare.com/developer-platform/durable-objects/"><u>Cloudflare Durable Objects</u></a> for the RESTful API.</p><p><b>Dhravya, </b>a Software Engineer intern<b>, </b>helped launch <a href="https://blog.cloudflare.com/embedded-function-calling"><u>function calling in Workers AI</u></a>, a feature that enables LLMs to dynamically perform actions or retrieve data.</p><p><b>Prajjwal, </b>a Research intern,<b> </b>focused on personalizing the <a href="https://developers.cloudflare.com/waf/about/waf-attack-score/"><u>WAF attack score</u></a> and enhancing the overall user experience while experimenting with <a href="https://encord.com/blog/zero-shot-learning-explained/"><u>zero-shot learning techniques</u></a> to detect new and evolving attacks.</p><p><b>Megan</b>, a Security Analytics intern, ensured a smooth transition to an in-house security access tool by managing internal user access, aligning group rules, and addressing missing or duplicated groups/users during system migrations.</p>
    <div>
      <h2>Intern events</h2>
      <a href="#intern-events">
        
      </a>
    </div>
    
    <div>
      <h3>In-person</h3>
      <a href="#in-person">
        
      </a>
    </div>
    <p>For the first time since the pandemic, Cloudflare had over half of our interns in-person in our Austin, Lisbon and London offices. As the saying goes, everything is bigger in Texas and our new Austin office intern class shows it, with over 20 interns working in-person throughout the summer! Some of our favorite events were…</p><p><b>Intern ping-pong tournament
</b>After weeks of ping-pong classes, our interns got to put their skills to the test with a ping-pong tournament! The competition was fierce, with paddles flying and cheers echoing throughout the office as everyone battled for the title of Ping-Pong Champion. In the end, Josh, a PM intern on Workers, stole the show and was crowned as the 2024 Ping-Pong Champion.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5XSvtm3XL0g8g9kkLszpRP/7635e198358a33ac633395e90f94c5ec/2494-3.jpg" />
          </figure><p><sub>Interns in the Austin office participating in the Ping-Pong tournament.</sub></p><p><b>Food Bank Volunteering
</b>The Austin interns got the opportunity to give back to the community this summer at the <a href="https://www.centraltexasfoodbank.org/"><u>Central Texas Food Bank</u></a>. With the help of our full-timers the team got to work instantly, from packing food boxes to helping organize donations. Thanks to all the hard work, the team was able to feed a total of 9,000 people!
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/792ufHahwVslL7zPU050OL/e7a40d4b0c80253a0711453ea625ba64/2494-4.png" />
          </figure><p><sub>Austin Interns volunteering at the Central Texas Food Bank.</sub></p><p><b>Austin Game Show
</b>The Austin interns had the opportunity to participate in an exciting game show with their wonderful recruiters. This competitive showdown saw the “Winter-nets” face off against the “Return-ets” in a series of fun and challenging activities, including Family Feud and Wheel of Fortune. In the end, the Return-ets took the victory, with perhaps one too many pictures to prove it. Events like this are among the best, fostering bonding and friendly competition that bring everyone closer together.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38AAvkHN6XRibIbvcj5Igl/6ecdfd86d117043ea2d10d5d7e0a1638/2494-5.png" />
          </figure><p><sub>“Return-ets” posing with the trophy they won.</sub></p><p>Our Lisbon office also had 10 interns join Cloudflare and spend time doing some awesome things, including…</p><p><b>Visiting the Lisbon Data Center
</b>One of the most impressive parts of Cloudflare is its infrastructure, which spans <a href="https://www.cloudflare.com/network/"><u>hundreds</u></a> of data centers worldwide. This summer, our interns were able to get up close and personal with our Lisbon data center where they ventured through Cloudflare’s state-of-the-art server rooms. They witnessed the security measures that are in place to ensure the safety of our data and the support systems that ensure that the facility is able to run nonstop.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7na6jcN5Ari4OBtsicAlZ6/2454f96295c1f1cc83740ad93824d5b1/2494-6.png" />
          </figure><p><sub>Interns and Cloudflarians taking a photo in the Lisbon data center.</sub></p>
    <div>
      <h3>Remote</h3>
      <a href="#remote">
        
      </a>
    </div>
    <p>Our remote interns got to take on some virtual adventures throughout the summer, building friendships and memories from all around the globe! Some of the highlights were…</p>
    <div>
      <h4>Snack time</h4>
      <a href="#snack-time">
        
      </a>
    </div>
    <p>Interns all over the globe were shipped a box filled with an assortment of Japanese snacks and got to spend time together snacking on unique and new foods. This event not only satisfied our snack cravings but also strengthened our global connections with a fun, shared experience.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1QsvmJtnnNyLQgd1QaDXuo/c3b8551ed8e848a2ae244fcb441ce0bb/2494-7.png" />
          </figure><p><sub>Remote interns showing their favorite snacks from the snack boxes we received.</sub></p>
    <div>
      <h4>Mingling meetups</h4>
      <a href="#mingling-meetups">
        
      </a>
    </div>
    <p>Throughout the summer, interns got the chance to mingle with one another as well as the rest of the company. For a break from project work and a time to socialize, interns looked forward to Virtual Intern Game Days and Gatheround meetups. These designated online hangout blocks made for a more fun and inclusive experience for the remote interns. Along with that, remote interns near Cloudflare offices were welcome to join in-person events throughout the summer: from team lunches to arts &amp; crafts and social mixers, visiting the office is always worth the trip!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4d8Iy9W5w0Yr0ZsDv6ZhqQ/498f55d7b8fae94d32b1022cd2fd9788/2494-8.png" />
          </figure><p><sub>Group photo of a Gatheround session with the interns and Cloudfriends.</sub></p>
    <div>
      <h4>Executive chats</h4>
      <a href="#executive-chats">
        
      </a>
    </div>
    <p>Executive chats have been a key part of ensuring our interns get to truly know our top leaders and ask them questions 1-on-1. This summer was no exception, with interns hosting over eight executive chats filled with inspiring stories, valuable knowledge, and meaningful connections. Here’s what our interns had to say about it…</p><p><b>Sasha</b> enjoyed talking with <b>Matthew Prince</b>, Chief Executive Officer: “It was so heartfelt and emotional. I had heard the story of Lee Holloway before, but hearing it from Matthew himself was really impactful.” </p><p><b>Chantal</b> loved the talk with <b>Michele Yetman</b>, Chief People Officer: “I enjoyed hearing about her job history and how she carefully adapted skill sets from her previous jobs to craft her career. Also, she was curious to hear our perspective and answered our questions in honest detail.”</p><p><b>Anantharaman </b>liked the talk with <b>Dane Knecht</b>, SVP of Emerging Technology and Innovation: “Learning about the growth of ETI, the Austin office and the mission to move fast and break things within ETI to nurture a startup-y culture was very interesting.”</p><p><b>Matilde</b> valued the talk with <b>Nitin Rao</b>, Interim Chief Product Officer: “I didn't know much about him beforehand, but I found it fascinating to learn about his role at the company and the great impact of his contributions to Cloudflare's current infrastructure.”</p>
    <div>
      <h4>Company-wide intern presentations</h4>
      <a href="#company-wide-intern-presentations">
        
      </a>
    </div>
    <p>Each summer, Cloudflare Intern-ets have the opportunity to showcase their work during the annual Intern Presentations series. Hundreds of Cloudflarians join in to support and celebrate the interns including Matthew Prince and Michelle Zatlyn. As <b>Jake</b>, a BDR Operations Intern, puts it, “The opportunity to present in front of the founders and the rest of the company speaks volumes about how much Cloudflare values its interns' projects.”</p>
    <div>
      <h4>Cloudfriends</h4>
      <a href="#cloudfriends">
        
      </a>
    </div>
    <p>Cloudfriends is a program specifically for interns to socialize with people throughout Cloudflare. Cloudflare employees from various departments signed up, and we were able to schedule meetings with all of them. These meetings let us get to know more people, share experiences, and keep in touch (even after our internships have ended). On a similar note, Cloudflare has a program for <a href="https://blog.cloudflare.com/random-employee-chats-cloudflare"><u>Random Employee Chats</u></a> that interns can also take part in. These chats randomly pair you with another Cloudflare employee once a week and allow you to do even more socializing.</p>
    <div>
      <h2>Unforgettable memories</h2>
      <a href="#unforgettable-memories">
        
      </a>
    </div>
    <p>Throughout our time here at Cloudflare, we formed numerous unforgettable memories that truly made our internship experience one of a kind.</p>
    <div>
      <h3>The people</h3>
      <a href="#the-people">
        
      </a>
    </div>
    <p>Cloudflare is filled with the most driven, passionate, and all-around amazing people, so it’s no wonder that we all had a spectacular time interacting with everyone!</p><p><b>Ananya</b> took networking to the next level by engaging with 49 people throughout her internship, forging valuable connections across the company. Meanwhile, <b>Yomna</b> wasted no time setting up 25 1:1s by the end of her second week, meeting a bunch of awesome people along the way.</p><p><b>Sasha</b>,<b> Shaheen</b>,<b> Jack</b>,<b> Jake</b>,<b> Jaden</b>,<b> </b>and <b>Josh </b>hit up a local Austin restaurant and spent SIX hours bonding over laughs, talks, and good vibes. From strangers to friends, this was a moment that will last well beyond their internships.</p><p><b>Carol </b>and <b>Jessica</b> enjoyed their teams’ on-site events in the Austin office. They were able to meet all of their team members face-to-face and work and have fun together. The numerous on-site events that took place over the summer let the Austin Intern-ets connect with people from all over the company, including other Intern-ets that were remote or working out of another office.</p><p><b>Dhanush </b>and <b>Utkarsh</b> gathered the interns to enjoy the Olympics in the Austin office. They all sat in the same area, talked with each other, and bonded while watching the intense competition.</p>
    <div>
      <h3>The activities</h3>
      <a href="#the-activities">
        
      </a>
    </div>
    <p><b>Aaron</b>, <b>Carol</b>,<b> </b>and <b>Tara</b> enjoyed all the game nights the <a href="https://www.cloudflare.com/developer-platform/r2/"><u>R2</u></a> team hosted. They played a variety of board and card games from <i>For Sale</i> to <i>Brass: Birmingham</i> that everyone enjoyed, and they even did some late-night karaoke. <b>Anshika</b> also enjoyed the games that the Internal Audit team played before their all-hands meetings, such as <a href="http://skribbl.io"><u>skribbl.io</u></a>. These events let everyone on the team be more connected and overall just have fun.</p><p>The Austin interns filled their time with loads of different activities. From a dinner with Matthew, to visiting Barton Springs, to hosting a 4th of July barbecue party on top of a high-rise and overlooking Austin’s skyline, they certainly didn’t miss an opportunity to get out and have fun.</p><p>Meanwhile, in Europe, the London interns enjoyed their lunches on the beach. They were able to talk, swim, and get to know each other. The Lisbon office also hosted a bunch of team lunches that allowed everyone to work together and enjoy the sun.</p>
    <div>
      <h2>Summer shenanigans</h2>
      <a href="#summer-shenanigans">
        
      </a>
    </div>
    <p><b>Dhravya's caffeine fix:</b> Dhravya drank around 250 cups of coffee over the course of his 12-week internship.</p><p><b>Matilde the bus marathoner</b>: Every week, Matilde traveled 740 km from Braga to the Lisbon office by bus. By the end of her internship, she had accumulated 108 hours and 8880 km.</p><p><b>Anantharaman's epic commute: </b>Anantharaman chose to turn his daily commute into a marathon, walking over 100 miles during his 14-week internship.</p><p><b>Jack the Linux legend</b>: Jack was the second most active Linux-based developer for dash.cloudflare.com.</p><p><b>Josh the Ping-Pong prodigy</b>: Josh started his internship as a ping-pong newbie but after daily break-time practice, he smashed his way to win the intern ping-pong tournament.</p><p><b>Wing night warriors:</b> The Austin interns PROUDLY placed 24th in the Pluckers Wing night trivia.</p>
    <div>
      <h2>Byte-sized intern advice</h2>
      <a href="#byte-sized-intern-advice">
        
      </a>
    </div>
    <p>Wondering how to get the most out of your time as an intern? We surveyed the Intern-ets for some insider hints…</p><p><b>Be curious and don’t be afraid to ask questions: </b>An overwhelming number of Intern-ets and executives alike emphasized the importance of staying curious and keeping an open mind. Aside from that, asking questions can make a huge difference in getting unstuck or even thinking ahead on problems. As <b>Nikhil</b>, a SWE Intern, points out, “Cloudflare is vast, and people are super friendly + eager to help.”</p><p><b>The importance of introductions: </b>PM Intern <b>Yomna </b>notes how “cross-functional work…always works best when you are able to establish proper connections, a unified voice, and an open space that is dedicated to tackling the problem/situation at hand” and BDR Operations Intern <b>Blaise </b>recommends that interns “...make meetings with everyone on your team just to understand what each person is working on, and where you may be able to slot in.”</p><p><b>Data on data: </b>Marketing Analytics Intern <b>Tanuj </b>provides some insight, highlighting how “In analytics, it's crucial to be data-informed rather than just data-driven. For instance, data might suggest cutting a high-spend marketing campaign due to low short-term ROI. However, understanding the business context – such as the campaign's role in building brand loyalty – can reveal its long-term value. Always consider the broader picture for more impactful insights.”</p>
    <div>
      <h2>Want to become an Intern-et or Cloudflarian?</h2>
      <a href="#want-to-become-an-intern-et-or-cloudflarian">
        
      </a>
    </div>
    <p>Sign up<b> </b><a href="https://docs.google.com/forms/d/e/1FAIpQLSf4_l818Im_k-hVwD_yocilk9ZUcWInd0Ttr9LrQQdjKTvrGA/viewform"><u>here</u></a><b> </b>to be notified of new graduate and internship opportunities for 2025. Cloudflare is also hiring for full-time opportunities: check out <a href="https://www.cloudflare.com/careers/jobs/"><u>open positions</u></a> and apply today!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7mtKCdM8dhj3nPF7wN5yza/c2c9bccabea2871c2f4fb287be2f5681/2494-End.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Careers]]></category>
            <category><![CDATA[Life at Cloudflare]]></category>
            <guid isPermaLink="false">2wNtV7QL6cbw0MRekvJYks</guid>
            <dc:creator>Aaron Snell</dc:creator>
            <dc:creator>Shaheen Fattoe</dc:creator>
            <dc:creator>Jada Klein</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing high-definition portrait video support for Cloudflare Stream]]></title>
            <link>https://blog.cloudflare.com/introducing-high-definition-portrait-video-support-for-cloudflare-stream/</link>
            <pubDate>Fri, 16 Aug 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Stream is an end-to-end solution for video encoding, storage, delivery, and playback. Newly uploaded or ingested portrait videos will now automatically be processed in full HD quality ]]></description>
            <content:encoded><![CDATA[ 
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5P5jObj3tz5QPBuitlTVd4/b620f993a510e7a35adebc58f8819afc/2492-1-Hero.png" />
          </figure><p>Cloudflare Stream is an end-to-end solution for video encoding, storage, delivery, and playback. Our focus has always been on simplifying all aspects of video for developers. This goal continues to motivate us as we introduce first-class portrait (vertical) video support today. Newly uploaded or ingested portrait videos will now automatically be processed in full HD quality.  </p>
    <div>
      <h2>Why portrait video</h2>
      <a href="#why-portrait-video">
        
      </a>
    </div>
    <p>In the past few years, the popularity of portrait video has exploded, driven by short-form video applications such as TikTok and YouTube Shorts. However, Cloudflare customers have been confused as to why their portrait videos appear to be lower quality when viewed on portrait-first devices such as smartphones. This is because our video encoding pipeline previously did not support high-quality portrait videos, leaving them grainy and lower resolution. This pain point has now been addressed with the introduction of high-definition portrait video.</p>
    <div>
      <h2>The current stream pipeline</h2>
      <a href="#the-current-stream-pipeline">
        
      </a>
    </div>
    <p>When you <a href="https://developers.cloudflare.com/stream/uploading-videos/"><u>upload a video to Stream</u></a>, it is first encoded into several different “renditions” (sizes or resolutions) before delivery. This enables playback in a wide variety of network conditions and standardizes the way a video is experienced. By using these adaptive bitrate renditions, we are able to offer viewers the highest quality streaming experience that fits their network bandwidth: someone watching on a slow mobile connection is served a 240p video (a resolution of 320x240 pixels), while a viewer at home on a fiber Internet connection receives the 1080p version (a resolution of 1920x1080 pixels). This encoding pipeline follows one of two different paths:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7MaG68M8kXxHCmFZGQGypj/82aa200a20d3afe9e4c1f8ba1664906d/2492-2.png" />
</figure><p>The first path is our video on-demand (VOD) encoding pipeline, which generates and stores a set of encoded video segments at each of our standard video resolutions. The other path is our on-the-fly encoding (OTFE) pipeline, which uses the same process as Stream Live to generate resolutions upon user request. Both pipelines work with the set of standard resolutions, which are identified through a constrained target (output) height. This means that we encode every rendition to heights of 240 pixels, 360 pixels, etc. up to 1080 pixels. </p><p>When originally conceived, this encoding pipeline was not designed with portrait video in mind because portrait video was less common. As a result, portrait videos were encoded with lower quality dimensions that were consistent with landscape video encoding. For example, a portrait HD video has the dimensions 1080x1920; scaling it down to the height of a landscape HD video results in the much smaller output of 608x1080. However, current smartphones all have HD displays, making the discrepancy clear when a portrait video is viewed in portrait mode on a phone. With this new change to our encoding pipeline, all newly uploaded portrait videos will now be automatically encoded with a constrained target width, using a new set of standard resolutions for portrait video. These resolutions are the same as the current set of landscape resolutions, but with the dimensions reversed: 240x426 up to 1080x1920.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/L1mjs73r4H6VdRPtX3gzX/44d8dcc8eae3dd5d7de0ecd29222d2f5/2492-3.png" />
          </figure>
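As a purely illustrative sketch of the constrained-dimension logic described above (the names and structure here are hypothetical, not Stream's actual internals), the rendition dimensions could be derived like this:

```typescript
// Standard constrained sizes (240p up to 1080p), per the post. Everything
// here is an illustrative sketch, not Stream's actual implementation.
const STANDARD_SIZES = [240, 360, 480, 720, 1080];

interface Rendition {
  width: number;
  height: number;
}

// Landscape constrains the target height; portrait constrains the target
// width. The other dimension is scaled to preserve aspect ratio, rounded
// to the nearest even number (video codecs generally require even sizes).
function renditions(srcWidth: number, srcHeight: number): Rendition[] {
  const portrait = srcHeight > srcWidth;
  const aspect = srcWidth / srcHeight;
  return STANDARD_SIZES.map((size) =>
    portrait
      ? { width: size, height: 2 * Math.round(size / aspect / 2) }
      : { width: 2 * Math.round((size * aspect) / 2), height: size }
  );
}
```

For a 1080x1920 portrait source this yields 240x426 up to 1080x1920, matching the reversed portrait resolutions described above.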
    <div>
      <h2>Technical details</h2>
      <a href="#technical-details">
        
      </a>
    </div>
    <p>As the Stream intern this summer, I was tasked with this project, and with shipping this long-requested change by the end of my internship. The first step in implementing the change was to familiarize myself with the complex architecture of Stream’s internal systems. After this, I began brainstorming a few different implementation decisions, like how to consistently track orientation through various stages of the pipeline. Following a group discussion to decide which choices would be the most scalable, least complex, and best for users, it was time to write the technical specification.</p><p>Due to the implementation method we chose, making this change involved tracing the life of a video from upload to delivery through both of our encoding pipelines and applying different logic for portrait videos. Previously, all video renditions were identified by their height at each stage of the pipeline, making certain parts of the pipeline completely agnostic to the orientation of a video. With the proposed changes, we would now use the constraining dimension and orientation to identify a video rendition. Therefore, much of the work involved modifying the different portions of the pipeline to use these new parameters.</p><p>The first area of the pipeline to be modified was the Stream API service, the process that handles all Stream API calls. The API service enqueues the rendition encoding jobs for a video after it is uploaded, so it was necessary to introduce a new set of renditions designed for portrait videos and enqueue the corresponding encoding jobs. Queueing is handled by our in-house queue management system, which handles jobs generically and therefore did not require any changes.</p><p>Following this, I tackled the on-the-fly encoding pipeline. The area of interest here was the delivery portion of our pipeline, which generates the set of encoding resolutions to pass on to our on-the-fly encoder. 
Here I also introduced a new set of portrait renditions and the corresponding logic to encode them for portrait videos. This part of the backend is written and hosted on <a href="https://developers.cloudflare.com/workers/"><u>Cloudflare Workers</u></a>, which made it very easy and quick to deploy and test changes.   </p><p>Finally, we wanted to change how we presented these quality levels to users in the Stream built-in player and thought that using the new constrained dimension rather than always showing the height would feel more familiar. For portrait videos, we now display the size of the <i>constraining dimension</i>, which also means quality selection for portrait videos encoded under our old system now more accurately reflects their quality, too. As an example, a 9:16 portrait video would have been encoded to a maximum size of 608x1080 by the previous pipeline. Now, such a rendition will be marked as 608p rather than the full-quality 1080p, which would be a 1080x1920 rendition.</p><p>Stream as a whole is built on many of our own <a href="https://developers.cloudflare.com/"><u>Developer Platform</u></a> products, such as Workers for handling delivery, <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> for rendition storage, <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> for automatic captioning, and <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> for bitrate observation, all of which enhance our ability to deploy and ship new updates quickly. Throughout my work on this project, I was able to see all of these pieces in action, as well as gain a new understanding of the powerful tools Cloudflare offers for developers. </p>
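The quality-label change described above comes down to one rule: label a rendition by its constraining (smaller) dimension rather than always by its height. A hypothetical one-line helper captures it:

```typescript
// Label a rendition by its constraining (smaller) dimension, so a 608x1080
// portrait rendition from the old pipeline reads "608p" instead of
// overstating itself as "1080p". Illustrative only, not Stream's code.
function qualityLabel(width: number, height: number): string {
  return `${Math.min(width, height)}p`;
}
```

Under this rule, `qualityLabel(608, 1080)` gives "608p", while a full-quality 1080x1920 portrait rendition still reads "1080p".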
    <div>
      <h2>Results and findings</h2>
      <a href="#results-and-findings">
        
      </a>
    </div>
    <p>After the change, portrait videos are encoded to higher resolutions and are visibly higher quality. To confirm these differences, I analyzed the effect of the pipeline change on four different sample videos using the <a href="https://www.ni.com/en/shop/data-acquisition-and-control/add-ons-for-data-acquisition-and-control/what-is-vision-development-module/peak-signal-to-noise-ratio-as-an-image-quality-metric.html"><u>peak signal-to-noise ratio</u></a> (PSNR, a mathematical measure of image quality). Since the old pipeline produced lower resolution videos, the comparison here is between an upscaled version of the old pipeline rendition and the current pipeline rendition. In the graph below, higher values reflect higher quality relative to the unencoded original video.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UFpZKOjtZNgnob3TroqEX/43c2c91d20858629035aceb33c409643/2492-4.png" />
          </figure><p></p><p>According to this metric, we see an increase in quality from the pipeline changes as high as 8%. However, the quality increase is most noticeable to the human eye in videos that feature fine details or a high amount of movement, which is not always captured in the PSNR. For example, compare a side-by-side of a frame from the book sample video encoded both ways:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1cslwJdIonHtk5ChAhCOSi/fbdf72ad50c7a3f28b681a62417d523b/2492-5.png" />
          </figure><p>The difference between the old and new encodings is most clear when zoomed in:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6DpHar5bEKhKZi3zAVDtHa/ee90377bfce1262d97dcda4298031997/2492-6.png" />
          </figure><p>Here’s another example (sourced from <a href="https://mixkit.co/free-stock-video/winter-fashion-cold-looking-woman-concept-video-39874/"><u>Mixkit</u></a>):</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5rjQa5v4Dz9OVUave52hKe/4a4ad9344d89b8cdf64a315d7fd3a684/2492-7.png" />
          </figure><p>Magnified:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6grWclJIxuZXYzk74jXMfM/42fcd73d9d006dcd6041ca2a2904a718/2492-8.png" />
          </figure><p>Of course, watching these example clips is the clearest way to see:</p><ul><li><p>Book intro: <a href="https://customer-m033z5x00ks6nunl.cloudflarestream.com/6eb2aa73cedb7d647ce628296c49c3aa/iframe?preload=auto&amp;muted=true&amp;autoplay=true"><u>before</u></a> and <a href="https://customer-m033z5x00ks6nunl.cloudflarestream.com/5db8973a6f79dad1c8f4a5f78a9e6aa5/iframe?preload=auto&amp;muted=true&amp;autoplay=true"><u>after</u></a></p></li><li><p>Hair and makeup: <a href="https://customer-m033z5x00ks6nunl.cloudflarestream.com/7900b2ac82a89cf6eb1a1bf6679b8d86/iframe?preload=auto&amp;muted=true&amp;autoplay=true"><u>before</u></a> and <a href="https://customer-m033z5x00ks6nunl.cloudflarestream.com/4636618dd238816ae1a164d645081668/iframe?preload=auto&amp;muted=true&amp;autoplay=true"><u>after</u></a></p></li></ul><p>Maximize the Stream player and look at the quality selector (in the gear menu) to see the new quality level labels – select the highest option to compare. Note the improved sharpness of the text in the book sample as well as the improved detail in the hair and eye shadow of the hair and makeup sample.</p>
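For reference, the PSNR metric used above is simple to compute. This minimal sketch over raw 8-bit samples is illustrative only (the actual measurements would have been made with full video-analysis tooling):

```typescript
// Peak signal-to-noise ratio between two equally sized 8-bit sample
// buffers (e.g. decoded grayscale frames). Higher is better; identical
// inputs give Infinity.
function psnr(reference: Uint8Array, encoded: Uint8Array): number {
  if (reference.length !== encoded.length) throw new Error("size mismatch");
  let sumSq = 0;
  for (let i = 0; i < reference.length; i++) {
    const d = reference[i] - encoded[i];
    sumSq += d * d;
  }
  const mse = sumSq / reference.length; // mean squared error
  if (mse === 0) return Infinity;
  const peak = 255; // maximum possible 8-bit sample value
  return 10 * Math.log10((peak * peak) / mse);
}
```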
    <div>
      <h2>Implementation challenges</h2>
      <a href="#implementation-challenges">
        
      </a>
    </div>
    <p>Due to the complex nature of our encoding pipelines, there were several technical challenges to making a large change like this. Aside from simply uploading videos, many of the features we offer, like downloads or clipping, require tweaking to produce the correct video renditions. This involved modifying many parts of the encoding pipeline to ensure that portrait video logic was handled. </p><p>There were also some edge cases which were not caught until after release. One release of this feature contained a bug in the on-the-fly encoding logic which caused a subset of new portrait livestream renditions to have negative bitrates, making them unusable. This was due to an internal representation of video renditions’ constraining dimensions being mistakenly used for bitrate observation. We remedied this by increasing the scope of our end-to-end testing to include more portrait video tests and live recording interaction tests.</p><p>Another small bug caused downloading very small videos to sometimes fail. This was because for videos under 240p, our smallest encoding resolution, the non-constraining dimension was not being properly scaled to match the aspect ratio of the video, causing some specific combinations of dimensions to be interpreted as portrait when they should have been landscape, and vice versa. This bug was fixed quickly, but was not initially caught after the release since it required a very specific set of conditions to activate. After this experience, I added a few more unit tests involving small videos.</p>
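As a hypothetical sketch of the fix for that last bug: derive the orientation from the source dimensions first, then scale the non-constraining dimension from the source aspect ratio (all names here are illustrative, not Stream's actual code):

```typescript
// Compute smallest-rendition download dimensions while preserving the
// source's orientation. Sources already under 240p pass through unchanged;
// otherwise the constraining dimension is clamped to 240 and the other
// side is scaled by the source aspect ratio (rounded to even).
function downloadDimensions(srcWidth: number, srcHeight: number) {
  const portrait = srcHeight > srcWidth;
  const constraining = Math.min(srcWidth, srcHeight);
  const target = Math.min(constraining, 240);
  const scale = target / constraining;
  const other = 2 * Math.round((Math.max(srcWidth, srcHeight) * scale) / 2);
  return portrait
    ? { width: target, height: other }
    : { width: other, height: target };
}
```

Because orientation is decided before any scaling, a 200x150 landscape source can no longer be misread as portrait.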
    <div>
      <h2>That’s a wrap</h2>
      <a href="#thats-a-wrap">
        
      </a>
    </div>
    <p>As my internship comes to a close, reflecting on the experience makes me grateful for all the team members who have helped me throughout this time. I am very glad to have shipped this project which addresses a long-standing concern and will have real-world customer impact. Support for high-definition portrait video is now available, and we will continue to make improvements to our video solutions suite. You can see the difference yourself by <a href="https://dash.cloudflare.com/?to=/:account/stream"><u>uploading a portrait video to Stream</u></a>! Or, perhaps you’d like to help build a better Internet, too — our <a href="https://www.cloudflare.com/careers/early-talent/"><u>internship and early talent programs</u></a> are a great way to jumpstart your own journey.</p><p><i>Sample video acknowledgements: The sample video of the book was created by the Stream Product Manager and shows the opening page of </i>The Strange Wonder of Roots<i> by </i><a href="https://www.evangriffithbooks.com/"><i><u>Evan Griffith</u></i></a><i> (HarperCollins). The </i><a href="https://mixkit.co/free-stock-video/winter-fashion-cold-looking-woman-concept-video-39874/"><i><u>hair and makeup fashion video</u></i></a><i> was sourced from </i><a href="https://mixkit.co/"><i><u>Mixkit</u></i></a><i>, a great source of free media for video projects.</i></p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Video]]></category>
            <guid isPermaLink="false">7Dgs9IOqQRU1Tj3UKzO8pV</guid>
            <dc:creator>Sasha Huang</dc:creator>
        </item>
        <item>
            <title><![CDATA[Embedded function calling in Workers AI: easier, smarter, faster]]></title>
            <link>https://blog.cloudflare.com/embedded-function-calling/</link>
            <pubDate>Thu, 27 Jun 2024 17:00:09 GMT</pubDate>
            <description><![CDATA[ Introducing a new way to do function calling in Workers AI by running function code alongside your inference. Plus, a new @cloudflare/ai-utils package to make getting started as simple as possible ]]></description>
            <content:encoded><![CDATA[ <p></p>
    <div>
      <h2>Introducing embedded function calling and a new ai-utils package</h2>
      <a href="#introducing-embedded-function-calling-and-a-new-ai-utils-package">
        
      </a>
    </div>
    <p>Today, we’re excited to announce a novel way to do function calling that co-locates LLM inference with function execution, and a new ai-utils package that upgrades the developer experience for function calling.</p><p>This is a follow-up to our <a href="https://x.com/CloudflareDev/status/1803849609078284315">mid-June announcement for traditional function calling</a>, which allows you to leverage a Large Language Model (LLM) to intelligently generate structured outputs and pass them to an API call. Function calling has been widely adopted and standardized across the industry as a way for AI models to help perform actions on behalf of a user.</p><p>Our goal is to make building with AI as easy as possible, which is why we’re introducing a new <a href="https://www.npmjs.com/package/@cloudflare/ai-utils">@cloudflare/ai-utils</a> npm package that allows developers to get started quickly with embedded function calling. These helper tools drastically simplify your workflow by actually executing your function code and dynamically generating tools from OpenAPI specs. We’ve also open-sourced our ai-utils package, which you can find on <a href="https://github.com/cloudflare/ai-utils">GitHub</a>. With both embedded function calling and our ai-utils, you’re one step closer to creating intelligent AI agents, and from there, the possibilities are endless.</p>
    <div>
      <h2>Why Cloudflare’s AI platform?</h2>
      <a href="#why-cloudflares-ai-platform">
        
      </a>
    </div>
    <p>OpenAI has been the gold standard when it comes to having performant model inference and a great developer experience. However, they mostly support their closed-source models, while we want to also promote the open-source ecosystem of models. One of our goals with Workers AI is to match the developer experience you might get from OpenAI, but with open-source models.</p><p>There are other open-source inference providers out there like <a href="https://azure.microsoft.com/en-us/solutions/ai">Azure</a> or <a href="https://aws.amazon.com/bedrock/">Bedrock</a>, but most of them are focused on serving inference and the underlying infrastructure, rather than being a developer toolkit. While there are external libraries and frameworks like AI SDK that help developers build quickly with simple abstractions, they rely on upstream providers to do the actual inference. With <a href="https://developers.cloudflare.com/workers-ai/">Workers AI</a>, it’s the best of both worlds – we offer open-source model inference and a killer developer experience out of the box.</p><p>With the release of embedded function calling and ai-utils today, we’ve advanced how we do inference for function calling and improved the developer experience by making it dead simple for developers to start building AI experiences.</p>
    <div>
      <h2>How does traditional function calling work?</h2>
      <a href="#how-does-traditional-function-calling-work">
        
      </a>
    </div>
    <p>Traditional LLM function calling allows customers to specify a set of function names and required arguments, along with a prompt, when running inference on an LLM. The LLM returns the names and arguments of the functions to call, which the customer can then invoke to perform actions. These actions give LLMs the ability to do things like fetch fresh data not present in the training dataset and "perform actions" based on user intent.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2vZ1PHwC5e6RUkxQRKfBW0/1812907a22283b5eef02e92c747e8a73/image3-15.png" />
            
            </figure><p>Traditional function calling requires multiple back-and-forth requests passing through the network in order to get to the final output. This includes requests to your origin server, an inference provider, and external APIs. As a developer, you have to orchestrate all the back-and-forths and handle all the requests and responses. If you were building complex agents with multi-tool calls or recursive tool calls, it gets infinitely harder. Fortunately, this doesn’t have to be the case, and we’ve solved it for you.</p>
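To make the orchestration burden concrete, here is a simplified, synchronous sketch of the loop a developer ends up writing with traditional function calling (the types and names are hypothetical; real providers are asynchronous and speak JSON over HTTP):

```typescript
type ToolCall = { name: string; arguments: Record<string, string> };
type ModelTurn = { toolCalls?: ToolCall[]; response?: string };

// Hypothetical orchestration for *traditional* function calling: your code
// owns the loop. Run inference, execute each tool the model asks for, feed
// the results back, and repeat until the model answers directly.
function orchestrate(
  infer: (history: string[]) => ModelTurn,
  tools: Record<string, (args: Record<string, string>) => string>,
  prompt: string,
  maxRounds = 5
): string {
  const history = [prompt];
  for (let i = 0; i < maxRounds; i++) {
    const turn = infer(history); // round trip to the inference provider
    if (!turn.toolCalls?.length) return turn.response ?? "";
    for (const call of turn.toolCalls) {
      // each tool call is typically another round trip to an external API
      history.push(tools[call.name](call.arguments));
    }
  }
  throw new Error("exceeded maxRounds");
}
```

Every iteration is at least one network round trip, and each tool call usually adds another; embedded function calling collapses all of this into a single runWithTools invocation.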
    <div>
      <h2>Embedded function calling</h2>
      <a href="#embedded-function-calling">
        
      </a>
    </div>
    <p>With Workers AI, our inference runtime is the Workers platform, and the Workers platform can be seen as a global compute network of distributed functions (RPCs). With this model, we can run inference using Workers AI, and supply not only the function names and arguments, but also the runtime function code to be executed. Rather than performing multiple round-trips across networks, the LLM inference and function can run in the same execution environment, cutting out all the unnecessary requests.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/15yHmiG9OmRj9KkuLHteRm/2c50ad90b4736fbc7496a8495d58e1fe/image1-23.png" />
            
            </figure><p>Cloudflare is one of the few inference providers that is able to do this because we offer more than just inference – our developer platform has compute, storage, inference, and more, all within the same Workers runtime.</p>
    <div>
      <h3>We made it easy for you with a new ai-utils package</h3>
      <a href="#we-made-it-easy-for-you-with-a-new-ai-utils-package">
        
      </a>
    </div>
    <p>And to make it as simple as possible, we created a <a href="https://www.npmjs.com/package/@cloudflare/ai-utils"><code>@cloudflare/ai-utils</code></a> package that you can use to get started. These powerful abstractions cut down on the logic you have to implement to do function calling – it just works out of the box.</p>
    <div>
      <h3>runWithTools</h3>
      <a href="#runwithtools">
        
      </a>
    </div>
    <p><code>runWithTools</code> is our method that you use to do embedded function calling. You pass in your AI binding (env.AI), model, prompt messages, and tools. The tools array includes the description of the function, similar to traditional function calling, but you also pass in the function code that needs to be executed. This method makes the inference calls and executes the function code in a single step. <code>runWithTools</code> is also able to handle multiple function calls, recursive tool calls, validation for model responses, streaming for the final response, and other features.</p><p>Another feature to call out is a helper method called <code>autoTrimTools</code>, which automatically selects the relevant tools and trims the tools array based on the names and descriptions. We do this by adding an initial LLM inference call to intelligently trim the tools array before the actual function-calling inference call is made. We found that <code>autoTrimTools</code> helped decrease the number of total tokens used in the entire process (especially when there’s a large number of tools provided) because significantly fewer input tokens are used when generating the arguments list. You can choose to use <code>autoTrimTools</code> by setting it as a parameter in the <code>runWithTools</code> method.</p>
            <pre><code>const response = await runWithTools(env.AI, "@hf/nousresearch/hermes-2-pro-mistral-7b",
  {
    messages: [{ role: "user", content: "What's the weather in Austin, Texas?"}],
    tools: [
      {
        name: "getWeather",
        description: "Return the weather for a latitude and longitude",
        parameters: {
          type: "object",
          properties: {
            latitude: {
              type: "string",
              description: "The latitude for the given location"
            },
            longitude: {
              type: "string",
              description: "The longitude for the given location"
            }
          },
          required: ["latitude", "longitude"]
        },
        // function code to be executed after the tool call
        function: async ({ latitude, longitude }) =&gt; {
          const url = `https://api.weatherapi.com/v1/current.json?key=${env.WEATHERAPI_TOKEN}&amp;q=${latitude},${longitude}`
          const res = await fetch(url).then((res) =&gt; res.json())

          return JSON.stringify(res)
        }
      }
    ]
  },
  {
    streamFinalResponse: true,
    maxRecursiveToolRuns: 5,
    trimFunction: autoTrimTools,
    verbose: true,
    strictValidation: true
  }
)</code></pre>
            
    <div>
      <h3>createToolsFromOpenAPISpec</h3>
      <a href="#createtoolsfromopenapispec">
        
      </a>
    </div>
    <p>For many use cases, users will need to call an external API during function calling to get the output needed. Instead of having to hardcode the exact API endpoints in your tools array, we made a helper function that takes in an OpenAPI spec and dynamically generates the corresponding tool schemas and API endpoints you’ll need for the function call. You call <code>createToolsFromOpenAPISpec</code> from within <code>runWithTools</code> and it’ll dynamically populate everything for you.</p>
            <pre><code>const response = await runWithTools(env.AI, "@hf/nousresearch/hermes-2-pro-mistral-7b", {
  messages: [{ role: "user", content: "Can you name me 5 repos created by Cloudflare" }],
  tools: [
    ...(await createToolsFromOpenAPISpec(
      "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions-next/api.github.com/api.github.com.json"
    ))
  ]
})</code></pre>
            
    <div>
      <h2>Putting it all together</h2>
      <a href="#putting-it-all-together">
        
      </a>
    </div>
    <p>When you make a function calling inference request with <code>runWithTools</code> and <code>createToolsFromOpenAPISpec</code>, the only thing you need is the prompts – the rest is automatically handled. The LLM will choose the correct tool based on the prompt, the runtime will execute the function needed, and you’ll get a fast, intelligent response from the model. By leveraging our Workers runtime’s bindings and RPC calls along with our global network, we can execute everything from a single location close to the user, enabling developers to easily write complex agentic chains with fewer lines of code.</p><p>We’re super excited to help people build intelligent AI systems with our new embedded function calling and powerful tools. Check out our <a href="https://developers.cloudflare.com/workers-ai/function-calling/">developer docs</a> on how to get started, and let us know what you think on <a href="https://discord.cloudflare.com">Discord</a>.</p> ]]></content:encoded>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">28DvCOWrN5gKA9IBZqrdc3</guid>
            <dc:creator>Harley Turan</dc:creator>
            <dc:creator>Dhravya Shah</dc:creator>
            <dc:creator>Michelle Chen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections]]></title>
            <link>https://blog.cloudflare.com/connection-coalescing-with-origin-frames-fewer-dns-queries-fewer-connections/</link>
            <pubDate>Mon, 04 Sep 2023 13:00:51 GMT</pubDate>
            <description><![CDATA[ In this blog we’re going to take a closer look at “connection coalescing”, with a specific focus on managing it at a large scale ]]></description>
            <content:encoded><![CDATA[ <p><i>This blog reports and summarizes the contents of a Cloudflare </i><a href="https://research.cloudflare.com/publications/Singanamalla2022/"><i>research paper</i></a><i> which appeared at the ACM </i><a href="https://conferences.sigcomm.org/imc/2022/program/"><i>Internet Measurement Conference</i></a><i>, that measures and prototypes connection coalescing with ORIGIN Frames.</i></p><p>Some readers might be surprised to hear that a single visit to a web page can cause a browser to make tens, sometimes even hundreds, of web connections. Take this very blog as an example. If it is your first visit to the Cloudflare blog, or it has been a while since your last visit, your browser will make multiple connections to render the page. The browser will make DNS queries to find IP addresses corresponding to blog.cloudflare.com and then subsequent requests to retrieve any necessary subresources on the web page needed to successfully render the complete page. How many? Looking below, at the time of writing, there are 32 different hostnames used to load the Cloudflare Blog. That means 32 DNS queries and <i>at least</i> 32 TCP (or QUIC) connections, unless the client is able to reuse (or coalesce) some of those connections.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5iVEUq8ZQ8HsPbg1FAP5jq/de5899ea338a7628e732558caf2f8710/Screenshot-2023-09-03-at-18.34.41.png" />
            
            </figure><p>Each new web connection not only introduces additional load on a server's processing capabilities – potentially leading to scalability challenges during peak usage hours – but also exposes client metadata to the network, such as the plaintext hostnames being accessed by an individual. Such meta information can potentially reveal a user’s online activities and browsing behaviors to on-path network adversaries and eavesdroppers!</p><p>In this blog we’re going to take a closer look at “connection coalescing”. Since our initial look at <a href="/connection-coalescing-experiments/">IP-based coalescing in 2021</a>, we have done further large-scale measurements and modeling across the Internet, to understand and predict if and where coalescing would work best. Since IP coalescing is difficult to manage at large scale, last year we implemented and experimented with a promising standard called the <a href="https://datatracker.ietf.org/doc/rfc8336/">HTTP/2 ORIGIN Frame extension</a> that we leveraged to coalesce connections to our edge without worrying about managing IP addresses.</p><p>All told, there are opportunities being missed by many large providers. We hope that this blog (and our <a href="https://research.cloudflare.com/publications/Singanamalla2022/">publication</a> at ACM IMC 2022 with full details) offers a first step that helps servers and clients take advantage of the ORIGIN Frame standard.</p>
    <div>
      <h3>Setting the stage</h3>
      <a href="#setting-the-stage">
        
      </a>
    </div>
    <p>At a high level, as a user navigates the web, the browser renders web pages by retrieving dependent subresources to construct the complete web page. This process bears a striking resemblance to the way physical products are assembled in a factory. In this sense, a modern web page can be thought of as an assembly plant. It relies on a ‘supply chain’ of resources that are needed to produce the final product.</p><p>An assembly plant in the physical world can place a single order for different parts and get a single shipment from the supplier (similar to the <a href="https://www.sciencedirect.com/science/article/abs/pii/092552739290109K">kitting process</a> for maximizing value and minimizing response time); no matter the manufacturer of those parts or where they are made, one ‘connection’ to the supplier is all that is needed. Any single truck from a supplier to an assembly plant can be filled with parts from multiple manufacturers.</p><p>The design of the web causes browsers to typically do the opposite. To retrieve the images, JavaScript, and other resources on a web page (the parts), web clients (assembly plants) have to make <i>at least</i> one connection to every hostname (the manufacturers) defined in the HTML that is returned by the server (the supplier). It makes no difference if the connections to those hostnames go to the same server or not; for example, they could go to a <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/">reverse proxy</a> like Cloudflare. For each manufacturer a ‘new’ truck would be needed to transfer the materials to the assembly plant from the same supplier, or more formally, a new connection would need to be made to request a subresource from a hostname on the same web page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3jdJ2PTbkKJKMFxbWD6RgD/40b366c9315d9c55e2409e1654bc0a3d/pasted-image-0--4-.png" />
            
            </figure><p>Without connection coalescing</p><p>The number of connections used to load a web page can be surprisingly high. It is also common for the subresources to need yet other sub-subresources, and so new connections emerge as a result of earlier ones. Remember, too, that HTTP connections to hostnames are often preceded by DNS queries! Connection coalescing allows us to use fewer connections, <i>or ‘reuse’ the same set of trucks to carry parts from multiple manufacturers from a single supplier</i>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/jJx6aGfQCngN6w6D4jpeE/1dcfe2b5e3f8128d8837512ec1a959c9/pasted-image-0--5-.png" />
            
            </figure><p>With connection coalescing </p>
    <div>
      <h3>Connection coalescing in principle</h3>
      <a href="#connection-coalescing-in-principle">
        
      </a>
    </div>
    <p>Connection coalescing was <a href="https://datatracker.ietf.org/doc/html/rfc7540">introduced in HTTP/2</a>, and carried over into <a href="https://www.rfc-editor.org/rfc/rfc9114.html#name-connection-reuse">HTTP/3</a>. We’ve blogged about connection coalescing <a href="/connection-coalescing-experiments/">previously</a> (for a detailed primer we encourage going over that blog). While the idea is simple, implementing it can present a number of engineering challenges. For example, recall from above that there are 32 hostnames (at the time of writing) to load the web page you are reading right now. Among the 32 hostnames are 16 unique domains (defined as “Effective <a href="https://www.cloudflare.com/learning/dns/top-level-domain/">TLD+1</a>”). Can we create fewer connections or ‘coalesce’ existing connections for each unique domain? The answer is ‘<i>Yes, but it depends</i>’.</p><p>The exact number of connections to load the blog page is not at all obvious, and hard to know. There may be 32 hostnames attached to 16 domains but, counter-intuitively, this does not mean the answer to “how many unique connections?” is 16. The true answer could be as few as <i>one</i> connection if all the hostnames are reachable at a single server; or as many as 32 independent connections if a different and distinct server is needed to access each individual hostname.</p><p>Connection reuse comes in many forms, so it’s important to define “connection coalescing” in the HTTP space. For example, the reuse of an existing <a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/">TCP</a> or TLS connection to a hostname to make multiple requests for subresources from that <b><i>same</i></b> hostname is connection reuse, but not coalescing.</p><p>Coalescing occurs when an existing TLS channel for some hostname can be repurposed or used for connecting to a <b><i>different</i></b> hostname. 
For example, upon visiting blog.cloudflare.com, the HTML points to subresources at cdnjs.cloudflare.com. To reuse the same TLS connection for the subresources, it is necessary for both hostnames to appear together in the TLS certificate's <a href="https://en.wikipedia.org/wiki/Subject_Alternative_Name">“Subject Alternative Name (SAN)”</a> list, but this step alone is not sufficient to convince browsers to coalesce. After all, the cdnjs.cloudflare.com service may or may not be reachable at the same server as blog.cloudflare.com, despite being on the same certificate. So how can the browser know? Coalescing only works if servers set up the right conditions, but clients have to decide whether to coalesce or not – thus, browsers require a signal to coalesce beyond the SANs list on the certificate. Revisiting our analogy, the assembly plant may order a part from a manufacturer directly, not knowing that the supplier already has the same part in its warehouse.</p><p>There are two explicit signals a browser can use to decide whether connections can be coalesced: one is IP-based, the other ORIGIN Frame-based. The former requires the server operators to tightly bind DNS records to the HTTP resources available on the server. This is difficult to manage and deploy, and actually creates a risky dependency, because you have to place all the resources behind a specific set of IP addresses, or a single one. The way IP addresses influence coalescing decisions <a href="https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/">varies among browsers</a>, with some choosing to be more conservative and others more permissive. 
Alternatively, the HTTP ORIGIN Frame is an easier signal for the servers to orchestrate; it’s also flexible and fails gracefully with no interruption to service (for a specification-compliant implementation).</p><p>A foundational difference between these two coalescing signals: IP-based coalescing signals are implicit, even accidental, and force clients to infer coalescing possibilities that may exist, or not. None of this is surprising since IP addresses are designed to <a href="/addressing-agility/">have no real relationship with names!</a> In contrast, ORIGIN Frame is an explicit signal from servers to clients that coalescing is available no matter what DNS says for any particular hostname.</p><p>We have experimented with <a href="/connection-coalescing-experiments/">IP-based coalescing previously</a>; for the purpose of this blog we will take a deeper look at ORIGIN Frame-based coalescing.</p>
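<p>To make the SAN prerequisite concrete, here is a minimal Go sketch -- not the actual logic of any browser -- that checks whether a certificate already presented on a connection also covers a second hostname. The self-signed multi-SAN certificate here is a stand-in for a real edge certificate:</p>

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// makeCert builds a throwaway self-signed certificate whose SAN list
// contains the given DNS names, standing in for a multi-SAN edge cert.
func makeCert(dnsNames []string) *x509.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dnsNames[0]},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames, // the SAN list
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert
}

// canCoalesce reports whether the connection's certificate also covers
// another hostname -- the SAN prerequisite for coalescing (necessary,
// but as discussed above, not sufficient on its own).
func canCoalesce(cert *x509.Certificate, otherHost string) bool {
	return cert.VerifyHostname(otherHost) == nil
}

func main() {
	cert := makeCert([]string{"blog.cloudflare.com", "cdnjs.cloudflare.com"})
	fmt.Println(canCoalesce(cert, "cdnjs.cloudflare.com")) // true: in the SAN list
	fmt.Println(canCoalesce(cert, "example.org"))          // false: not covered
}
```

<p>A real client would combine a check like this with an explicit signal -- the DNS-derived IP match or an ORIGIN Frame -- before reusing the connection.</p>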
    <div>
      <h3>What is the ORIGIN Frame standard?</h3>
      <a href="#what-is-the-origin-frame-standard">
        
      </a>
    </div>
    <p>The ORIGIN Frame is an extension to the <a href="https://www.rfc-editor.org/rfc/rfc8336">HTTP/2</a> and <a href="https://www.rfc-editor.org/rfc/rfc9412">HTTP/3</a> specifications, a special frame sent on stream 0 (HTTP/2) or the control stream (HTTP/3) of the connection. The frame allows a server to send an ‘origin-set’ to clients on an <i>existing</i> established TLS connection: the hostnames the server is authorized for, requests to which will not incur any <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/421">HTTP 421 errors</a>. Hostnames in the origin-set MUST also appear in the certificate SAN list for the server, even if those hostnames are announced on different IP addresses via DNS.</p><p>Specifically, two different steps are required:</p><ol><li><p>Web servers must send a list enumerating the Origin Set (the hostnames that a given connection might be used for) in the ORIGIN Frame extension.</p></li><li><p>The TLS certificate returned by the web server must cover, in its DNS SAN entries, the additional hostnames returned in the ORIGIN Frame.</p></li></ol><p>At a high level, ORIGIN Frames are a supplement to the TLS certificate that operators can attach to say, “Psst! Hey, client, here are the names in the SANs that are available on this connection -- you can coalesce!” Since the ORIGIN Frame is not part of the certificate itself, its contents can be made to change independently. No new certificate is required. There is also no dependency on IP addresses. For a coalesceable hostname, existing TCP/QUIC+TLS connections can be reused without requiring new connections or DNS queries.</p><p><a href="https://w3techs.com/technologies/overview/proxy">Many websites today</a> rely on content served by CDNs, like Cloudflare's CDN service. 
The practice of using external CDN services offers websites the advantages of speed and reliability, and reduces the load on their <a href="https://www.cloudflare.com/learning/cdn/glossary/origin-server/">origin servers</a>. When both the website and its resources are served by the same CDN, despite being different hostnames owned by different entities, it opens up some very interesting opportunities for CDN operators to reuse and coalesce connections, since they control both the certificate management and the connection handling needed to send ORIGIN frames on behalf of the real origin server.</p><p>Unfortunately, there has been no way to turn the possibilities enabled by ORIGIN Frame into practice. To the best of our knowledge, until today, there has been no server implementation that supports ORIGIN Frames. Among browsers, only Firefox supports ORIGIN Frames. Since IP coalescing is challenging and ORIGIN Frame has no deployed support, is the engineering time and energy to better support coalescing worth the investment? We decided to find out with a large-scale Internet-wide measurement to understand the opportunities and predict the possibilities, and then implemented the ORIGIN Frame to experiment on production traffic.</p>
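<p>As a concrete illustration of the wire format described above, the Go sketch below serializes an HTTP/2 ORIGIN frame per RFC 8336: a 9-octet frame header carrying type 0x0c on stream 0, followed by a payload of Origin-Entry fields, each a 16-bit length prefix and the ASCII serialization of an origin. This is a toy encoder for illustration, not Cloudflare's production implementation:</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeOriginPayload builds the ORIGIN frame payload (RFC 8336):
// a sequence of Origin-Entry fields, each a 16-bit Origin-Len prefix
// followed by the ASCII-Origin bytes.
func encodeOriginPayload(origins []string) []byte {
	var p []byte
	for _, o := range origins {
		var l [2]byte
		binary.BigEndian.PutUint16(l[:], uint16(len(o)))
		p = append(p, l[:]...)
		p = append(p, o...)
	}
	return p
}

// frameHeader builds the 9-octet HTTP/2 frame header: a 24-bit payload
// length, type 0x0c (ORIGIN), no flags, and stream identifier 0.
func frameHeader(payloadLen int) []byte {
	return []byte{
		byte(payloadLen >> 16), byte(payloadLen >> 8), byte(payloadLen),
		0x0c,       // ORIGIN frame type
		0x00,       // no flags defined for this frame
		0, 0, 0, 0, // stream 0: connection-level, like SETTINGS
	}
}

func main() {
	payload := encodeOriginPayload([]string{"https://cdnjs.cloudflare.com"})
	frame := append(frameHeader(len(payload)), payload...)
	fmt.Printf("% x\n", frame)
}
```

<p>Because unknown frame types must be ignored by HTTP/2 implementations, a client that does not understand ORIGIN simply discards it -- which is the graceful-failure property noted earlier.</p>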
    <div>
      <h3>Experiment #1: What is the scale of required changes?</h3>
      <a href="#experiment-1-what-is-the-scale-of-required-changes">
        
      </a>
    </div>
    <p>In February 2021, <a href="/connection-coalescing-experiments/">we collected data</a> for 500K of the <a href="https://radar.cloudflare.com/domains">most popular websites</a> on the Internet, using a modified <a href="https://github.com/WPO-Foundation/webpagetest">Web Page Test</a> on 100 virtual machines. An automated Chrome (v88) browser instance was launched for every visit to a web page to eliminate caching effects (because we wanted to understand coalescing, not caching). On successful completion of each session, Chrome developer tools were used to retrieve and write the page load data as an HTTP Archive format (HAR) file with a full timeline of events, as well as additional information about certificates and their validation. Additionally, we parsed the certificate chains for the root web page and new TLS connections triggered by subresource requests to (i) identify certificate issuers for the hostnames, (ii) inspect the presence of the Subject Alternative Name (SAN) extension, and (iii) validate that DNS names resolve to the IP address used. Further details about our methodology and results can be found in the technical <a href="https://research.cloudflare.com/publications/Singanamalla2022/">paper</a>.</p><p>The first step was to understand what resources are requested by web pages to successfully render the page contents, and where these resources were present on the Internet. Connection coalescing becomes possible when subresource domains are co-located. We approximated the location of a domain by finding its corresponding autonomous system (AS). For example, the domain attached to <a href="https://cdnjs.cloudflare.com/">cdnjs</a> is reachable via AS 13335 in the BGP routing table, and that AS number belongs to Cloudflare. The figure below describes the percentage of web pages and the number of unique ASes needed to fully load a web page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3H188I6GyYxiqZEYxBPMPp/4bde6b40a523d0e66207c87cee40755c/Screenshot-2023-08-31-at-1.39.16-PM.png" />
            
            </figure><p>Around 14% of the web pages need two ASes to fully load, i.e. pages that depend on one additional AS for subresources. More than 50% of the web pages need to contact no more than six ASes to obtain all the necessary subresources. This finding, as shown in the plot above, implies that a relatively small number of operators serve the sub-resource content necessary for a majority (~50%) of the websites, and any usage of ORIGIN Frames would need only a few changes to have its intended impact. The potential for connection coalescing can therefore be optimistically approximated by the number of unique ASes needed to retrieve all subresources in a web page. In practice, however, this may be superseded by operational factors such as SLAs or helped by flexible mappings between sockets, names, and IP addresses which we worked on <a href="https://research.cloudflare.com/publications/Fayed2021/">previously at Cloudflare</a>.</p><p>We then tried to understand the impact of coalescing on connection metrics. The measured and ideal number of DNS queries and TLS connections needed to load a web page are summarized by their CDFs in the figure below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7aCaLsYdjNqeuQAuVSfQ6c/beae2aeaf3e96830f9562c09aeb0a2cd/Screenshot-2023-08-31-at-1.39.02-PM.png" />
            
            </figure><p>Through modeling and extensive analysis, we identify that connection coalescing through ORIGIN Frames could reduce the number of DNS queries and TLS connections made by browsers by over 60% at the median. We performed this modeling by identifying the number of times clients requested DNS records, and combining that with the ideal ORIGIN Frames to serve.</p><p>Many multi-origin servers, such as those operated by <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">CDNs</a>, tend to reuse certificates and serve the same certificate with multiple DNS SAN entries. This allows the operators to manage fewer certificates through their creation and renewal cycles. While theoretically one can have millions of names in a certificate, creating such certificates is unreasonable and a challenge to manage effectively. By continuing to rely on existing certificates, our modeling brings to light the scale of changes required to enable perfect coalescing, as highlighted in the figure below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5rIcRVeD6UEGNCW9dj3MYM/187f0bc53e9afedf3baca072410eb4db/Screenshot-2023-08-31-at-1.38.35-PM.png" />
            
            </figure><p>We identify that over 60% of the certificates served by websites do not need any modifications and could benefit from ORIGIN Frames, while with no more than 10 additions to the DNS SAN names in certificates we’re able to successfully coalesce connections to over 92% of the websites in our measurement. The most effective changes could be made by CDN providers by adding three or four of their most popular requested hostnames into each certificate.</p>
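<p>The AS-based approximation used throughout this section can be sketched as a simple grouping computation. The hostname-to-AS mapping below is illustrative, not our measurement dataset:</p>

```go
package main

import "fmt"

// uniqueASes approximates a page's coalescing potential by counting the
// distinct autonomous systems behind its subresource hostnames, given a
// hypothetical hostname-to-AS mapping (e.g. derived from BGP data).
func uniqueASes(hosts []string, hostToAS map[string]int) int {
	seen := map[int]bool{}
	for _, h := range hosts {
		if as, ok := hostToAS[h]; ok {
			seen[as] = true
		}
	}
	return len(seen)
}

func main() {
	// Illustrative mapping: two Cloudflare hostnames share AS 13335,
	// so a page using all three hostnames needs only two ASes.
	hostToAS := map[string]int{
		"blog.cloudflare.com":  13335,
		"cdnjs.cloudflare.com": 13335,
		"fonts.gstatic.com":    15169,
	}
	hosts := []string{"blog.cloudflare.com", "cdnjs.cloudflare.com", "fonts.gstatic.com"}
	fmt.Println(uniqueASes(hosts, hostToAS)) // 2
}
```

<p>The per-page AS counts in the CDF above were computed in this spirit: the fewer distinct ASes a page depends on, the more of its connections a single well-placed operator could coalesce.</p>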
    <div>
      <h3>Experiment #2: ORIGIN Frames in action</h3>
      <a href="#experiment-2-origin-frames-in-action">
        
      </a>
    </div>
    <p>In order to validate our modeling expectations, we then took a more active approach in early 2022. Our next experiment focused on 5,000 websites that make extensive use of <i>cdnjs.cloudflare.com</i> as a subresource. By modifying our experimental TLS termination endpoint we deployed HTTP/2 ORIGIN Frame support as defined in the <a href="https://datatracker.ietf.org/doc/rfc8336/">RFC standard</a>. This involved changing the internal forks of the <i>net</i> and <i>http</i> dependency modules of Golang, which we have open sourced (<a href="https://github.com/cloudflare/go-originframe">see here</a>, and <a href="https://github.com/cloudflare/net-originframe">here</a>).</p><p>During the experiments, connecting to a website in the experiment set would return <i>cdnjs.cloudflare.com</i> in the ORIGIN frame, while the control set returned an arbitrary (unused) hostname. All existing edge certificates for the 5,000 websites were also modified. For the experimental group, the corresponding certificates were renewed with <i>cdnjs.cloudflare.com</i> added to the SAN. To ensure integrity between control and experimental sets, control-group domain certificates were also renewed with a valid, identically sized third-party domain used by none of the control domains. This kept the relative size change to the certificates constant, avoiding potential bias due to differing certificate sizes. Our results were striking!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2jGJXHliykd9CGh3lNujpq/bc1b9faf871d7d4c6ecf7041966078cb/Screenshot-2023-08-31-at-1.38.47-PM.png" />
            
            </figure><p>Sampling 1% of the requests we received from Firefox to the websites in the experiment, we identified an over <b>50% reduction in new TLS connections per second</b>, indicating fewer cryptographic verification operations for clients and reduced compute overhead for servers. As expected, there were no differences in the control set, confirming the effectiveness of connection reuse as seen by the CDN or server operators.</p>
    <div>
      <h3>Discussion and insights</h3>
      <a href="#discussion-and-insights">
        
      </a>
    </div>
    <p>While our modeling measurements indicated that we could anticipate some performance improvements, in practice performance was not significantly better, suggesting that ‘no worse’ is the appropriate mental model. The subtle interplay between resource object sizes, competing connections, and congestion control is subject to network conditions. Bottleneck-share capacity, for example, diminishes as fewer connections compete for bottleneck resources on network links. It would be interesting to revisit these measurements as more operators deploy support on their servers for ORIGIN Frames.</p><p>Apart from performance, one major benefit of ORIGIN frames is in terms of privacy. How? Well, each coalesced connection hides client metadata that is otherwise leaked from non-coalesced connections. Certain resources on a web page are loaded depending on how one is interacting with the website. This means that for every new connection retrieving some resource from the server, TLS plaintext metadata like <a href="https://www.cloudflare.com/learning/ssl/what-is-sni/">SNI</a> (in the absence of <a href="/encrypted-client-hello/">Encrypted Client Hello</a>) and at least one plaintext DNS query, if transmitted over UDP or TCP on port 53, is exposed to the network. Coalescing connections removes the need for browsers to open new TLS connections, and the need to do extra DNS queries. This prevents metadata leakage to anyone listening on the network. 
ORIGIN Frames help minimize those signals from the network path, improving privacy by reducing the amount of cleartext information leaked on path to network eavesdroppers.</p><p>While browsers benefit from the reduced cryptographic computation needed to verify multiple certificates, a major advantage is that coalescing opens up very interesting future opportunities for resource scheduling at the endpoints (the browsers and the origin servers), such as <a href="/better-http-3-prioritization-for-a-faster-web/">prioritization</a>, or recent proposals like <a href="/early-hints/">HTTP early hints</a>, to provide client experiences where connections are not overloaded or competing for those resources. When coupled with the <a href="https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-http2-secondary-certs-06#section-3.4">CERTIFICATE Frames</a> IETF draft, we can further eliminate the need for manual certificate modifications, as a server can prove its authority over hostnames after connection establishment without any additional SAN entries on the website’s TLS certificate.</p>
    <div>
      <h3>Conclusion and call to action</h3>
      <a href="#conclusion-and-call-to-action">
        
      </a>
    </div>
    <p>In summary, the current Internet ecosystem has a lot of opportunities for connection coalescing with only a few changes to certificates and server infrastructure. Servers can reduce the number of TLS handshakes by roughly 50%, while reducing the number of render-blocking DNS queries by over 60%. Clients additionally reap privacy benefits by reducing cleartext DNS exposure to network onlookers.</p><p>To help make this a reality we are currently planning to add support for both HTTP/2 and HTTP/3 ORIGIN Frames for our customers. We also encourage other operators that manage third-party resources to adopt support of ORIGIN Frame to improve the Internet ecosystem. Our paper submission was accepted to the ACM Internet Measurement Conference 2022 and is <a href="https://research.cloudflare.com/publications/Singanamalla2022/">available for download</a>. If you’d like to work on projects like this, where you get to see the rubber meet the road for new standards, visit our <a href="https://www.cloudflare.com/careers/">careers page</a>!</p>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[HTTP2]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">QjYiQB1Bf6uRL71yURBMi</guid>
            <dc:creator>Suleman Ahmad</dc:creator>
            <dc:creator>Jonathan Hoyland</dc:creator>
            <dc:creator>Sudheesh Singanamalla</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing the 2023 Intern-ets!]]></title>
            <link>https://blog.cloudflare.com/introducing-the-2023-intern-ets/</link>
            <pubDate>Wed, 23 Aug 2023 17:30:50 GMT</pubDate>
            <description><![CDATA[ This year, Cloudflare welcomed a class of ~40 interns for an unforgettable summer filled with invaluable mentorship, continuous learning, and the chance to make a real-world impact. Get ready to learn about the surprising world of Cloudflare interns in their own words  ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4yHW5uY9u2WLcc5jEgwdXC/c7a6316b931653216e6ba6b1dd6c2d45/Introducing-the-2023-Intern-ets-.png" />
            
            </figure><p>This year, Cloudflare welcomed a class of approximately 40 interns, hailing from five different countries for an unforgettable summer. As we joined both remotely and in-person across Cloudflare’s global offices, our experiences spanned a variety of roles from engineering, product management to internal auditing and marketing. Through invaluable mentorship, continuous learning, and the chance to make a real-world impact, our summer was truly enriched at every step. Join us, Anni and Emilie, as we provide an insider's perspective on a summer at Cloudflare, sharing snippets and quotes from our intern cohort.</p>
    <div>
      <h2>printf(“Hello Intern-ets!”)</h2>
      <a href="#printf-hello-intern-ets">
        
      </a>
    </div>
    <p>You might have noticed that we have a new name for the interns: the Intern-ets! Our fresh intern nickname was born from a brainstorm between us and our recruiter, Judy. While “Cloudies”, “Cloudterns”, and “Flaries” made the shortlist, a company-wide vote crowned "Intern-ets" as the favorite. And just like that, we've made Cloudflare history!</p>
    <div>
      <h2>git commit -m “Innovation!”</h2>
      <a href="#git-commit-m-innovation">
        
      </a>
    </div>
    <p>We're all incredibly proud to have gotten the opportunity to tackle interesting and highly impactful projects throughout the duration of our internships. To give you a glimpse of our summer, here are a few that showcase the breadth and depth of our experiences.</p><p><b>Mia M., Product Manager intern,</b> worked on the Cloudflare Secrets Store, which is a new product that will allow Cloudflare customers to store, encrypt, and deploy sensitive data across the product suite. She focused on creating requirements for the core platform and tackling the first milestone, bringing secrets and environment variables from the per-Worker level to the account level in the Workers platform.</p><p><b>Pierre, Research intern,</b> focused on integrating differential privacy—a layer of protection with formal privacy guarantees—into distributed aggregation protocols. This privacy layer is imperative for user trust and data security as these protocols are commonly used for collecting sensitive information, such as browser telemetry or health data.</p><p><b>Johnny, Software Engineer intern</b>, worked on a new feature for API Shield called Object-Level Access Policies as a first step in a solution to combat Broken Object-Level Authorization, which has been ranked as the #1 API Security flaw by OWASP. This feature will enable customers to specify explicit allowlists and blocklists for API traffic per object.</p><p><b>Olivia, Project Manager intern,</b> led the Dissatisfied (DSAT) Customer Outreach Project and JIRA Automations for the Customer Support team. The DSAT project involves reaching out to Premium customers who express dissatisfaction with the goal of providing personalized contact to ensure they feel valued. 
The JIRA Automation project aims to create zero-touch tickets, removing Customer Support as the middle man.</p><p>Also don’t forget to check out the amazing intern projects that got featured in a blog post!</p><p><b>Emilie, Software Engineering intern</b> introduced a <a href="/debug-queues-from-dash/">debugging feature for Cloudflare Queues</a>.</p><p><b>Joaquin, Software Engineer intern</b> added a <a href="/cloudflare-workers-database-integration-with-upstash/">new Workers Database integration</a>.</p><p><b>Austin, Software Engineer intern</b> created <a href="/introducing-scheduled-deletion-for-cloudflare-stream/">scheduled deletion for Cloudflare Stream</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5GjXDOjR12QZdJS1J5Ey53/3382bed549140ffe2a350bb74892b21b/Introducing-the-2023-Intern-ets-body-2-1.png" />
            
            </figure>
    <div>
      <h2>No null days here!</h2>
      <a href="#no-null-days-here">
        
      </a>
    </div>
    <p>Beyond our projects, we had tons of fun getting to meet other Cloudflarians and experience the vibrant Cloudflare culture. Let's dive into some of the standout moments that made our internships truly special!</p>
    <div>
      <h3>Remote, office, hybrid</h3>
      <a href="#remote-office-hybrid">
        
      </a>
    </div>
    <p>This summer, the interns dotted the globe, working from cozy home setups to bustling offices in cities. Regardless of where we worked, we had a blast. Here's what some fellow interns have to say about their work experiences:</p><p><b>Austin office: Jada</b> loved her colleagues at the Austin office as they were “warm and open to exploring the city, [...], and hanging out outside of work”. <b>Anni</b> and <b>Maximo</b> loved attending the Austin-based team summit where they attended strategy sessions and met the team in-person.</p><p><b>San Francisco office</b>: <b>Emmanuel F.</b> enjoyed getting to interact with other engineers during SF Team Lunches. <b>Matthew</b> enjoyed working on the rooftop with a view of the city skyline. <b>Jonathan</b> appreciated the hybrid work model enjoyed by SF employees.</p><p><b>Remote work</b>: <b>Johnny</b> liked the distributed and flexible work style that the company embraces. <b>Daniël</b>, also working remotely, found it amusing how “[s]everal people have noticed the Feynman Lectures on Physics on the shelf behind me in my home and have asked about it.”</p><p><b>Remote intern events: Emmanuel G., Aaron, and Feiyu</b> enjoyed the intern calls that were held on GatherRound as “it was a fun, quick way to get to meet everyone.” <b>Pradyumna</b> particularly liked it when we played skribbl.io.</p>
<table>
<colgroup>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><img src="http://staging.blog.mrk.cfdata.org/content/images/2023/08/pasted-image-0--2--2.png" /></th>
    <th><img src="http://staging.blog.mrk.cfdata.org/content/images/2023/08/pasted-image-0--1--1.png" /></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><img src="http://staging.blog.mrk.cfdata.org/content/images/2023/08/IMG-3880---Maximo-Guk-1.jpg" /></td>
    <td><img src="http://staging.blog.mrk.cfdata.org/content/images/2023/08/pasted-image-0-1.png" /></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Mentorship</h3>
      <a href="#mentorship">
        
      </a>
    </div>
    <p>With so many exceptional minds at Cloudflare, every interaction became a chance for us to learn and grow. Here are some awe-inspiring individuals who have made our internships unforgettable:</p><p><b>Harshal, Systems Engineer: Aaron</b> is grateful for his mentor <b>Harshal</b>. “I always left our conversations knowing more than I did coming into them”.</p><p><b>Revathy, Systems Engineer: Harshini</b> is thankful to her mentor <b>Revathy</b> for “how she helps me to learn [...] the best way possible to do something and ultimately achieve my goals”.</p><p><b>Nevi, Product Manager: Anni</b> admires her manager <b>Nevi</b> who is always thinking about the team and our customers and has invested in the personal growth and mentorship of interns.</p><p><b>Conner, Systems Engineer: Jonathan</b> is grateful that he was always able to count on <b>Conner</b> for great engineering tips, guidance, and NeoVim wizardry.</p><p><b>Malgorzata, Data Scientist:</b> <b>Jada</b> looks up to <b>Malgorzata</b> for being so welcoming, kind, and funny. She has a great attitude and besides being super knowledgeable, she is also willing to share her expertise and support others!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/NpzyFOtTN51zpo3VPfoyw/1b8ac80f54e82411df1b021f230a1d9c/Introducing-the-2023-Intern-ets-body-3.png" />
            
            </figure>
    <div>
      <h3>Executive chats</h3>
      <a href="#executive-chats">
        
      </a>
    </div>
    <p>During our internships, we engaged in Executive fireside chats, diving deep with Cloudflare's top leaders. Each chat was insightful and surprising in a different way, and some of our favorite takeaways were…</p><p><b>John, CTO: Shaheen</b> valued John’s humility in emphasizing daily learning from others at work, stating, “As I grow in my career, I intend to keep a similar attitude and try to learn from those around me by keeping myself grounded.”</p><p><b>Doug Kramer, General Counsel: Emilie</b> valued Doug Kramer's advice on identifying a career "north star" to guide intentions while also recognizing "exit-ramps" or alternative paths that may offer unexpected fulfillment.</p><p><b>Matthew Prince</b>, <b>Co-founder and CEO</b>: <b>Yunfan</b> loved hearing “the story about how Cloudflare developed from a start-up till today”.</p><p><b>Michelle Zatlyn, Co-founder and COO: Harsh</b> learned from Michelle about “how they moved across the country, against everyone's advice, to start Cloudflare”, and <b>Mia C.</b> enjoyed learning that “Cloudflare started as a business school project”.</p>
    <div>
      <h3>Snack bytes</h3>
      <a href="#snack-bytes">
        
      </a>
    </div>
    <p>A bonding point for the Intern-ets was our love for snacks! In July, we gathered the Intern-ets together for a virtual snack break. The University team sent out a box featuring snacks from Indonesia, a country none of us had visited (or tried goodies from… yet!). Below you can see us holding up our favorite snacks from the box!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5A3F9KPae2jxzjshJsteXl/3a0139eff7f2eb769f94ebf1550c4070/Frame-2--10-.png" />
            
            </figure><p>Meanwhile, the on-site interns couldn't get enough of the office snacks. Favorites? Pita chips, Lucky Charms, chocolate almonds, coconut chocolate bars and coconut water. Plus, the Austin and San Francisco offices even have a Sour Patch Kids dispenser! Snack on!</p>
    <div>
      <h2>Surprises</h2>
      <a href="#surprises">
        
      </a>
    </div>
    <p>Every day at Cloudflare presented unexpected joys and challenges. Here's what the interns found most surprising:</p><p><b>High Impact: Simon, Emmanuel F.</b> and <b>Maximo</b> were surprised to “[do] such visible and important work as an intern”. <b>Austin</b> agreed, noting “I was treated like any other member of the team [...] It felt like I was working on something important and not just a typical intern project!” <b>Harshini</b> added, “when [colleagues] hear what I am working on they go - ‘that is really cool, I can't wait to see that happen - we need it.’”</p><p><b>Support: Eames</b> “was worried that it would feel like my achievements were the only thing that mattered. But my colleagues always showed concern for how I was feeling and how things were going, and I couldn't be more grateful for that.”</p><p><b>Industry vs Academia: Johnny</b> mentioned “coming from academia, I was amazed by the amount of effort that has been, and is continuing to be, put into the products to really productionize what I have only before seen in research. It is another reminder of the scale in which we work!”</p>
    <div>
      <h2>By the numbers</h2>
      <a href="#by-the-numbers">
        
      </a>
    </div>
    <p>Here are some fun stats from our internship…</p><ul><li><p><b>Johnny</b> drove 30 hours from New York to Colorado</p></li><li><p><b>Maximo</b> missed 0 days of going to the Austin office</p></li><li><p><b>Anni</b> drank 86 matcha lattes this summer</p></li><li><p><b>Emilie</b> participated in 38 Cloudfriends calls and coffee chats</p></li><li><p><b>Simon</b> has waited around one week cumulatively for builds to finish</p></li></ul>
    <div>
      <h3>exit(0)</h3>
      <a href="#exit-0">
        
      </a>
    </div>
    <p>At Cloudflare, our internships aren’t just about work—they're about growth, mentorship, and real impact. We've built more than projects; we've forged lasting relationships. It’s been an unforgettable summer of challenges, bonding, and authentic experiences. For more about our journey this summer, check out our <a href="https://cloudflare.tv/event/9ZdNewvj">Cloudflare TV segment</a> with <b>Michelle Zatlyn</b>, <b>Co-founder and COO</b>.</p><p>Finally, we would love to give a huge thanks to our university recruiters <b>Judy, Trang,</b> and <b>Dani</b> for creating such an amazing internship experience for us this summer!</p>
    <div>
      <h2>Want to become an Intern-et or Cloudflarian?</h2>
      <a href="#want-to-become-an-intern-et-or-cloudflarian">
        
      </a>
    </div>
    <p>Sign up <a href="https://docs.google.com/forms/d/e/1FAIpQLSdU9aOH_aWOkQViU3Qlo_w0kLsJ_TsW5wUJHG7OxG2ncIhnlg/viewform"><b>here</b></a> to be notified of new grad and internship opportunities for 2024. Cloudflare is also hiring for full-time opportunities: check out <a href="https://www.cloudflare.com/careers/jobs/">open positions</a> and apply today!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3E9oNXBAvBEB8vHMpC6n5Y/87e29d504635e2a0df2a654a97927411/Introducing-the-2023-Intern-ets--body-1.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Life at Cloudflare]]></category>
            <category><![CDATA[Careers]]></category>
            <guid isPermaLink="false">4b1Y1QbACl6HL4X750Ub0H</guid>
            <dc:creator>Emilie Ma</dc:creator>
            <dc:creator>Judy Cheong</dc:creator>
            <dc:creator>Anni Wang</dc:creator>
        </item>
        <item>
            <title><![CDATA[Debug Queues from the dash: send, list, and ack messages]]></title>
            <link>https://blog.cloudflare.com/debug-queues-from-dash/</link>
            <pubDate>Fri, 11 Aug 2023 13:01:19 GMT</pubDate>
            <description><![CDATA[ Today, we are thrilled to share that customers can now debug their Queues directly from the Cloudflare dashboard. This new feature allows customers to send, list, and acknowledge messages in their Queues from the Cloudflare dashboard, enabling a more user-friendly way to interact with their Queues ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today, August 11, 2023, we are excited to announce a new debugging workflow for Cloudflare Queues. Customers using Cloudflare Queues can now send, list, and acknowledge messages directly from the Cloudflare dashboard, enabling a more user-friendly way to interact with Queues. Though it can be difficult to debug asynchronous systems, it’s now easy to examine a queue’s state and test the full flow of information through a queue.</p><p>With <a href="https://developers.cloudflare.com/queues/learning/delivery-guarantees/">guaranteed delivery</a>, <a href="https://developers.cloudflare.com/queues/learning/batching-retries/">message batching</a>, <a href="https://developers.cloudflare.com/queues/learning/consumer-concurrency/">consumer concurrency</a>, and more, Cloudflare Queues is a powerful tool to connect services reliably and efficiently. Queues integrate deeply with the existing Cloudflare Workers ecosystem, so developers can also leverage our many other products and services. Queues can be bound to producer Workers, which allow Workers to send messages to a queue, and to consumer Workers, which pull messages from the queue.</p><p>We’ve received feedback that while Queues are effective and performant, customers find it hard to debug them. After a message is sent to a queue from a producer worker, there’s no way to inspect the queue’s contents without a consumer worker. The limited transparency was frustrating, and the need to write a skeleton worker just to debug a queue was high-friction.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7zNFeogvcAV8R2ErH6SUg9/2c26577d3665a795f5973bf126990728/image7.png" />
            
</figure><p>Now, with the addition of new features to send, list, and acknowledge messages in the Cloudflare dashboard, we’ve unlocked a much simpler debugging workflow. You can send messages from the Cloudflare dashboard to check whether your consumer worker is processing messages as expected, and verify your producer worker’s output by previewing messages right in the dashboard.</p><p>The pipeline of messages through a queue is now more open and easily examined. Users just getting started with Cloudflare Queues also no longer have to write code to send their first message: it’s as easy as clicking a button in the Cloudflare dashboard.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ZBzVg52jwP0bmfPVpQ6ix/09b46e8e51c4cfa88c14db169b97d7a7/image6.png" />
            
            </figure>
    <div>
      <h3>Sending messages</h3>
      <a href="#sending-messages">
        
      </a>
    </div>
    <p>These features are located in a new <b>Messages</b> tab on any queue’s page. Scroll to <b>Send message</b> to open the message editor.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/UFuyA4fJD8kmg0yxK9Y3n/ec6aa19f26c5be9924ef23a55f270530/image1-4.png" />
            
            </figure><p>From here, you can write a message and click <b>Send message</b> to send it to your queue. You can also choose to send JSON, which opens a JSON editor with syntax highlighting and formatting. If you’ve saved your message as a file locally, you can drag-and-drop the file over the textbox or click <b>Upload a file</b> to send it as well.</p><p>This feature makes testing changes in a queue’s consumer worker much easier. Instead of modifying an existing producer worker or creating a new one, you can send one-off messages. You can also easily verify if your queue consumer settings are behaving as expected: send a few messages from the Cloudflare dashboard to check that messages are batched as desired.</p><p>Behind the scenes, this feature leverages the same pipeline that Cloudflare Workers uses to send messages, so you can be confident that your message will be processed as if sent via a Worker.</p>
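<p>For comparison, sending the same message from a producer Worker is a one-line call on the queue binding. Below is a minimal sketch; the binding name <code>MY_QUEUE</code> and the payload shape are illustrative, not part of the announcement:</p>

```javascript
// Minimal producer Worker sketch. Assumes a queue binding named MY_QUEUE
// is configured for this Worker; the binding name and payload are illustrative.
const worker = {
  async fetch(request, env, ctx) {
    // send() accepts any structured-clonable value, matching what the
    // dashboard's "Send message" editor produces.
    await env.MY_QUEUE.send({ url: request.url, receivedAt: Date.now() });
    return new Response("Message queued", { status: 202 });
  },
};

// In a real Worker, this object would be the default export:
// export default worker;
```

<p>Sending from the dashboard exercises exactly this path, which is why messages sent either way are processed identically.</p>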
    <div>
      <h3>Listing messages</h3>
      <a href="#listing-messages">
        
      </a>
    </div>
    <p>On the same page, you can also inspect the messages you just sent from the Cloudflare dashboard. On any queue’s page, open the <b>Messages</b> tab and scroll to <b>Queued messages</b>.</p><p>If you have a consumer attached to your queue, you’ll fetch a batch of messages of the same size as configured in your queue consumer settings by default, to provide a realistic view of what would be sent to your consumer worker. You can change this value to preview messages one-at-a-time or even in much larger batches than would be normally sent to your consumer.</p><p>After fetching a batch of messages, you can preview the message’s body, even if you’ve sent raw bytes or a JavaScript object supported by the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm">structured clone</a> algorithm. You can also check the message’s timestamp; number of retries; producer source, such as a Worker or the Cloudflare dashboard; and type, such as text or JSON. This information can help you debug the queue’s current state and inspect where and when messages originated from.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/359eKsDTJEyl5nhFaKi7ZT/3d9a43b9de9a2f7bd94694f3bde18791/image5-2.png" />
            
</figure><p>The batch of messages that’s returned is the same batch that would be sent to your consumer Worker on its next run. Messages are even guaranteed to appear in the same order in the UI as they would be sent to your consumer. This feature gives you a looking-glass view into your queue, matching the exact behavior of a consumer worker. This works especially well for debugging messages sent by producer workers and verifying queue consumer settings.</p><p>Listing messages from the Cloudflare dashboard also doesn’t interfere with an existing connected consumer. Messages that are previewed from the Cloudflare dashboard stay in the queue and do not have their number of retries affected.</p><p>This ‘peek’ functionality is unique to Cloudflare Queues: Amazon SQS bumps the number of retries when a message is viewed, and RabbitMQ retries the message, forcing it to the back of the queue. Cloudflare Queues’ approach means that previewing messages does not have any unintended side effects on your queue and your consumer. If you ever need to debug queues used in production, don’t worry: listing messages is entirely safe.</p><p>You can also now remove messages from your queue directly from the Cloudflare dashboard. If you’d like to remove a message or clear the full batch from the queue, you can select messages to acknowledge. This is useful for preventing buggy messages from being repeatedly retried without having to write a dummy consumer.</p>
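<p>Acknowledging from the dashboard mirrors what a consumer Worker can do in code with per-message acknowledgement. A minimal sketch (the <code>handle()</code> function stands in for your real processing logic and is purely illustrative):</p>

```javascript
// Minimal consumer Worker sketch with explicit per-message acknowledgement.
// handle() is a placeholder for your real processing logic.
function handle(body) {
  if (body == null) throw new Error("malformed message");
}

const consumer = {
  async queue(batch, env, ctx) {
    for (const msg of batch.messages) {
      try {
        handle(msg.body);
        msg.ack();   // done: remove the message from the queue
      } catch (e) {
        msg.retry(); // leave the message in the queue for redelivery
      }
    }
  },
};

// In a real Worker: export default consumer;
```

<p>Acknowledging a buggy message from the dashboard has the same effect as calling <code>ack()</code> here, without deploying a dummy consumer.</p>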
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5jJBUVFIiKOQu3yVKb8wA8/d5aff1ecba3687cdb1ab89812e507524/image4-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/slOwPKMoZM7VaEhodubH2/a1a7c32768cd3d3bb5abd248b277a6e3/image2-3.png" />
            
</figure><p>You might have noticed that this message preview feature operates similarly to another popular feature request for an HTTP API to pull batches of messages from a queue. Customers will be able to make a request to the API endpoint to receive a batch of messages, then acknowledge the batch to remove the messages from the queue. Under the hood, both listing messages from the Cloudflare dashboard and HTTP Pull/Ack share common infrastructure, and HTTP Pull/Ack is coming very soon!</p><p>These debugging features have already been invaluable for testing example applications we’ve built on Cloudflare Queues. At an internal hack week event, we built a web crawler with Queues as an example use case (check out the tutorial <a href="https://developers.cloudflare.com/queues/examples/web-crawler-with-browser-rendering/">here</a>!). During development, we took advantage of this user-friendly way to send messages to quickly iterate on a consumer worker before we built a producer worker. When we encountered bugs in our consumer worker, the message previews helped us realize we were sending malformed messages, and the message acknowledgement feature gave us an easy way to remove them from the queue.</p>
    <div>
      <h3>New Queues debugging features — available today!</h3>
      <a href="#new-queues-debugging-features-available-today">
        
      </a>
    </div>
    <p>The Cloudflare dashboard features announced today provide more transparency into your application and enable more user-friendly debugging.</p><p>All Cloudflare Queues customers now have access to these new debugging tools. And if you’re not already using Queues, you can join the Queues Open Beta by <a href="https://dash.cloudflare.com/?to=/:account/workers/queues">enabling Cloudflare Queues here</a>. Get started on Cloudflare Queues with our <a href="https://developers.cloudflare.com/queues/get-started/">guide</a> and create your next app with us today! Your first message is a single click away.</p>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Queues]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3f9TXSoqcNe3MQAyFACPdw</guid>
            <dc:creator>Emilie Ma</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing scheduled deletion for Cloudflare Stream]]></title>
            <link>https://blog.cloudflare.com/introducing-scheduled-deletion-for-cloudflare-stream/</link>
            <pubDate>Fri, 11 Aug 2023 13:00:45 GMT</pubDate>
            <description><![CDATA[ Easily manage storage with scheduled deletion for Cloudflare Stream, available for live recordings and on-demand video ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Designed with developers in mind, Cloudflare Stream provides a seamless, integrated workflow that simplifies video streaming for creators and platforms alike. With features like <a href="/stream-live-ga/">Stream Live</a> and <a href="/stream-creator-management/">creator management</a>, customers have been looking for ways to streamline storage management.</p><p>Today, August 11, 2023, Cloudflare Stream is introducing scheduled deletion to easily manage video lifecycles from the Stream dashboard or our API, saving time and reducing storage-related costs. Whether you need to retain recordings from a live stream for only a limited time, or preserve direct creator videos for a set duration, scheduled deletion will simplify storage management and reduce costs.</p>
    <div>
      <h2>Stream scheduled deletion</h2>
      <a href="#stream-scheduled-deletion">
        
      </a>
    </div>
    <p>Scheduled deletion allows developers to automatically remove on-demand videos and live recordings from their library at a specified time. Live inputs can be set up with a deletion rule, ensuring that all recordings from the input will have a scheduled deletion date upon completion of the stream.</p><p>Let’s see how it works in those two configurations.</p>
    <div>
      <h2>Getting started with scheduled deletion for on-demand videos</h2>
      <a href="#getting-started-with-scheduled-deletion-for-on-demand-videos">
        
      </a>
    </div>
    <p>Whether you run a learning platform where students can upload videos for review, a platform that allows gamers to share clips of their gameplay, or anything in between, scheduled deletion can help manage storage and ensure you only keep the videos that you need. Scheduled deletion can be applied to both new and existing on-demand videos, as well as recordings from completed live streams. This feature lets you set the exact date and time at which the video will be deleted. These dates can be applied in the Cloudflare dashboard or via the Cloudflare API.</p>
    <div>
      <h3>Cloudflare dashboard</h3>
      <a href="#cloudflare-dashboard">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3D1lNclKMneZTtlGbY2rO9/b917c24ab6e96bc00f78e0f0c9dbde77/Screenshot-2023-08-11-at-12.49.57.png" />
            
            </figure><ol><li><p>From the Cloudflare dashboard, select <b>Videos</b> under <b>Stream</b></p></li><li><p>Select a video</p></li><li><p>Select <b>Automatically Delete Video</b></p></li><li><p>Specify a desired date and time to delete the video</p></li><li><p>Click <b>Submit</b> to save the changes</p></li></ol>
    <div>
      <h3>Cloudflare API</h3>
      <a href="#cloudflare-api">
        
      </a>
    </div>
    <p>The Stream API can also be used to set the scheduled deletion property on new or existing videos. In this example, we’ll create a direct creator upload that will be deleted on December 31, 2023:</p>
            <pre><code>curl -X POST \
-H 'Authorization: Bearer &lt;BEARER_TOKEN&gt;' \
-H 'Content-Type: application/json' \
-d '{ "maxDurationSeconds": 10, "scheduledDeletion": "2023-12-31T12:34:56Z" }' \
https://api.cloudflare.com/client/v4/accounts/&lt;ACCOUNT_ID&gt;/stream/direct_upload</code></pre>
            <p>For more information on live inputs and how to configure deletion policies in our API, refer to <a href="https://developers.cloudflare.com/api/">the documentation</a>.</p>
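<p>Since the property can also be applied to existing videos, here is a hedged sketch of setting <code>scheduledDeletion</code> on a video you have already uploaded. The helper builds the request against the edit-video endpoint; the endpoint shape is assumed here, so verify it against the Stream API reference before relying on it:</p>

```javascript
// Hedged sketch: build a request that sets scheduledDeletion on an existing
// video. The edit-video endpoint shape is an assumption; check the Stream
// API documentation before use.
function scheduledDeletionRequest(accountId, videoId, isoTimestamp, token) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/stream/${videoId}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ scheduledDeletion: isoTimestamp }),
    },
  };
}

// Usage:
// const { url, init } = scheduledDeletionRequest(
//   ACCOUNT_ID, VIDEO_ID, "2023-12-31T12:34:56Z", TOKEN);
// await fetch(url, init);
```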
    <div>
      <h2>Getting started with automated deletion for Live Input recordings</h2>
      <a href="#getting-started-with-automated-deletion-for-live-input-recordings">
        
      </a>
    </div>
    <p>We love how recordings from live streams allow those who may have missed the stream to catch up, but these recordings aren’t always needed forever. Scheduled recording deletion is a policy that can be configured for new or existing live inputs. Once configured, the recordings of all future streams on that input will have a scheduled deletion date calculated when the recording is available. Setting this retention policy can be done from the Cloudflare dashboard or via API operations to create or edit Live Inputs:</p>
    <div>
      <h3>Cloudflare Dashboard</h3>
      <a href="#cloudflare-dashboard">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2b60PHplltSmxKQQQ3LERW/431a214e7daa2431a198e7f138f26d6c/image3-3.png" />
            
            </figure><ol><li><p>From the Cloudflare dashboard, select <b>Live Inputs</b> under <b>Stream</b></p></li><li><p>Select <b>Create Live Input</b> or an existing live input</p></li><li><p>Select <b>Automatically Delete Recordings</b></p></li><li><p>Specify a number of days after which new recordings should be deleted</p></li><li><p>Click <b>Submit</b> to save the rule or create the new live input</p></li></ol>
    <div>
      <h3>Cloudflare API</h3>
      <a href="#cloudflare-api">
        
      </a>
    </div>
    <p>The Stream API makes it easy to add a deletion policy to new or existing inputs. Here is an example API request to create a live input with recordings that will expire after 30 days:</p>
            <pre><code>curl -X POST \
-H 'Authorization: Bearer &lt;BEARER_TOKEN&gt;' \
-H 'Content-Type: application/json' \
-d '{ "recording": {"mode": "automatic"}, "deleteRecordingAfterDays": 30 }' \
https://api.cloudflare.com/client/v4/accounts/&lt;ACCOUNT_ID&gt;/stream/live_inputs</code></pre>
            <p>For more information on live inputs and how to configure deletion policies in our API, refer to <a href="https://developers.cloudflare.com/api/">the documentation</a>.</p>
    <div>
      <h2>Try out scheduled deletion today</h2>
      <a href="#try-out-scheduled-deletion-today">
        
      </a>
    </div>
    <p>Scheduled deletion is now available to all Cloudflare Stream customers. Try it out now and join our <a href="https://discord.gg/cloudflaredev">Discord community</a> to let us know what you think! To learn more, check out our <a href="https://developers.cloudflare.com/stream/">developer docs</a>. Stay tuned for more exciting Cloudflare Stream updates in the future.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">64WSD7SbRipiXRnZ0xoHFt</guid>
            <dc:creator>Austin Christiansen</dc:creator>
            <dc:creator>Taylor Smith</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Workers database integration with Upstash]]></title>
            <link>https://blog.cloudflare.com/cloudflare-workers-database-integration-with-upstash/</link>
            <pubDate>Wed, 02 Aug 2023 13:00:22 GMT</pubDate>
            <description><![CDATA[ Announcing the new Upstash database integrations for Workers. Now it is easier to use Upstash Redis, Kafka and QStash inside your Worker  ]]></description>
            <content:encoded><![CDATA[ <p><i></i></p><p><i>This blog post references a feature which has updated documentation. For the latest reference content, visit </i><a href="https://developers.cloudflare.com/workers/databases/third-party-integrations/"><i>https://developers.cloudflare.com/workers/databases/third-party-integrations/</i></a></p><p>During <a href="https://www.cloudflare.com/developer-week/">Developer Week</a> we announced <a href="/announcing-database-integrations/">Database Integrations on Workers</a>, a new and seamless way to connect with some of the most popular databases. You select the provider, authorize through an OAuth2 flow, and automatically get the right configuration stored as encrypted environment variables on your Worker.</p><p>Today we are thrilled to announce that we have been working with Upstash to expand our integrations catalog. We are now offering three new integrations: Upstash Redis, Upstash Kafka, and Upstash QStash. These integrations allow our customers to unlock new capabilities on Workers, providing them with a broader range of options to meet their specific requirements.</p>
    <div>
      <h3>Add the integration</h3>
      <a href="#add-the-integration">
        
      </a>
    </div>
    <p>We are going to show the setup process using the Upstash Redis integration.</p><p>Select your Worker, go to the Settings tab, and select the Integrations tab to see all the available integrations.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4PgG63i9pFA5GtOuhGAAeE/5580ef72388faa48bb274d81edfd16ba/2.png" />
            
            </figure><p>After selecting the Upstash Redis integration we will get the following page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4oL9KEz7NUDqw16aXrk2g0/2708bd58089fa1e8abc503bfd7074649/3.png" />
            
            </figure><p>First, review and grant permissions so the Integration can add secrets to your Worker. Second, connect to Upstash using the OAuth2 flow. Third, select the Redis database you want to use; the Integration will then fetch the information it needs to generate the credentials. Finally, click “Add Integration” and it's done! You can now use the credentials as environment variables in your Worker.</p>
    <div>
      <h3>Implementation example</h3>
      <a href="#implementation-example">
        
      </a>
    </div>
    <p>In this example, we are going to use the <a href="https://developers.cloudflare.com/fundamentals/get-started/reference/http-request-headers/#cf-ipcountry">CF-IPCountry</a> header to conditionally return a custom greeting message to visitors from Paraguay, the United States, Great Britain, and the Netherlands, while returning a generic message to visitors from other countries.</p><p>To begin, we are going to load the custom greeting messages using Upstash’s online CLI tool.</p>
            <pre><code>➜ set PY "Mba'ẽichapa 🇵🇾"
OK
➜ set US "How are you? 🇺🇸"
OK
➜ set GB "How do you do? 🇬🇧"
OK
➜ set NL "Hoe gaat het met u? 🇳🇱"
OK</code></pre>
            <p>We also need to install the <code>@upstash/redis</code> package in our Worker project before we upload the following code.</p>
            <pre><code>import { Redis } from '@upstash/redis/cloudflare'
 
export default {
  async fetch(request, env, ctx) {
    const country = request.headers.get("cf-ipcountry");
    const redis = Redis.fromEnv(env);
    if (country) {
      const localizedMessage = await redis.get(country);
      if (localizedMessage) {
        return new Response(localizedMessage);
      }
    }
    return new Response("👋👋 Hello there! 👋👋");
  },
};</code></pre>
            <p>Just like that, we are returning a localized message from the Redis instance based on the country from which the request originated. Furthermore, we have a couple of ways to improve performance. For write-heavy use cases, we can use <a href="/announcing-workers-smart-placement/">Smart Placement</a> with no replicas, so the Worker code will be executed near the Redis instance provided by Upstash. Otherwise, creating a <a href="https://docs.upstash.com/redis/features/globaldatabase">Global Database</a> on Upstash with multiple read replicas across regions will help.</p>
    <div>
      <h3><a href="https://developers.cloudflare.com/workers/databases/native-integrations/upstash/">Try it now</a></h3>
      <a href="#">
        
      </a>
    </div>
    <p>Upstash Redis, Kafka and QStash are now available for all users! Stay tuned for more updates as we continue to expand our Database Integrations catalog.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Kafka]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">6PIdVuhR9PDMgFblDoqqfc</guid>
            <dc:creator>Joaquin Gimenez</dc:creator>
            <dc:creator>Shaun Persad</dc:creator>
        </item>
        <item>
            <title><![CDATA[Ethereum Gateway support for Görli + Sepolia Testnets and the Ethereum Merge]]></title>
            <link>https://blog.cloudflare.com/ethereum-merge-and-testnet-gateways/</link>
            <pubDate>Tue, 13 Sep 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ Today we are excited to announce support for the upcoming Ethereum Merge and Sepolia and Görli testnet support ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79McHksywcc4Cmxb88p74G/2803159a95a5c01cc2bb7f7999417d4c/image1-6.png" />
            
            </figure><p>Today we are excited to announce support for the <a href="https://ethereum.org/en/upgrades/merge/">Ethereum Merge</a> on the Ethereum network, and that our Ethereum gateways now support the <b>Görli</b> and <b>Sepolia</b> test networks (testnets). The Sepolia and Görli testnets can be used to test and develop full decentralized applications (<a href="https://ethereum.org/en/dapps/">dapps</a>) or to test upgrades before they are deployed on the Ethereum mainnet. These testnets use the same Ethereum protocol, with the major difference that the Ether transacted on a testnet has no value.</p><p>Ethereum is a decentralized blockchain with smart contract functionality, which Cloudflare allows you to interact with through an HTTP API. For a quick primer on Ethereum and our gateway, please refer to our previous blog post on the <a href="/cloudflare-ethereum-gateway/">Ethereum Gateway</a>.</p><p>In preparation for the merge, the Ethereum Foundation has executed merges on multiple testnets to ensure that the actual mainnet merge will occur with minimal to no disruption. Both testnets successfully transitioned to Proof of Stake, from Proof of Work and Proof of Authority respectively. Cloudflare’s Testnet Gateway handled the <a href="https://blog.ethereum.org/2022/07/27/goerli-prater-merge-announcement/">Görli-Prater merge</a> without issue, ensuring that we will be ready and prepared for the upcoming Ethereum Merge for mainnet. Our testnet gateways are live and ready for use by Cloudflare Ethereum Gateway customers.</p><p>In this blog, we are going to discuss the consensus transition entailed by the Ethereum Merge, changes in the Cloudflare Ethereum Gateway, and how you can start using them today.</p>
    <div>
      <h2>Consensus Mechanisms</h2>
      <a href="#consensus-mechanisms">
        
      </a>
    </div>
    <p><b>Proof of Work</b> is the original consensus mechanism of the Ethereum blockchain, popularized by Bitcoin. Miners compete to be the first to solve a cryptographic puzzle, allowing them to update the blockchain with the latest transactions and receive a reward in exchange. The more miners and processing power focused on solving these puzzles, the more secure the network. While this was first thought to help decentralization, since mining could run on commodity hardware, users started running highly powerful hardware, like ASICs and GPUs, to solve these puzzles. This means the security of the network comes with a massive tradeoff: this network of Ethereum miners <a href="https://digiconomist.net/ethereum-energy-consumption/">consumes more than 80 terawatt-hours per year, more than the country of Chile</a>. Clearly, this is not a sustainable mechanism for consensus, especially as the use of cryptocurrency and web3 technologies becomes more widespread.</p><p><b>Proof of Stake</b> is a consensus mechanism that lets users who have staked a specified amount of cryptocurrency run nodes to propose and validate blocks, receiving a cryptocurrency reward in return. These nodes, commonly referred to as validators, are responsible for keeping the network secure and progressing. For every slot, one random validator node is chosen as the proposer and a committee of random validator nodes is chosen to validate the proposed block. If a validator acts dishonestly or is unavailable, it is penalized economically by having its stake “slashed”. Proof of Stake is significantly more sustainable: the Ethereum Foundation estimates that it will consume <a href="https://ethereum.org/en/energy-consumption/#introduction">99.95%</a> less electricity than Proof of Work. Plus, it comes with the additional benefit that validators have a financial incentive to uphold the health of the blockchain.</p><p><b>Proof of Authority</b> is very similar to Proof of Stake, as validators propose and validate blocks to progress their blockchain. However, a significant difference is that nodes can only become validators if they’re approved by an authority node, instead of putting up a stake. Cloudflare currently runs one such authority node for the <a href="https://blog.ethereum.org/2022/06/21/testnet-deprecation">now-deprecated Rinkeby testnet</a>. This is a fairly uncommon consensus algorithm for public blockchains in comparison to Proof of Work and Proof of Stake, but it is commonly used in trusted communities like internal networks for corporations and governments.</p>
    <div>
      <h2>Cloudflare Ethereum Gateway</h2>
      <a href="#cloudflare-ethereum-gateway">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5QTb34ARQFJNVKdwmZ6EIh/65b4718c68a286205164a42099d1265b/image4-1.png" />
            
            </figure><p>At Cloudflare, we believe in using our own technologies to build our products, and the Ethereum Gateway is no exception. The Ethereum Gateway allows any customer to interact with the Ethereum network without needing to run their own dedicated node. A JSON-RPC call is first received by <a href="https://workers.cloudflare.com/">a Worker</a>, serverless code deployed in all of our data centers, ensuring that queries from any geographic region are processed quickly, and that requests are normalized using the latest block number known to the Worker.</p><p>The Worker then passes the call to a <a href="https://www.cloudflare.com/load-balancing/">Cloudflare Load Balancer</a>, corresponding to the specified Ethereum mainnet or testnet, which sends the call to Ethereum Node proxies inside our Kubernetes cluster. The Ethereum Node proxies queue calls and distribute them to ready and synced Ethereum Nodes that have the requested block. Our Ethereum nodes consist of an execution client and a consensus client.</p><p>The consensus client is responsible for Proof of Stake consensus, and we’ve added it to prepare the gateway for the merge. Ethereum Nodes then communicate with the specified Ethereum network to fulfill the RPC request. To ensure maximal speed, reliability, and availability, we have built in redundant Ethereum Node proxy instances and Ethereum Nodes in our cluster.</p>
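<p>The calls flowing through this pipeline are standard Ethereum JSON-RPC. For illustration, a request for the latest block number is shaped like this (the gateway URL is a placeholder for your own Cloudflare Ethereum Gateway endpoint):</p>

```javascript
// Illustrative Ethereum JSON-RPC request body; eth_blockNumber is a
// standard method that returns the latest block number.
const rpcRequest = {
  jsonrpc: "2.0",
  method: "eth_blockNumber",
  params: [],
  id: 1,
};

// GATEWAY_URL is a placeholder for your gateway endpoint:
// const res = await fetch(GATEWAY_URL, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(rpcRequest),
// });
```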
    <div>
      <h2>The Merge</h2>
      <a href="#the-merge">
        
      </a>
    </div>
    <p>The Ethereum Merge is a long-awaited upgrade to the Ethereum network, changing the consensus method from the current and wasteful Proof of Work protocol to a more efficient Proof of Stake protocol. The merge also opens up the door for further developments to the Ethereum network, such as <a href="https://vitalik.ca/general/2021/04/07/sharding.html">sharding</a>, which promises to speed up transactions and lower costs.</p><p>The Ethereum Merge combines the current Ethereum Blockchain with the Ethereum Beaconchain, a Proof of Stake chain, when the <a href="https://github.com/ethereum/execution-specs/pull/585/commits/eefb60633047975c6b3dca6cc2b15bd4a56e74a2">Terminal Total Difficulty (TTD)</a> is <a href="https://bordel.wtf/">hit</a>. Once the merge is completed, the Ethereum Beaconchain will be where consensus clients will communicate to propose and validate blocks. The existing blockchain will merge with the beaconchain, linking every block after the merge to a slot on the beaconchain. The blockchain will continue to handle Ethereum transactions and smart contracts.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2dDCJEL69Jg9O6URVnVcEX/0120cceef0eb92175b09be8967527a92/image3-4.png" />
            
            </figure><p>To prepare for the merge, a node operator must deploy a consensus client like <a href="https://github.com/prysmaticlabs/prysm">Prysm</a> or <a href="https://github.com/sigp/lighthouse">Lighthouse</a> alongside their execution client. If this doesn’t occur prior to the merge, their node’s copy of the blockchain will stop syncing and the execution client will be stuck on the last block prior to the merge.</p>
    <div>
      <h3>Sepolia and Görli Testnets</h3>
      <a href="#sepolia-and-gorli-testnets">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6AENg8mTFzZJVcSlRNJcNN/8ead43625ba393f55ccd969696eaa794/image2-5.png" />
            
            </figure><p>As per our <a href="https://developers.cloudflare.com/web3/ethereum-gateway/reference/supported-networks/">Ethereum gateway documentation</a>, we have made it extremely easy to send JSON-RPC calls to your preferred testnet. After you have <a href="https://developers.cloudflare.com/web3/how-to/manage-gateways/#create-a-gateway">created your Ethereum gateway</a>, change the network in the URL from mainnet to <b>sepolia</b> or <b>goerli</b>. For example, calling net_version on the Sepolia testnet for this example gateway looks like this:</p>
            <pre><code>$ curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc": "2.0", "method": "net_version", "params": [], "id": 35}'  https://web3-trial.cloudflare-eth.com/v1/sepolia
{"jsonrpc":"2.0","result":"11155111","id":35}</code></pre>
            <p>This testnet support lets you easily test and harden your changes before deploying to the Ethereum Mainnet, without incurring additional risk to your brand trust or product availability, and without having to operate your own infrastructure.</p><p>We want anyone leveraging our Ethereum Gateway to be confident that whatever changes they push forward do not impact the end user experience. At the end of the day, the Internet is for end users, and their experience must always remain within our purview.</p>
    <div>
      <h3>Rinkeby</h3>
      <a href="#rinkeby">
        
      </a>
    </div>
    <p>As part of this announcement, we will be deprecating our Rinkeby signer with public address 0xf10326c1c6884b094e03d616cc8c7b920e3f73e0, which we operated to support the Ethereum ecosystem. We will stop Rinkeby testnet support on January 15, 2023, following the <a href="https://blog.ethereum.org/2022/06/21/testnet-deprecation/">Ethereum Foundation’s move to deprecate the Rinkeby testnet</a>.</p><p>Also, if you can’t wait to start building on our web3 gateways, check out our <a href="https://developers.cloudflare.com/web3/">product documentation</a> for more guidance.</p> ]]></content:encoded>
            <category><![CDATA[Web3]]></category>
            <category><![CDATA[Ethereum]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">3o7GtDnGrRBz6cFZX9wckv</guid>
            <dc:creator>Ainesh Arumugam</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing thresholds in Security Event Alerting: a z-score love story]]></title>
            <link>https://blog.cloudflare.com/introducing-thresholds-in-security-event-alerting-a-z-score-love-story/</link>
            <pubDate>Tue, 30 Aug 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Today we are excited to announce thresholds for our Security Event Alerts: a new and improved way of detecting anomalous spikes of security events on your Internet properties. By introducing a threshold, we are able to make alerts more accurate and only notify you when it truly matters ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today we are excited to announce thresholds for our Security Event Alerts: a new and improved way of detecting anomalous spikes of security events on your Internet properties. Previously, our calculations were based on z-score methodology alone, which was able to determine most of the significant spikes. By introducing a threshold, we are able to make alerts more accurate and only notify you when it truly matters. One can think of it as a romance between the two strategies. This is the story of how they met.</p><p>Author’s note: as an intern at Cloudflare I got to work on this project from start to finish from investigation all the way to the final product.</p>
    <div>
      <h3>Once upon a time</h3>
      <a href="#once-upon-a-time">
        
      </a>
    </div>
    <p>In the beginning, there were Security Event Alerts. Security Event Alerts are notifications that are sent whenever we detect a threat to your Internet property. As the name suggests, they track the number of security events, which are requests to your application that match security rules. For example, you can configure a security rule that blocks access from certain countries. Every time a user from that country tries to access your Internet property, it is logged as a security event. While a security event may be harmless and fired as a result of the natural flow of traffic, it is important to alert on instances when a rule is fired more times than usual. An anomalous spike of many security events in a short period of time can indicate an attack. To find these anomalies and distinguish between the natural number of security events and that which poses a threat, we need a good strategy.</p>
    <div>
      <h3>The lonely life of a z-score</h3>
      <a href="#the-lonely-life-of-a-z-score">
        
      </a>
    </div>
    <p>Before a threshold entered the picture, our strategy worked only on the basis of a <a href="https://en.wikipedia.org/wiki/Standard_score">z-score</a>. Z-score is a methodology that looks at the number of standard deviations a certain data point is from the mean. In our current configuration, if a spike crosses the z-score value of 3.5, we send you an alert. This value was decided on after careful analysis of our customers’ data, finding it the most effective in determining a legitimate alert. Any lower and notifications will get noisy for smaller spikes. Any higher and we may miss out on significant events. You can read more about our z-score methodology in this <a href="/get-notified-when-your-site-is-under-attack/">blog post</a>.</p><p>The following graphs are an example of how the z-score method works. The first graph shows the number of security events over time, with a recent spike.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7EnlNMeKzVeZ13Q0M2Mxcx/f9c919a5ad35d509e1728a77142247f5/image3-9.png" />
            
            </figure><p>To determine whether this spike is significant, we calculate the z-score and check if the value is above 3.5:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/40S18LfgnqtBqZRA3zMT3r/2508f9fc821a75ac3c0a16c4780aebba/image4-3.png" />
            
            </figure><p>As the graph shows, the deviation is above 3.5 and so an alert is triggered.</p><p>However, relying on z-score becomes tricky for domains that experience no security events for a long period of time. With many security events at zero, the mean and standard deviation drop to zero as well. When a non-zero value finally appears, it will always be infinite standard deviations away from the mean. As a result, it will always trigger an alert even on spikes that do not pose any threat to your domain, such as the one below:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/iGehGRFNnvEfKjmkZEWNm/90a9793e0a828dd8d6cdbcc4010cb94a/image8.png" />
            
            </figure><p>With five security events, you are likely going to ignore this spike, as it is too low to indicate a meaningful threat. However, the z-score in this instance will be infinite:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3j3tRLJgHkuLdmFb9YWZOt/8be281744a599e191f4b228373b61570/image5-3.png" />
            
            </figure><p>Since a z-score of infinity is greater than 3.5, an alert will be triggered. This means that customers with few security events would often be overwhelmed by event alerts that are not worth worrying about.</p>
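<p>The behavior described so far can be sketched in a few lines of illustrative code (the 3.5 cutoff comes from this post; the event counts are invented, and a flat history is mapped to an infinite score to mirror the failure mode above):</p>

```python
import math
from statistics import mean, stdev

Z_THRESHOLD = 3.5  # standard deviations, per the alerting configuration

def z_score(history, latest):
    """How many standard deviations the latest count sits from the mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # A flat history makes any deviation "infinitely" anomalous.
        return math.inf if latest != mu else 0.0
    return (latest - mu) / sigma

steady = [10, 12, 9, 11, 10, 13, 11, 10, 12, 11]   # typical event counts
assert z_score(steady, 60) > Z_THRESHOLD           # real spike: alert
assert z_score(steady, 11) < Z_THRESHOLD           # normal traffic: no alert
assert z_score([0] * 20, 5) == math.inf            # quiet domain: false alert
```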
    <div>
      <h3>Letting go of zeros</h3>
      <a href="#letting-go-of-zeros">
        
      </a>
    </div>
    <p>To avoid the mean and standard deviation becoming zero and thus alerting on every non-zero spike, zero values can be ignored in the calculation. In other words, to calculate the mean and standard deviation, only data points that are higher than zero will be considered.</p><p>With those conditions, the same spike to five security events will now generate a different z-score:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/U9AkTbV420o1eOa4ooARM/2e1195b59c652cff9c3e77dfc0292c21/image7.png" />
            
            </figure><p>Great! With the z-score at zero, it will no longer trigger an alert on the harmless spike!</p><p>But what about spikes that could be harmful? When calculations ignore zeros, we need enough non-zero data points to accurately determine the mean and standard deviation. If only one non-zero value is present, that data point alone determines the mean and standard deviation. As such, the mean will always be equal to the spike, the z-score will always be zero, and an alert will never be triggered:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/wwHM9U3GcPF7M07EzZN35/7603546497ec2483fa954df74e3742aa/image6-1.png" />
            
            </figure><p>For a spike of 1000 events, we can tell that there is something wrong and we should trigger an alert. However, because there is only one non-zero data point, the z-score will remain zero:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ebNgZAbLIzsj1JahZs0Ua/f65a2c2875d4c471f238dc8ed4f37397/image7-1.png" />
            
            </figure><p>The z-score does not cross the value 3.5 and an alert will not be triggered.</p><p>So what’s better? Including zeros in our calculations can skew the results for domains with too many zero events and alert them every time a spike appears. Not including zeros is mathematically wrong and will never alert on these spikes.</p>
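<p>The dilemma can be made concrete with a small sketch (illustrative code, not our production implementation): the same scoring function either over-alerts or goes blind depending on how zeros are treated.</p>

```python
import math
from statistics import mean, stdev

def z_score(history, latest, ignore_zeros=False):
    points = [x for x in history if x > 0] if ignore_zeros else list(history)
    if len(points) < 2:
        return 0.0            # not enough data to estimate a deviation
    mu, sigma = mean(points), stdev(points)
    if sigma == 0:
        return math.inf if latest != mu else 0.0
    return (latest - mu) / sigma

quiet = [0] * 20              # a long stretch with no security events

# Zeros included: even five events look infinitely anomalous -> noisy alert.
assert z_score(quiet, 5) == math.inf

# Zeros excluded: no usable history remains, so even a spike of 1000
# events scores zero and would never alert.
assert z_score(quiet, 1000, ignore_zeros=True) == 0.0
```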
    <div>
      <h3>Threshold, the prince charming</h3>
      <a href="#threshold-the-prince-charming">
        
      </a>
    </div>
    <p>Clearly, a z-score is not enough on its own.</p><p>Instead, we paired up the z-score with a threshold. The threshold represents the raw number of security events an Internet property can have, below which an alert will not be sent. While z-score checks whether the spike is at least 3.5 standard deviations above the mean, the threshold makes sure it is above a certain static value. If both of these conditions are met, we will send you an alert:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3IOnJTCLN6LaQ3u8sCxFe/14fefb481faec6d77a6754f14d3e4bb5/image1-17.png" />
            
            </figure><p>The above spike crosses the threshold of 200 security events. We now have to check that the z-score is above 3.5:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ih3ymbLWAs063WrBlL8AH/33cbf6ca5203e664638e8cd1592a178d/image9.png" />
            
            </figure><p>The z-score value crosses 3.5 and an alert will be sent.</p><p>A threshold for the number of security events comes as the perfect complement. By itself, the threshold cannot determine whether something is a spike, and would simply alert on any value crossing it. This <a href="/smarter-origin-service-level-monitoring/">blog post</a> describes in more detail why thresholds alone do not work. However, when paired with z-score, they are able to share their strengths and cover for each other's weaknesses. If the z-score falsely detects an insignificant spike, the threshold will stop the alert from triggering. Conversely, if a value does cross the security events threshold, the z-score ensures there is a reasonable variance from the data average before allowing an alert to be sent.</p>
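<p>Put together, the alerting condition is simply a conjunction of the two checks. The sketch below uses the 3.5 z-score cutoff and the 200-event threshold from this post; the sample histories are invented for illustration.</p>

```python
from statistics import mean, stdev

EVENT_THRESHOLD = 200   # raw security-event floor
Z_THRESHOLD = 3.5       # standard deviations above the mean

def should_alert(history, latest):
    """Alert only when the spike is both large in absolute terms and a
    significant statistical outlier relative to recent history."""
    if latest <= EVENT_THRESHOLD:
        return False                       # too small to matter
    sigma = stdev(history)
    if sigma == 0:
        return True                        # flat history: any large spike alerts
    return (latest - mean(history)) / sigma > Z_THRESHOLD

calm = [150, 160, 155, 158, 152]
assert should_alert(calm, 900)             # big and anomalous: alert
assert not should_alert(calm, 180)         # below the raw floor: no alert

noisy = [400, 100, 450, 80, 420]
assert not should_alert(noisy, 500)        # crosses the floor, but within
                                           # normal variance for this domain
```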
    <div>
      <h3>The invaluable value</h3>
      <a href="#the-invaluable-value">
        
      </a>
    </div>
    <p>To foster a successful relationship between the z-score and security events threshold, we needed to determine the most effective threshold value. After careful analysis of our previous attacks on customers, we set the value to 200. This number is high enough to filter out the smaller, noisier spikes, but low enough to expose any threats.</p>
    <div>
      <h3>Am I invited to the wedding?</h3>
      <a href="#am-i-invited-to-the-wedding">
        
      </a>
    </div>
    <p>Yes, you are! The z-score and threshold relationship is already enabled for all WAF customers, so all you need to do is sit back and relax. For enterprise customers, the threshold will be applied to each type of alert enabled on your domain.</p>
    <div>
      <h3>Happily ever after</h3>
      <a href="#happily-ever-after">
        
      </a>
    </div>
    <p>The story certainly does not end here. We are constantly iterating on our alerts, so keep an eye out for future updates on the road to make our algorithms even more personalized for your Internet properties!</p> ]]></content:encoded>
            <category><![CDATA[Notifications]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">321dGbaOdbcrR4AnRCrLAH</guid>
            <dc:creator>Kristina Galicova</dc:creator>
        </item>
        <item>
            <title><![CDATA[Performance isolation in a multi-tenant database environment]]></title>
            <link>https://blog.cloudflare.com/performance-isolation-in-a-multi-tenant-database-environment/</link>
            <pubDate>Fri, 26 Aug 2022 15:08:06 GMT</pubDate>
            <description><![CDATA[ Enforcing stricter performance isolation across neighboring tenants who rely on our storage infrastructure ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Operating at Cloudflare scale means that across the technology stack we spend a great deal of time handling different load conditions. In this blog post we talk about how we solved performance difficulties with our Postgres clusters. These clusters support a large number of tenants under highly variable load, which creates the need to isolate activity so that no tenant can take an outsized share of resources from the others. Welcome to real-world, large database cluster management!</p><p>As an intern at Cloudflare I got to work on improving how our database clusters behave under load and open source the resulting code.</p><p>Cloudflare operates production Postgres clusters across multiple regions in data centers. Some of our earliest service offerings, such as our DNS Resolver, Firewall, and DDoS Protection, depend on our Postgres clusters' high availability for OLTP workloads. The high availability cluster manager, <a href="https://github.com/sorintlab/stolon">Stolon</a>, is employed across all clusters to independently control and replicate data across Postgres instances, elect Postgres leaders, and fail over under high-load scenarios.</p><p>PgBouncer and HAProxy act as the gateway layer in each cluster. Each tenant acquires client-side connections from PgBouncer instead of Postgres directly. PgBouncer holds a pool of maximum server-side connections to Postgres, allocating those across multiple tenants to prevent Postgres connection starvation. From here, PgBouncer forwards queries to HAProxy, which load balances across Postgres primary and read replicas.</p>
    <div>
      <h2>Problem</h2>
      <a href="#problem">
        
      </a>
    </div>
    <p>Our multi-tenant Postgres instances operate on bare metal servers in non-containerized environments. Each backend application service is considered a single tenant, and each may use one of multiple Postgres roles. Because each cluster serves multiple tenants, all tenants share and contend for available system resources such as CPU time, memory, and disk IO on each cluster machine, as well as finite database resources such as server-side Postgres connections and table locks. Each tenant has a unique workload that varies in system level resource consumption, making it impossible to enforce throttling using a global value.</p><p>This has become problematic in production, affecting neighboring tenants:</p><ul><li><p><b>Throughput</b>: A tenant may issue a burst of transactions, starving other tenants of shared resources and degrading their performance.</p></li><li><p><b>Latency</b>: A single tenant may issue very long or expensive queries, often concurrently, such as large table scans for ETL extraction or queries with lengthy table locks.</p></li></ul><p>Both of these scenarios can result in degraded query execution for neighboring tenants. Their transactions may hang or take significantly longer to execute (higher latency) due to either reduced CPU share time, or slower disk IO operations due to many seeks from misbehaving tenant(s). Moreover, other tenants may be blocked from acquiring database connections at the database proxy level (PgBouncer) due to existing ones being held during long and expensive queries.</p>
    <div>
      <h2>Previous solution</h2>
      <a href="#previous-solution">
        
      </a>
    </div>
    <p>When database cluster load significantly increases, finding which tenants are responsible is the first challenge. Some techniques include searching through all tenants' previous queries in Postgres' pg_stat_activity view and determining whether any new, expensive queries have been introduced relative to typical system load.</p>
    <div>
      <h3>Database concurrency throttling</h3>
      <a href="#database-concurrency-throttling">
        
      </a>
    </div>
    <p>Once the misbehaving tenants are identified, Postgres server-side connection limits are manually enforced using a Postgres query:</p>
            <pre><code>ALTER USER "some_bad-user" WITH CONNECTION LIMIT 123;</code></pre>
            <p>This essentially restricts or “squeezes” the concurrent throughput for a single user, where each tenant will only be able to exhaust their share of connections.</p><p>Manual concurrency (connection) throttling has shown improvements in shedding load in Postgres during high production workloads:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/W3HCXcAwPCBrsjKXc6eF6/612ff61445829d5f6f5bf9034ac1f869/2.png" />
            
            </figure><p>While we have seen success with this approach, it is not perfect and is horribly manual. It also suffers from the following:</p><ul><li><p>Postgres does not immediately kill existing tenant connections when a new user limit is set; the user may continue to issue bursty or expensive queries.</p></li><li><p>Tenants may still issue very expensive, resource intensive queries (affecting neighboring tenants) even if their concurrency (connection pool size) is reduced.</p></li><li><p>Manually applying connection limits against a misbehaving tenant is toil; an SRE could be paged to physically apply the new limit at any time of the day.</p></li><li><p>Manually analyzing and detecting misbehaving tenants based on queries can be time-consuming and stressful especially during an incident, requiring production SQL analysis experience.</p></li><li><p>Additionally, applying new throttling limits per user/pool, such as the allocated connection count, can be arbitrary and experimental while requiring extensive understanding of tenant workloads.</p></li><li><p>Oftentimes, Postgres may be under so much load that it begins to hang (CPU starvation). SREs may be unable to manually throttle tenants through native interfaces once a high load situation occurs.</p></li></ul>
    <div>
      <h2>New solution</h2>
      <a href="#new-solution">
        
      </a>
    </div>
    
    <div>
      <h3>Gateway concurrency throttling</h3>
      <a href="#gateway-concurrency-throttling">
        
      </a>
    </div>
    <p>Typically, the system level resource consumption of a query is difficult to control and isolate once submitted to the server or database system for execution. However, a common approach is to intercept and throttle connections or queries at the gateway layer, controlling per user/pool traffic characteristics based on system resource consumption.</p><p>We have implemented connection throttling at our database proxy server/connection pooler, PgBouncer. Previously, PgBouncer’s user level connection limits would not kill existing connections, but only prevent new connections from exceeding the limit. We now support the ability to throttle and kill existing connections owned by each user or each user’s connection pool, statically via configuration or at runtime via new administrative commands.</p>
    <div>
      <h5>PgBouncer Configuration</h5>
      <a href="#pgbouncer-configuration">
        
      </a>
    </div>
    
            <pre><code>[users]
dns_service_user = max_user_connections=60
firewall_service_user = max_user_connections=80
[pools]
user1.database1 = pool_size=90</code></pre>
            
    <div>
      <h5>PgBouncer Runtime Commands</h5>
      <a href="#pgbouncer-runtime-commands">
        
      </a>
    </div>
    
            <pre><code>SET USER dns_service_user = 'max_user_connections=40';
SET POOL dns_service_user.dns_db = 'pool_size=30';</code></pre>
            <p>This required major bug fixes, <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">refactoring</a> and implementation work in our fork of PgBouncer. We’ve also raised multiple pull requests to contribute all of our features to PgBouncer open source. To read about all of our work in PgBouncer, read <a href="/open-sourcing-our-fork-of-pgbouncer/">this blog</a>.</p><p>These new features now allow for faster and more granular "load shedding" against a misbehaving tenant’s concurrency (connection pool, user and database pair), while enabling stricter performance isolation.</p>
    <div>
      <h2>Future solutions</h2>
      <a href="#future-solutions">
        
      </a>
    </div>
    <p>We are continuing to build infrastructure components that monitor per-tenant resource consumption and detect which tenants are misbehaving based on system resource indicators against historical baselines. We aim to automate connection and query throttling against tenants using these new administrative commands.</p><p>We are also experimenting with various automated approaches to enforce strict tenant performance isolation.</p>
    <div>
      <h3>Congestion avoidance</h3>
      <a href="#congestion-avoidance">
        
      </a>
    </div>
    <p>An adaptation of the TCP Vegas congestion avoidance algorithm can be employed to adaptively estimate and enforce each tenant’s optimal concurrency while still maintaining low latency and high throughput for neighboring tenants. This approach does not require resource consumption profiling, manual threshold tuning, knowledge of underlying system hardware, or expensive computation.</p><p>Traditionally, TCP Vegas converges to the initially unknown and optimal congestion window (max packets that can be sent concurrently). In the same spirit, we can treat the unknown congestion window as the optimal concurrency or connection pool size for database queries. At the gateway layer, PgBouncer, each tenant will begin with a small connection pool size, while we dynamically sample each tenant’s transaction’s <a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/">round trip time (RTT)</a> against Postgres. We gradually increase the connection pool size (congestion window) of a tenant so long as their transaction RTTs do not deteriorate.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6AnIgPtx8mcQHSs2YTXji9/c1ef3fe2dbf6e15fb45f056c579b7974/3.png" />
            
            </figure><p>When a tenant's sampled transaction latency increases, the ratio of minimum to sampled request latency in the formula will decrease, naturally reducing the tenant's available concurrency, which in turn reduces database load.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2zAs4XrTUxebX1K8TnngYX/928836fda5fc65505d26ebe248de38e3/4.png" />
            
            </figure><p>Essentially, this algorithm will "back off" when observing high query latencies as the indicator of high database load, regardless of whether the latency is due to CPU time or disk/network IO blocking, etc. This formula will converge to find the optimal concurrency limit (connection pool size) since the latency ratio always converges to 0 with sufficiently large sample request latencies. The square root of the current tenant pool size is chosen as the request "burst" headroom because it remains relatively large for small pool sizes (when latencies are low) but shrinks as the pool size is reduced (when latencies are high).</p><p>Rather than reactively shedding load, congestion avoidance preventatively or “smoothly” <b>throttles traffic before load induced performance degradation becomes an issue</b>. This algorithm aims to prevent the database server resource starvation that causes other queries to hang.</p><p>Theoretically, if one tenant misbehaves and causes load induced latency for others, this TCP congestion algorithm may blindly throttle all tenants. Hence, it may be necessary to apply this adaptive throttling only against tenants with high CPU to latency correlation when system performance is degrading.</p>
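<p>One adaptation step of this idea can be sketched as follows. This is a loose reading of the scheme described above, not the exact production formula: the pool shrinks in proportion to how far sampled latency has drifted from the best observed latency, the square root of the pool size is left as burst headroom, and smoothing keeps a single noisy sample from causing oscillation.</p>

```python
import math

def next_pool_size(current: int, min_rtt: float, sample_rtt: float,
                   smoothing: float = 0.8) -> int:
    """One Vegas-style adjustment of a tenant's connection pool size."""
    gradient = min_rtt / sample_rtt        # 1.0 while latency stays healthy
    headroom = math.sqrt(current)          # burst allowance
    target = current * gradient + headroom
    # Exponentially smooth toward the target to avoid oscillation.
    return max(1, int(smoothing * current + (1 - smoothing) * target))

# Healthy latency: the pool is allowed to grow slowly.
grown = next_pool_size(50, min_rtt=10.0, sample_rtt=10.0)
# Latency has quadrupled: the pool is squeezed to shed load.
squeezed = next_pool_size(50, min_rtt=10.0, sample_rtt=40.0)
```

<p>Calling this once per sampling interval, per tenant pool, yields the gradual growth and backoff behavior described above without any per-workload tuning.</p>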
    <div>
      <h3>Tenant resource quotas</h3>
      <a href="#tenant-resource-quotas">
        
      </a>
    </div>
    <p>Configurable resource quotas can be introduced for each tenant. Upstream application service tenants are restricted to their allocated share of resources, expressed as CPU % utilized per second and max memory. If a tenant overuses their share, the database gateway (PgBouncer) should throttle their concurrency, queries per second, and ingress bytes to force consumption within their allocated slice.</p><p>Resource throttling a tenant must not "spill over" to other tenants accessing the same cluster. This could otherwise reduce the availability of other customer-facing applications and violate SLOs (service-level objectives). Resource restriction must be isolated to each tenant.</p><p>If traffic is low against Postgres instances, tenants should be permitted to exceed their allocation limit. However, when load against the cluster degrades the entire performance of the system (latency), the tenant's limit must be re-enforced at the gateway layer, PgBouncer. We can make deductions around the health of the entire database server based on indicators such as the rate of change of average query latency against a predefined threshold. All tenants should agree that a surplus in resource consumption may result in query throttling of any pattern.</p><p>Each tenant has a <b>unique and variable workload</b>, which may degrade multi tenant performance at any time. Quick detection requires profiling the baseline resource consumption of each tenant's (or tenant's connection pooled) workload against each local Postgres server (backend pids) in near real-time. From here, we can correlate the "baseline" traffic characteristics with system level resource consumption per database instance.</p><p>Taking an average or generalizing statistical measures across distributed nodes (each tenant's resource consumption on Postgres instances in this case) can be inaccurate due to high variance in traffic against leader vs replica instances. This would lead to faulty throttling decisions applied against users. For instance, we should not throttle a user’s concurrency on an idle read replica even if the user consumes excessive resources on the primary database instance. It is preferable to capture tenant consumption on a per Postgres instance level, and enforce throttling per instance rather than across the entire cluster.</p><p>Multivariable regression can be employed to model the relationship between independent variables (concurrency, queries per second, ingested bytes) and the dependent variables (system level resource consumption). We can calculate and enforce the optimal independent variables per tenant under high load scenarios. To account for workload changes, regression <b>adaptability vs accuracy</b> will need to be tuned by adjusting the sliding window size (amount of time to retain profiled data points) when capturing workload consumption.</p>
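<p>As a toy version of that modeling step, the sketch below fits a two-variable least-squares model (CPU consumption as a function of concurrency and queries per second) via the closed-form normal equations. The variables and sample data are invented; a production system would use more predictors and a sliding window of profiled samples.</p>

```python
def fit_two_var(xs, ys):
    """Least-squares fit of y ~= a*x1 + b*x2 (no intercept) by solving
    the 2x2 normal equations (X^T X) beta = X^T y in closed form."""
    s11 = sum(x1 * x1 for x1, _ in xs)
    s12 = sum(x1 * x2 for x1, x2 in xs)
    s22 = sum(x2 * x2 for _, x2 in xs)
    t1 = sum(x1 * y for (x1, _), y in zip(xs, ys))
    t2 = sum(x2 * y for (_, x2), y in zip(xs, ys))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# Invented samples: (concurrency, queries/sec) -> observed CPU %.
samples = [(10, 100), (20, 50), (5, 200), (30, 300)]
cpu = [2.0 * c + 0.5 * q for c, q in samples]   # hidden "true" relationship
a, b = fit_two_var(samples, cpu)                # recovers ~2.0 and ~0.5
```

<p>With the fitted coefficients, the gateway could invert the model to find the largest concurrency and query rate a tenant can be allowed while staying within its CPU quota.</p>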
    <div>
      <h3>Gateway query queuing</h3>
      <a href="#gateway-query-queuing">
        
      </a>
    </div>
    <p>User queries can be prioritized for submission to Postgres at the gateway layer (PgBouncer). Within one or more global priority queues, query submissions by all tenants are ordered based on the current resource consumption of the tenant’s connection pool or the tenant itself. Alternatively, ordering can be based on each query’s historical resource consumption, where each query is independently profiled. Based on changes in tenant resource consumption captured from each Postgres instance’s server, all queued queries can be reordered every time the scheduler forwards a query to be submitted.</p><p>To prevent priority queue starvation (one tenant’s query sitting at the end of the queue and never executing), the gateway level query queuing can be configured to engage only during peak load/traffic against the Postgres instance. Alternatively, the time a query was enqueued can be factored into the priority ordering.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7twYWGaewKPLe7xRCpnIBG/f47ea6a51eaf09d7e44982a9128365a0/6.png" />
            
            </figure><p>This approach would isolate tenant performance by allowing non-offending tenants to continue reserving connections and executing queries (such as critical health monitoring queries). <b>Higher latency would only be observed from the tenants that are utilizing more resources</b> (from many/expensive transactions). This approach is straightforward to understand, generic in application (can queue transactions based on other input metrics), and <b>non-destructive</b> as it does not kill client/server connections, and should only drop queries when the in-memory priority queue reaches capacity.</p>
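<p>A minimal sketch of the second starvation remedy — factoring enqueue time into the ordering — could look like this (the class and the age-credit constant are hypothetical, not part of PgBouncer):</p>

```python
import time

class FairQueryQueue:
    """Order queued queries by tenant load, discounted by waiting time,
    so a heavy tenant's query eventually drains instead of starving."""
    AGE_CREDIT = 0.1  # load units forgiven per second spent waiting

    def __init__(self):
        self._pending = []  # (tenant_load_at_enqueue, enqueue_time, query)

    def enqueue(self, query, tenant_load, now=None):
        now = time.monotonic() if now is None else now
        self._pending.append((tenant_load, now, query))

    def dequeue(self, now=None):
        if not self._pending:
            return None
        now = time.monotonic() if now is None else now
        # Lowest effective priority wins: tenant load minus an age credit.
        best = min(range(len(self._pending)),
                   key=lambda i: self._pending[i][0]
                   - self.AGE_CREDIT * (now - self._pending[i][1]))
        return self._pending.pop(best)[2]
```

<p>A linear scan per dequeue is shown for clarity; a real gateway would maintain a heap and re-score entries when consumption updates arrive.</p>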
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Performance isolation in our multi-tenant storage environment continues to be a very interesting challenge that touches areas including OS resource management, database internals, queueing theory, congestion algorithms and even statistics. We’d love to hear how the community has tackled the “noisy neighbor” problem by isolating tenant performance at scale!</p> ]]></content:encoded>
            <category><![CDATA[Edge Database]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">5dmpsqleBtIJ2FoixbLOCK</guid>
            <dc:creator>Justin Kwan</dc:creator>
            <dc:creator>Vignesh Ravichandran</dc:creator>
        </item>
        <item>
            <title><![CDATA[Open sourcing our fork of PgBouncer]]></title>
            <link>https://blog.cloudflare.com/open-sourcing-our-fork-of-pgbouncer/</link>
            <pubDate>Fri, 26 Aug 2022 14:30:37 GMT</pubDate>
            <description><![CDATA[ We are releasing our internal fork of PgBouncer, filled with authentication bug fixes and new features around per user and connection pool isolation ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Cloudflare operates highly available Postgres production clusters across multiple data centers, supporting the transactional workloads of our core service offerings such as our DNS Resolver, Firewall, and DDoS Protection.</p><p>Multiple PgBouncer instances sit at the front of the gateway layer per cluster, acting as a TCP proxy that provides Postgres connection pooling. PgBouncer’s pooling enables upstream applications to connect to Postgres without having to constantly open and close connections (expensive) at the database level, while also reducing the number of Postgres connections used. Each tenant acquires client-side connections from PgBouncer instead of Postgres directly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ncwV2JtgSNvA6ZSFRi9c9/cab4b0f421cad51acab4adc5890d48a5/Frame-673.png" />
            
            </figure><p>PgBouncer holds a capped pool of server-side connections to Postgres, allocating them across multiple tenants to prevent Postgres connection starvation. From here, PgBouncer forwards backend queries to HAProxy, which load balances across the Postgres primary and read replicas.</p><p>As an intern at Cloudflare, I got to work on improving how our database clusters behave under load and on open sourcing the resulting code.</p><p>We run our Postgres infrastructure in non-containerized, bare metal environments, which leads to multitenant resource contention between Postgres users. To enforce stricter tenant performance isolation at the database level (CPU time utilized, memory consumption, disk IO operations), we’d like to configure and enforce connection limits per user and connection pool at PgBouncer.</p><p>To do that we had to add features and fix bugs in PgBouncer. Rather than continue to maintain a private fork, we are open sourcing our code for others to use.</p>
    <div>
      <h3>Authentication Rejection</h3>
      <a href="#authentication-rejection">
        
      </a>
    </div>
    <p>The PgBouncer connection pooler offers options to enforce server connection pool size limits (effective concurrency) per user via static configuration. However, an authentication bug upstream prevented these features from working correctly when Postgres was set to use HBA authentication. Administrators who sensibly use server-side authentication could not take advantage of these user-level features.</p><p>This ongoing issue has also been experienced by others in the open-source community:</p><p><a href="https://github.com/pgbouncer/pgbouncer/issues/484">https://github.com/pgbouncer/pgbouncer/issues/484</a> and <a href="https://github.com/pgbouncer/pgbouncer/issues/596">https://github.com/pgbouncer/pgbouncer/issues/596</a></p>
    <div>
      <h3>Root Cause</h3>
      <a href="#root-cause">
        
      </a>
    </div>
    <p>PgBouncer needs a Postgres user’s password when proxying submitted queries from a client connection to a Postgres server connection. PgBouncer will fetch a user’s Postgres password defined in userlist.txt (auth_file) when a user first logs in to compare against the provided password. However, if the user is not defined in userlist.txt, PgBouncer will fetch their password from the Postgres <a href="https://www.postgresql.org/docs/current/view-pg-shadow.html">pg_shadow</a> system view for comparison. This password will be used when PgBouncer subsequently forwards queries from this user to Postgres. The same applies when Postgres is configured to use HBA authentication.</p><p>Following serious debugging efforts and time spent in GDB, we found that multiple user objects are typically created for a single real user: via configuration loading from the [users] section and upon the user’s first login. In PgBouncer, any users requiring a shadow auth query would be stored under their respective database struct instance, whereas any user with a password defined in userlist.txt would be stored globally. Because the non-authenticated user already existed in memory after being parsed from the [users] section, PgBouncer assumed that the user was defined in userlist.txt, so the shadow authentication query could be skipped. It would not bother to fetch and set the user’s password upon first login, resulting in an empty user password. This is why subsequent queries submitted by the user would be rejected with an authentication failure at Postgres.</p><p>To solve this, we simplified the code to globally store all users in one place, rather than storing different types of users (requiring different methods of authentication) in a disaggregated fashion per database or globally. Also, rather than assuming a user is authenticated if they merely exist, we keep track of whether the user requires authentication via auth query or by fetching their password from userlist.txt, depending on how the user was created.</p><p>We saw the value in troubleshooting and fixing these issues; it would unlock an entire class of features in PgBouncer for our use cases, while benefiting many in the open-source community.</p>
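<p>As a simplified model of the fix (Python here purely for illustration; PgBouncer itself is written in C, and these names are ours), the key idea is a single global user store plus an explicit record of how each user authenticates:</p>

```python
# How a user's password is obtained: from the auth_file, or via auth query.
AUTH_FILE, AUTH_QUERY = "userlist.txt", "pg_shadow"

class User:
    def __init__(self, name, password=None):
        self.name = name
        self.password = password  # known up front only for userlist.txt users
        self.auth_source = AUTH_FILE if password is not None else AUTH_QUERY

users = {}  # single global store, keyed by user name

def on_first_login(user_name, fetch_shadow_password):
    user = users[user_name]
    if user.auth_source == AUTH_QUERY and user.password is None:
        # The upstream bug skipped this step for users pre-created from the
        # [users] config section, leaving an empty password behind.
        user.password = fetch_shadow_password(user_name)
    return user.password
```

<p>Because the authentication method is stored explicitly, a user object created from the [users] section can no longer be mistaken for one whose password came from userlist.txt.</p>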
    <div>
      <h3>New Features</h3>
      <a href="#new-features">
        
      </a>
    </div>
    <p>We’ve also done work to implement and support additional features in PgBouncer to enforce stricter tenant performance isolation.</p><p>Previously, PgBouncer would only prevent tenants from exceeding preconfigured limits, which is not particularly helpful once a user is already misbehaving or holding too many connections. PgBouncer now supports enforcing or shrinking per-user connection pool limits at runtime, <b>where it is most critically needed</b>: to throttle tenants who are issuing a burst of expensive queries or hogging connections from other tenants. We’ve also implemented new administrative commands to throttle the maximum connections per user or per pool at runtime.</p><p>PgBouncer also now supports statically configuring and dynamically enforcing connection limits per connection pool. This is important for granularly throttling a tenant’s misbehaving connection pool without also reducing availability on its other, well-behaved pools.</p>
    <div>
      <h5>PgBouncer Configuration</h5>
      <a href="#pgbouncer-configuration">
        
      </a>
    </div>
    
            <pre><code>[users]
dns_service_user = max_user_connections=60
firewall_service_user = max_user_connections=80
[pools]
user1.database1 = pool_size=90</code></pre>
            
    <div>
      <h5>PgBouncer Runtime Commands</h5>
      <a href="#pgbouncer-runtime-commands">
        
      </a>
    </div>
    
            <pre><code>SET USER dns_service_user = 'max_user_connections=40';
SET POOL dns_service_user.dns_db = 'pool_size=30';</code></pre>
            <p>These new features required <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">major refactoring</a> around how PgBouncer stores users and databases, how it weakly references the stored passwords of different users, and how we enforce killing server-side connections while they are still in use.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We are committed to improving PgBouncer in open source and contributing all of our features to benefit the wider community. If you are interested, please consider contributing to our <a href="https://github.com/cloudflare/cf-pgbouncer">open source PgBouncer fork</a>. After all, it is the community that makes PgBouncer possible!</p> ]]></content:encoded>
            <category><![CDATA[Edge Database]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">5ILynCMGbg5sa61rxooh2H</guid>
            <dc:creator>Justin Kwan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Support Portal gets an overhaul]]></title>
            <link>https://blog.cloudflare.com/cloudflare-support-portal-gets-an-overhaul/</link>
            <pubDate>Tue, 16 Aug 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ The Cloudflare Support team has launched a new Support Portal. The portal will give you access to self-help resources, diagnostics with troubleshooting guides, and will provide for easier ticket submission ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Cloudflare Support team is excited to announce the launch of our brand-new Customer Support Portal. When our customers open support tickets, we understand that they want quick and accurate responses from us. For those of you who have opened a support ticket in the past, we are certain you will notice the improvements we've made! The new Support Portal lives where our ticket submission form has always been, <a href="https://dash.cloudflare.com/?to=/:account/support">dash.cloudflare.com/support</a>, but that's where the similarities between the old and the new one end.</p>
    <div>
      <h3>What can you expect in the new portal?</h3>
      <a href="#what-can-you-expect-in-the-new-portal">
        
      </a>
    </div>
    <p>The new Support Portal will help you solve your problems quickly and effectively, by getting you on the fastest path to resolution. In some cases, the most efficient way to resolve your issue will be to use our self-help resources or our machine learning-trained Support Bot. Other times, the most efficient way to resolve your issue will be by working with one of our Support Engineers via ticket, phone or chat, depending on your plan type. Regardless of how we help you solve your issue, we will have more context about the products you are using and your issue up front, reducing time-consuming back and forth.</p><p>The new portal has several features that will make it easier for you to access the support you need, including:</p><ul><li><p>Fast and secure ticket submission for verified Cloudflare users</p></li><li><p>An easier-to-use interface that serves relevant resources based on your issue summary</p></li><li><p>Machine learning-powered Support Bot to run diagnostics and serve targeted help guides</p></li></ul><p>Everyone is encouraged to begin using our new portal. Tickets submitted through our legacy form are typically solved faster than tickets emailed to us, and we expect the updates in our new form to help us resolve your issues even faster!</p><p>If you are ready to be one of the first people to take advantage of our new Support Portal, you can now opt in and begin using the new experience to access resources and submit tickets. Just hit the Support dropdown in your <a href="https://dash.cloudflare.com/?account=support">dashboard</a> and click Contact Support.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UFrD37wjol6QwSgatgxJY/8f66b993feb4bf1ce76b58d950f421ba/image3-7.png" />
            
            </figure><p>Below is a preview of what you can expect with the new experience.</p>
    <div>
      <h3>Relevant self-help resources at your fingertips</h3>
      <a href="#relevant-self-help-resources-at-your-fingertips">
        
      </a>
    </div>
    <p>The biggest change you’ll notice from our old ticket submission form is that we've made it easier to get help. First, we link you directly to relevant resources and the ticket submission form immediately upon clicking “Contact Support”. You no longer have to navigate through multiple steps to get your problem resolved. Second, we’ve moved to a full-page experience allowing us to curate a selection of support articles and help guides targeting your specific problem, making it easier for you to find answers to your questions. Of course, there will still be times when you need to submit a support ticket, but if we have resources that address your problem, we want you to be able to find that information easily.</p><p>All the details you provide when searching for articles in the portal will be captured and added to your ticket if you are not able to find the answers to your questions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5QhgcgI9Za5bkKMohyv5Bk/5b3829ab26d9bfdfbb7efa34d49c7c68/image1-16.png" />
            
            </figure>
    <div>
      <h3>Take advantage of our Support Bot</h3>
      <a href="#take-advantage-of-our-support-bot">
        
      </a>
    </div>
    <p>Our machine learning-powered Support Bot has been integrated into the new portal to deliver a customized experience that identifies your specific problem. Support Bot has been helping our Support Engineers work more efficiently for years, and now we’re making some of this functionality customer-facing so that you can benefit from these efficiencies as well.</p><p>Within the portal, the Support Bot will run diagnostics (if your issue is domain-related), assess the issue summary you entered, and provide you with help guides to address the root cause of your problem. The more information you are able to provide, the better our bot can direct you to the resources most pertinent to your issue. This gives you the chance to solve your issue on the spot, rather than waiting for a response to your ticket.</p><p>For each issue submitted through the portal, our Support Bot can perform one of two actions. If your issue is domain-specific, the bot will run a set of diagnostics against your domain that check for common configuration issues. If any issue is detected, the bot will display the issue and a suggested solution. Regardless of whether your issue is domain-specific, the bot will also analyze the issue summary you’ve entered against our ensemble of Natural Language Processing models and keyword searches. The bot is trained on thousands of historic customer tickets to differentiate between specific customer issues. We retrain the model on a regular basis to ensure it is consistently learning from new and emerging issues. If the bot detects keywords in your summary that map to a relevant issue, it will present a known solution for that issue.</p><p>The solutions the bot surfaces are based on how successfully these resources resolved issues previously, and we will continue to refine the bot’s responses and solutions based on a couple of key success metrics. 
We consider a recommendation successful if a customer doesn’t need to ultimately open a ticket or if they acknowledge that a resource was helpful by voting on the page. We will evaluate this data along with any information you provide on why specific content wasn’t helpful, and make iterative improvements to the bot every time we retrain it.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44bwcGtiGChNv0pU4WYZDF/b51aff75b1bfed7561afb7237f390552/image2-10.png" />
            
            </figure>
    <div>
      <h3>Fast and secure ticket submission</h3>
      <a href="#fast-and-secure-ticket-submission">
        
      </a>
    </div>
    <p>While we have a ton of helpful content for a wide range of problems, we know there will be instances where you need to speak to one of our very experienced Support Engineers. For plan types that include ticket support, we have built our ticket submission flow into the portal and introduced new features to make the experience more efficient. The first step for our Support Engineers in resolving most issues is for us to verify the identity of account users and admins. The new process ensures that tickets are only submitted by verified account users and admins, reducing some back and forth and allowing us to start working on your issue right away.</p><p>Along with this verification step, the new portal will collect detailed information about your problem up front, including issue category and impact level. These details will help route your ticket to the Support Engineer most knowledgeable in the area of your issue and enable that engineer to begin work on your ticket more quickly and without having to come to you with additional questions.</p>
    <div>
      <h3>How to try the new experience</h3>
      <a href="#how-to-try-the-new-experience">
        
      </a>
    </div>
    <p>To take advantage of these improvements, we encourage everyone to use the new Support Portal as the starting point for troubleshooting your issues.</p><p>Over the next few months, we will be rolling out the new portal to all plan types, starting with an opt-in period where you can pilot the new experience. Once we are satisfied the portal is working as intended, we will close the opt-in phase and release the portal to all customers. At that point, we will begin redirecting emails received at our main support email addresses (support at cloudflare.com and billing at cloudflare.com) to the Support Portal so that they can be triaged, and resolved quicker and more efficiently. We are excited to start implementing these changes and are confident that these steps are the first of many planned in making your support experience as efficient and effective as possible. We can’t wait for you to check it out!</p><p>To start using the new portal today, you can opt in from your <a href="https://dash.cloudflare.com/?account=support">dashboard</a>. Let us know what you think with the <a href="https://forms.gle/DdP6xdbeyfUDd7QL8">feedback form</a> included at the top of the new portal.</p> ]]></content:encoded>
            <category><![CDATA[Support]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">3SaIrCDc3z2R0745PZadPp</guid>
            <dc:creator>Meghan Bevill</dc:creator>
            <dc:creator>Allie Landy</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing new Cloudflare for SaaS documentation]]></title>
            <link>https://blog.cloudflare.com/introducing-new-cloudflare-for-saas-documentation/</link>
            <pubDate>Tue, 09 Aug 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare for SaaS offers a suite of Cloudflare products and add-ons to improve the security, performance, and reliability of SaaS providers. Now, the Cloudflare for SaaS documentation outlines how to optimize it in order to meet your goals ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2i5rqkqFn7HJrwk36od0pM/df3914b54964a9d678cda9ab0fe97968/image3-4.png" />
            
            </figure><p>As a SaaS provider, you’re juggling many challenges while building your application, whether it’s custom domain support, protection from attacks, or maintaining an origin server. In 2021, we were proud to announce <a href="/cloudflare-for-saas/">Cloudflare for SaaS for Everyone</a>, which allows anyone to use Cloudflare to cover those challenges, so they can focus on other aspects of their business. This product has a variety of potential implementations; now, we are excited to announce a new section in our <a href="https://developers.cloudflare.com/">Developer Docs</a> specifically devoted to <a href="https://developers.cloudflare.com/cloudflare-for-saas/">Cloudflare for SaaS documentation</a> to allow you to take full advantage of its product suite.</p>
    <div>
      <h3>Cloudflare for SaaS solution</h3>
      <a href="#cloudflare-for-saas-solution">
        
      </a>
    </div>
    <p>You may remember, from our <a href="/cloudflare-for-saas-for-all-now-generally-available/">October 2021 blog post</a>, all the ways that Cloudflare provides solutions for SaaS providers:</p><ul><li><p>Set up an origin server</p></li><li><p>Encrypt your customers’ traffic</p></li><li><p>Keep your customers online</p></li><li><p>Boost the performance of global customers</p></li><li><p>Support custom domains</p></li><li><p>Protect against attacks and bots</p></li><li><p>Scale for growth</p></li><li><p>Provide insights and analytics</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7LdxDeVHaUHAy19wdbLfVe/aaec3c62c1616d393a8af6c6daf270d0/image2-5.png" />
            
            </figure><p>However, we received feedback from customers indicating confusion around actually <i>using</i> the capabilities of Cloudflare for SaaS because there are so many features! With the existing documentation, it wasn’t 100% clear how to enhance security and performance, or how to support custom domains. Now, we want to show customers how to use Cloudflare for SaaS to its full potential by including more product integrations in the docs, as opposed to only focusing on the SSL/TLS piece.</p>
    <div>
      <h3>Bridging the gap</h3>
      <a href="#bridging-the-gap">
        
      </a>
    </div>
    <p>Cloudflare for SaaS can be overwhelming with so many possible add-ons and configurations. That’s why the new docs are organized into six main categories, housing a number of new, detailed guides (for example, WAF for SaaS and Regional Services for SaaS):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5oI4ffuIoR47X455bljT6c/15422c36a5f9313c1113282577d913f2/image1-12.png" />
            
            </figure><p>Once you get your SaaS application up and running with the <a href="https://developers.cloudflare.com/cloudflare-for-saas/getting-started/">Get Started</a> page, you can find which configurations are best suited to your needs based on your priorities as a provider. Even if you aren’t sure what your goals are, this setup outlines the possibilities much more clearly through a number of new documents and product guides such as:</p><ul><li><p><a href="https://developers.cloudflare.com/cloudflare-for-saas/start/advanced-settings/regional-services-for-saas/">Regional Services for SaaS</a></p></li><li><p><a href="https://developers.cloudflare.com/analytics/graphql-api/tutorials/end-customer-analytics/">Querying HTTP events by hostname with GraphQL</a></p></li><li><p><a href="https://developers.cloudflare.com/cloudflare-for-saas/domain-support/migrating-custom-hostnames/">Migrating custom hostnames</a></p></li></ul><p>Instead of pondering over vague subsection titles, you can peruse with purpose in mind. The advantages and possibilities of Cloudflare for SaaS are highlighted instead of hidden.</p>
    <div>
      <h3>Possible configurations</h3>
      <a href="#possible-configurations">
        
      </a>
    </div>
    <p>This setup makes it much easier to configure Cloudflare for SaaS to meet your goals as a provider.</p><p>For example, consider performance. Previously, there was no documentation surrounding reduced latency for SaaS providers. Now, the Performance section explains the performance benefits you receive automatically by onboarding with Cloudflare for SaaS. Additionally, it offers three options for reducing latency even further through brand-new docs:</p><ul><li><p><a href="https://developers.cloudflare.com/cloudflare-for-saas/performance/early-hints-for-saas/">Early Hints for SaaS</a></p></li><li><p><a href="https://developers.cloudflare.com/cloudflare-for-saas/performance/cache-for-saas/">Cache for SaaS</a></p></li><li><p><a href="https://developers.cloudflare.com/cloudflare-for-saas/performance/argo-for-saas/">Argo Smart Routing for SaaS</a></p></li></ul><p>Similarly, the new organization offers <a href="https://developers.cloudflare.com/cloudflare-for-saas/security/waf-for-saas/">WAF for SaaS</a> as a previously hidden security solution, giving providers the ability to enable automatic protection from vulnerabilities and the flexibility to create custom rules. This is conveniently accompanied by a <a href="https://developers.cloudflare.com/cloudflare-for-saas/security/waf-for-saas/managed-rulesets/">step-by-step tutorial using Cloudflare Managed Rulesets</a>.</p>
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>While this transition represents an improvement in the Cloudflare for SaaS docs, we’re going to expand its accessibility even more. Some tutorials, such as our <a href="https://developers.cloudflare.com/cloudflare-for-saas/security/waf-for-saas/managed-rulesets/">Managed Ruleset Tutorial</a>, are already live within the tile. However, more step-by-step guides for Cloudflare for SaaS products and add-ons will further enable our customers to take full advantage of the available product suite. In particular, keep an eye out for expanding documentation around using Workers for Platforms.</p>
    <div>
      <h3>Check it out</h3>
      <a href="#check-it-out">
        
      </a>
    </div>
    <p>Visit the new <a href="http://www.developers.cloudflare.com/cloudflare-for-saas">Cloudflare for SaaS tile</a> to see the updates. If you are a SaaS provider interested in extending Cloudflare benefits to your customers through Cloudflare for SaaS, visit our <a href="https://www.cloudflare.com/saas/">Cloudflare for SaaS overview</a> and our <a href="https://developers.cloudflare.com/cloudflare-for-saas/plans/">Plans page</a>.</p> ]]></content:encoded>
            <category><![CDATA[Technical Writing]]></category>
            <category><![CDATA[Developer Documentation]]></category>
            <category><![CDATA[Cloudflare for SaaS]]></category>
            <category><![CDATA[SSL]]></category>
            <category><![CDATA[SaaS]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">7cA2oDJgFIx7vyQyTY5Bk8</guid>
            <dc:creator>Mia Malden</dc:creator>
        </item>
        <item>
            <title><![CDATA[Hertzbleed explained]]></title>
            <link>https://blog.cloudflare.com/hertzbleed-explained/</link>
            <pubDate>Tue, 28 Jun 2022 12:57:06 GMT</pubDate>
            <description><![CDATA[ Hertzbleed is a brand-new family of side-channel attacks that monitors changes on CPU frequency ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wNR7nycv8hI9sGsR6x1sr/c5a888f30336182a29155e79a683a1d4/image9-8.png" />
            
            </figure><p>You may have heard a bit about the <a href="http://hertzbleed.com">Hertzbleed</a> attack that was recently disclosed. Fortunately, one of the student researchers who was part of the team that discovered this vulnerability and developed the attack is spending this summer with Cloudflare Research and can help us understand it better.</p><p>The first thing to note is that Hertzbleed is a new type of side-channel attack that relies on changes in CPU frequency. Hertzbleed is a real, and practical, threat to the security of cryptographic software.</p><p>Should I be worried?</p><p>From the Hertzbleed <a href="https://www.hertzbleed.com/">website</a>,</p><blockquote><p><i>“If you are an ordinary user and not a cryptography engineer, probably not: you don’t need to apply a patch or change any configurations right now. If you are a cryptography engineer, read on. Also, if you are running a SIKE decapsulation server, make sure to deploy the mitigation described below.”</i></p></blockquote><p><b>Notice:</b> As of today, there is no known attack that uses Hertzbleed to target conventional and standardized cryptography, such as the encryption used in Cloudflare products and services. Having said that, let’s get into the details of processor frequency scaling to understand the core of this vulnerability.</p><p>In short, the Hertzbleed attack shows that, under certain circumstances, dynamic voltage and frequency scaling (<a href="https://en.wikipedia.org/wiki/Dynamic_frequency_scaling">DVFS</a>), a power management scheme of modern x86 processors, depends on the data being processed. This means that on modern processors, the same program can run at different CPU frequencies (and therefore take different wall-clock times). For example, we expect that a CPU takes the same amount of time to perform the following two operations because it uses the same algorithm for both. However, there is an observable time difference between them:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2T8HDeM152nUJmCOzb3lD3/111075d5659b25ed5d096a6ff6c535fd/image5-15.png" />
            
            </figure>
            <pre><code>Trivia: Could you guess which operation runs faster?</code></pre>
            <p>Before giving the answer we will explain some details about how Hertzbleed works and its impact on <a href="https://sike.org/">SIKE</a>, a new cryptographic algorithm designed to be computationally infeasible for an adversary to break, even for an attacker with a quantum computer.</p>
    <div>
      <h3>Frequency Scaling</h3>
      <a href="#frequency-scaling">
        
      </a>
    </div>
    <p>Suppose a runner is in a long distance race. To optimize performance, the heart monitors the body continuously. Depending on the input (such as distance or oxygen absorption), it releases the appropriate hormones that will accelerate or slow down the heart rate, and as a result tells the runner to speed up or slow down a little. Just like the heart of a runner, <a href="https://en.wikipedia.org/wiki/Dynamic_frequency_scaling">DVFS</a> (dynamic voltage and frequency scaling) is a monitoring system for the CPU. It helps the CPU run at its best under present conditions without being overloaded.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3v9Z8954FqPDImeLnTcfns/efd112b90fbc7f2ca1c4bda32df73116/image1-55.png" />
            
            </figure><p>Just as a runner’s heart causes a runner’s pace to fluctuate throughout a race depending on the level of exertion, when a CPU is running a sustained workload, DVFS modifies the CPU’s frequency from the so-called <i>steady-state frequency.</i> DVFS causes it to switch among multiple performance levels (called P-states) and oscillate among them. Modern DVFS gives the hardware almost full control to adjust the P-states it wants to execute in and the duration it stays at any P-state. These modifications are totally opaque to the user, since they are controlled by hardware and the operating system provides limited visibility and control to the end-user.</p><p>The <a href="https://uefi.org/specs/ACPI/6.4/index.html">ACPI</a> specification defines <a href="https://uefi.org/specs/ACPI/6.4/02_Definition_of_Terms/Definition_of_Terms.html#term-P0-Performance-State">P0 state</a> as the state the CPU runs at its maximum performance capability. Moving to higher P-states makes the CPU less performant in favor of consuming less energy and power.</p>
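<p>That visibility, while limited, is not zero: on Linux, for instance, the cpufreq sysfs interface reports the frequency the kernel believes a core is running at (a best-effort sketch; the path and its availability vary by driver and platform):</p>

```python
from pathlib import Path

def current_freq_khz(cpu=0):
    """Read cpu<N>'s current frequency in kHz via Linux cpufreq sysfs,
    or return None where the interface is unavailable (e.g. non-Linux)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq")
    try:
        return int(path.read_text())
    except (OSError, ValueError):
        return None
```

<p>Polling this value while a workload runs is one crude way to watch DVFS move the core between P-states.</p>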
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2y3DvoP97XLhnO8GzYfEVF/769352fed791726f56f80123e506e34c/image4-26.png" />
            
            </figure><p>Suppose a CPU’s <i>steady-state frequency</i> is 4.0 GHz. Under DVFS, the frequency can oscillate between 3.9 and 4.1 GHz.</p><p>How long does the CPU stay at each P-state? Most importantly, how can this even lead to a vulnerability? Excellent questions!</p><p>Modern DVFS is designed this way because CPUs have a Thermal Design Point (TDP), indicating the expected power consumption at steady state under a sustained workload. For a typical desktop processor, such as a Core i7-8700, the TDP is 65 W.</p><p>To continue our running analogy: a typical person can sprint only short distances, and must run longer distances at a slower pace. When a workload is of short duration, DVFS allows the CPU to enter a high-performance state, called Turbo Boost on Intel processors. In this mode, the CPU can temporarily execute very quickly while consuming much more power than the TDP allows. But when running a sustained workload, the CPU’s average power consumption must stay below the TDP to prevent overheating. For example, as illustrated below, suppose the CPU has been idle for a while; when it first starts the workload, it runs extra hard (Turbo Boost on). After a while, it realizes that this workload is not a short one, so it slows down and enters steady state. How much does it slow down? That depends on the TDP: when entering steady state, the CPU runs at a speed such that its power consumption does not exceed the TDP.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1moCuxakiYLVgwsyWE9axJ/0281910124d71d1463a956c66a7e6d31/image7-11.png" />
            
            </figure><p>CPU entering steady state after running at a higher frequency.</p><p>Beyond protecting CPUs from overheating, DVFS also wants to maximize performance. When a runner is in a marathon, she doesn't run at a fixed pace; rather, her pace floats up and down a little. Remember the P-states we mentioned above? CPUs oscillate between P-states just like runners adjust their pace slightly over time. P-states are CPU frequency levels with discrete increments of 100 MHz.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7mhdAzae9l9Ne6OIJ3AilO/3ce6c62ab6719f164a98e831461f2d74/image16.png" />
            
            </figure><p>CPU frequency levels with discrete increments</p><p>The CPU could safely run at a high P-state (low frequency) all the time to stay below the TDP, but that might leave room between its power consumption and the TDP. To maximize CPU performance, DVFS utilizes this gap by allowing the CPU to oscillate between multiple P-states. The CPU stays at each P-state for only dozens of milliseconds, so its instantaneous power consumption may exceed or fall below the TDP a little, while its average power consumption stays equal to the TDP.</p><p>To understand this, check out this figure again.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3jaAOzM60czbPTnCp758fI/440446a7197ef9ddd3f2a2d0dfb4513d/image4-27.png" />
            
            </figure><p>If the CPU only wanted to protect itself from overheating, it could run at the 3.9 GHz P-state safely. However, DVFS wants to maximize CPU performance by utilizing all the power headroom allowed by the TDP. As a result, the CPU oscillates around the 4.0 GHz P-state, never going far above or below it. When at 4.1 GHz, it overloads itself a little, so it drops to a higher P-state (lower frequency). When at 3.9 GHz, it recovers, so it quickly climbs to a lower P-state (higher frequency). It never stays long in any P-state, which avoids overheating at 4.1 GHz and keeps the average power consumption near the TDP.</p><p>This is exactly how modern DVFS monitors your CPU to help it optimize power consumption while working hard.</p><p>Again, how can DVFS and TDP lead to a vulnerability? We are almost there!</p>
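<p>The averaging behavior described above is simple arithmetic. Here is a toy sketch in Go; the power figures are illustrative stand-ins, not measured values:</p>

```go
package main

import "fmt"

func main() {
	// Toy model: power draw (in watts) at each P-state frequency (in GHz).
	// These numbers are illustrative only; the TDP is taken to be 65 W.
	power := map[float64]float64{3.9: 60.0, 4.0: 65.0, 4.1: 70.0}
	const tdp = 65.0

	// Spending equal time above and below the steady-state P-state keeps
	// the average power at the TDP, even though the instantaneous power
	// briefly overshoots (at 4.1 GHz) or undershoots (at 3.9 GHz) it.
	avg := (power[4.1] + power[3.9]) / 2
	fmt.Println(avg == tdp) // true
}
```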
    <div>
      <h3>Frequency Scaling vulnerability</h3>
      <a href="#frequency-scaling-vulnerability">
        
      </a>
    </div>
    <p>The design of DVFS and TDP can be problematic because CPU power consumption is data-dependent! The Hertzbleed <a href="https://www.hertzbleed.com/hertzbleed.pdf">paper</a> gives an explicit leakage model for certain operations, identifying two cases.</p><p>First, the larger the number of bits set (also known as the <a href="https://en.wikipedia.org/wiki/Hamming_weight">Hamming weight</a>) in the operands, the more power an operation takes. The Hamming weight effect is widely observed, with no known explanation of its root cause. For example,</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SGcJDL8vmsFY0qpnYBRWe/8c8ec24dc5248a86478a3d7b68413677/image5-16.png" />
            
            </figure><p>The addition on the left will consume more power than the one on the right.</p><p>Second, when registers change their value, there are power variations due to transistor switching. For example, a register switching its value from A to B (as shown on the left) requires flipping only one bit, because the <a href="https://en.wikipedia.org/wiki/Hamming_distance">Hamming distance</a> between A and B is 1. Meanwhile, switching from C to D consumes more energy to perform six bit transitions, since the Hamming distance between C and D is 6.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6eYZFPpLM2z5oFQc0LOCGO/972de3fb3622c97d79a9f3ccd5bb96e9/image3-33.png" />
            
            </figure><p>Hamming distance</p><p>Now we see where the vulnerability is! When running sustained workloads, the CPU’s overall performance is capped by the TDP. Under modern DVFS, it maximizes its performance by oscillating between multiple P-states. At the same time, CPU power consumption is data-dependent. Inevitably, workloads with different power consumption will lead to different CPU P-state distributions. For example, if workload w1 consumes less power than workload w2, the CPU will stay longer in a lower P-state (higher frequency) when running w1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7rvo9VT0yhKQVwVZq2ngcS/9db873879532c38906ee370645803df5/image10-3.png" />
            
            </figure><p>Different power consumption leads to different P-state distribution</p><p>As a result, since the power consumption is data-dependent, it follows that CPU frequency adjustments (the distribution of P-states) and execution time (as 1 Hertz = 1 cycle per second) are data-dependent too.</p><p>Consider a program that takes five cycles to finish as depicted in the following figure.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33ZYgaR34vUpsq9EmDP94V/6241baf615ff5ff4abf3e15af4e8ab72/image8-10.png" />
            
            </figure><p>CPU frequency directly translates to running time</p><p>As illustrated in the table below, if the program with input 1 runs at 4.0 GHz (red), it takes 1.25 nanoseconds to finish. If the program consumes more power with input 2, under DVFS it will run at a lower frequency, 3.5 GHz (blue), and take more time, 1.43 nanoseconds, to finish. If the program consumes even more power with input 3, under DVFS it will run at an even lower frequency of 3.0 GHz (purple), now taking 1.67 nanoseconds to finish. This program always takes five cycles to finish, but the amount of power it consumes depends on the input. The power influences the CPU frequency, and the CPU frequency directly translates to execution time. In the end, the program's execution time becomes data-dependent.</p>
<table>
<thead>
  <tr>
    <th colspan="4">Execution time of a five-cycle program</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Frequency</td>
    <td>4.0 GHz</td>
    <td>3.5 GHz</td>
    <td>3.0 GHz</td>
  </tr>
  <tr>
    <td>Execution Time</td>
    <td>1.25 ns</td>
    <td>1.43 ns</td>
    <td>1.67 ns</td>
  </tr>
</tbody>
</table><p>To give you another concrete example: suppose we have a sustained workload <code><i>Foo</i></code>. We know that <code><i>Foo</i></code> consumes more power with input <code><i>data 1</i></code>, and less power with input <code><i>data 2</i></code>. As shown on the left in the figure below, if the power consumption of <code><i>Foo</i></code> is below the TDP, the CPU frequency, and hence the running time, stays the same regardless of the choice of input data. However, as shown in the middle, if we add a background stressor to the CPU, the combined power consumption will exceed the TDP. Now we are in trouble. The CPU’s overall performance is monitored by DVFS and capped by the TDP. To prevent overheating, the CPU dynamically adjusts its P-state distribution when running workloads with different power consumption. The P-state distribution of <code><i>Foo(data 1)</i></code> will shift slightly to the right (toward slower P-states) compared to that of <code><i>Foo(data 2)</i></code>. As shown on the right, the CPU running <code><i>Foo(data 1)</i></code> ends up at a lower overall frequency and a longer running time. The observation here is that if <code><i>data</i></code> is a binary secret, an attacker can infer <code><i>data</i></code> by simply measuring the running time of <code><i>Foo</i></code>!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/70UwC6lCoxxW6tGJ0jDXgt/29d47d570eafce2f4f9f9c15ead70c3c/image11-2.png" />
            
            </figure><p>Complete recap of Hertzbleed. Figure taken from Intel’s <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/frequency-throttling-side-channel-guidance.html">documentation</a>.</p><p>This observation is astonishing because it conflicts with our expectation of a CPU. We expect a CPU to take the same amount of time computing these two additions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4tUqIgy4Ujj7ex8Jn2IaPb/67989690678fe48374fb5a05cf5d28bc/image17-1.png" />
            
            </figure><p>However, Hertzbleed tells us that, just like a person doing math on paper, <i>a CPU not only takes more power to compute more complicated numbers but also spends more time!</i> This is not what a CPU should do while performing a secure computation, because anyone who measures the CPU’s execution time should not be able to infer the data being computed on.</p><p>This takeaway of Hertzbleed creates a significant problem for cryptographic implementations, because an attacker shouldn’t be able to infer a secret from a program’s running time. When developers implement a cryptographic protocol from a mathematical construction, a common goal is to ensure constant-time execution, that is, that code execution does not leak secret information via a timing channel. We have witnessed that timing attacks are practical: notable examples are those shown by <a href="https://link.springer.com/content/pdf/10.1007/3-540-68697-5_9.pdf">Kocher</a>, <a href="https://www.usenix.org/legacy/events/sec03/tech/brumley/brumley.pdf">Brumley-Boneh</a>, <a href="http://www.isg.rhul.ac.uk/tls/Lucky13.html">Lucky13</a>, and many others. How to properly implement constant-time code is the subject of extensive study.</p><p>Historically, our understanding of which operations contribute to time variation did not take DVFS into account. The Hertzbleed vulnerability derives from this oversight: workloads that differ significantly in power consumption will also differ in timing. Hertzbleed proposes a new perspective on the development of secure programs: any program vulnerable to power analysis becomes potentially vulnerable to timing analysis!</p><p>Which cryptographic algorithms are vulnerable to Hertzbleed is unclear. According to the authors, a systematic study of Hertzbleed is left as future work. However, Hertzbleed was exemplified as a vector for attacking SIKE.</p>
    <div>
      <h3>Brief description of SIKE</h3>
      <a href="#brief-description-of-sike">
        
      </a>
    </div>
    <p>The Supersingular Isogeny Key Encapsulation (<a href="http://sike.org">SIKE</a>) protocol is a Key Encapsulation Mechanism (KEM) finalist of the NIST Post-Quantum Cryptography competition (currently at Round 3). The building-block operation of SIKE is the calculation of isogenies (transformations) between elliptic curves. You can find helpful information about the calculation of isogenies in our previous <a href="/towards-post-quantum-cryptography-in-tls/">blog post</a>. In essence, calculating isogenies amounts to evaluating mathematical formulas that take as inputs points on an elliptic curve and produce points lying on a different elliptic curve.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1GnJu9fPINA1O4TsbSgOeS/10c1a952f195e80e9e4f2d4bada40b51/image12-1.png" />
            
            </figure><p>SIKE bases its security on the difficulty of computing a relationship between two elliptic curves. On the one hand, it’s easy to compute this relation (called an <i>isogeny</i>) if the points that generate it (called <i>the kernel</i> of the isogeny) are known in advance. On the other hand, it’s difficult to find the isogeny given only two elliptic curves, without knowledge of the kernel points. An attacker has no advantage if the number of possible kernel points to try is large enough to make the search infeasible (computationally intractable), even with the help of a quantum computer.</p><p>As with other algorithms based on elliptic curves, such as <a href="https://www.cloudflare.com/learning/dns/dnssec/ecdsa-and-dnssec/">ECDSA</a> or ECDH, the core of SIKE is calculating operations over points on elliptic curves. As usual, points are represented by a pair of coordinates (x,y) which fulfill the elliptic curve equation</p><p>$ y^2 = x^3 + Ax^2 + x $</p><p>where A is a parameter identifying different elliptic curves.</p><p>For performance reasons, SIKE uses one of the fastest elliptic curve models: <a href="https://youtu.be/abStWnmvJO0?t=2684">Montgomery</a> curves. The special property that makes these curves fast is that they allow working only with the x-coordinate of points. Hence, one can express the x-coordinate as a fraction x = X / Z, without using the y-coordinate at all. This representation simplifies the calculation of point additions, scalar multiplications, and isogenies between curves. Nonetheless, such simplicity does not come for free, and there is a price to be paid.</p><p>The formulas for point operations on Montgomery curves have some edge cases. More technically, a formula is said to be complete if it produces a valid output point for every valid input. Otherwise, a formula is incomplete, meaning that there are some exceptional inputs for which it cannot produce a valid output point.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1rU9vkaAs5EWFA8ab5h29J/b95ebd086764b05e281fcc66708e76b7/image6-14.png" />
            
            </figure><p>In practice, algorithms working with incomplete formulas must be designed so that edge cases never occur; otherwise, the algorithms could trigger undesired effects. Let’s take a closer look at what happens in this situation.</p><p>A subtle yet relevant property of some incomplete formulas is the nature of the output they produce when operating on points in the exceptional set. On such anomalous inputs, the output has both coordinates equal to zero, so X=0 and Z=0. Recalling our basics on fractions, there is something odd about the fraction X/Z = 0/0; it has always been regarded as not well-defined. This intuition is not wrong: something bad just happened. This fraction does not represent a valid point on the curve. In fact, it is not even a (projective) point.</p>
    <div>
      <h3>The domino effect</h3>
      <a href="#the-domino-effect">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bJS447Fs1QNCfQVXiw9zy/d227e4cdb65234fc468297783477b6e5/image15.png" />
            
            </figure><p>Exploiting this subtlety of the mathematical formulas makes a case for the Hertzbleed side-channel attack. In SIKE, whenever an edge case occurs in the middle of its execution, it produces a domino effect that propagates the zero coordinates to subsequent computations, which means the whole algorithm gets stuck on 0. As a result, the computation is corrupted, producing a zero at the end; worse, an attacker can use this domino effect to make guesses about the bits of secret keys.</p><p>Guessing one bit of the key requires the attacker to trigger an exceptional case exactly at the point at which that bit is used. It might seem the attacker needs to be super lucky to trigger edge cases when they only control the input points. Fortunately for the attacker, the internal algorithm used in SIKE has some invariants that help hand-craft points that trigger an exceptional case exactly at the right point. A systematic study of all exceptional points and edge cases was, independently, shown by <a href="https://ia.cr/2022/054">De Feo et al.</a> as well as in the Hertzbleed <a href="https://www.hertzbleed.com/hertzbleed.pdf">article</a>.</p><p>With these tools at hand, and using the DVFS side channel, the attacker can now guess the secret key bit by bit by passing hand-crafted invalid input points. 
There are two cases an attacker can observe when the SIKE algorithm uses the secret key:</p><ul><li><p>If the bit of interest is equal to the one before it, no edge case occurs, computation proceeds normally, and the program takes the expected amount of wall time, since all the calculations are performed over random-looking data.</p></li><li><p>On the other hand, if the bit of interest is different from the one before it, the algorithm enters the exceptional case, triggering the domino effect for the rest of the computation, and DVFS makes the program run faster as it automatically raises the CPU’s frequency.</p></li></ul><p>By querying this oracle repeatedly, the attacker learns the secret key used in SIKE bit by bit.</p><p>Ok, let’s recap.</p><p>SIKE uses special formulas to speed up operations, but if these formulas are forced to hit certain edge cases, they fail. Failing on these edge cases not only corrupts the computation, but also makes the formulas output coordinates full of zeros, which in machine representation amounts to several registers all loaded with zeros. If the computation continues without noticing the presence of these edge cases, the processor registers remain stuck at 0 for the rest of the computation. Finally, at the hardware level, some instructions consume fewer resources when their operands are zeroed. Because of that, DVFS can raise the CPU frequency, altering the steady-state frequency. The ultimate result is a program that runs faster or slower depending on whether it operates on all zeros or on random-looking data.</p>
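<p>The attack loop can be simulated end to end. The sketch below is a toy model, not CIRCL’s code: the hypothetical <code>timeFor</code> stands in for a real wall-clock measurement under DVFS and encodes only the observation above, namely that a query probing bit i runs fast exactly when key bit i differs from the previous bit:</p>

```go
package main

import "fmt"

// timeFor simulates the timing oracle: a hand-crafted query probing key
// bit i triggers the zero cascade (and therefore runs "fast" under DVFS)
// exactly when bit i differs from bit i-1. A real attack measures wall
// time; here we return a fake duration encoding only that distinction.
func timeFor(key []int, i int) float64 {
	prev := 0
	if i > 0 {
		prev = key[i-1]
	}
	if key[i] != prev {
		return 1.0 // zeroed registers -> less power -> higher frequency
	}
	return 1.2 // random-looking data -> slower
}

// recoverKey rebuilds the secret key bit by bit from timing alone.
func recoverKey(key []int) []int {
	out := make([]int, len(key))
	prev := 0
	for i := range key {
		if timeFor(key, i) < 1.1 { // fast => bit differs from previous
			out[i] = 1 - prev
		} else {
			out[i] = prev
		}
		prev = out[i]
	}
	return out
}

func main() {
	secret := []int{0, 1, 1, 0, 1, 0, 0, 1}
	fmt.Println(recoverKey(secret)) // [0 1 1 0 1 0 0 1]
}
```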
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6yRdFcbomVv9zUyfKCrJuk/b45bad7e42dd9f50e8db1cc396cbcff7/image2-48.png" />
            
            </figure><p>Hertzbleed’s authors contacted Cloudflare Research because they had demonstrated a successful attack on <a href="https://github.com/cloudflare/circl/">CIRCL</a>, our optimized Go cryptographic library that includes SIKE. We worked closely with the authors to find potential mitigations in the early stages of their research. While the embargo of the disclosure was in effect, another research group including <a href="https://eprint.iacr.org/2022/054">De Feo et al.</a> independently described a systematic study of the possible failures of SIKE formulas, including the same attack found by the Hertzbleed team, and pointed to a proper countermeasure. Hertzbleed borrows that countermeasure.</p>
    <div>
      <h3>What countermeasures are available for SIKE?</h3>
      <a href="#what-countermeasures-are-available-for-sike">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18GNQhMYFfv81mfUYuM0zo/10f682905c67802a7c773491acef6aba/image13-1.png" />
            
            </figure><p>The immediate action specific to SIKE is to prevent edge cases from occurring in the first place. Most SIKE implementations provide a certain amount of leeway, assuming that inputs will not trigger exceptional cases. This is not a safe assumption. Instead, implementations should be hardened and should validate that inputs and keys are well-formed.</p><p>Enforcing strict validation of untrusted inputs is always the recommended action. For example, a common check in elliptic curve-based algorithms is to validate that inputs correspond to points on the curve and that their coordinates are in the proper range from 0 to p-1 (as described in <a href="https://www.secg.org/sec1-v2.pdf">Section 3.2.2.1</a> of SEC 1). These checks also apply to SIKE, but they are not sufficient.</p><p>What malformed inputs have in common in the case of SIKE is that input points can have arbitrary order: in addition to checking that points lie on the curve, one must also check that they have the prescribed <a href="https://medium.com/asecuritysite-when-bob-met-alice/whats-the-order-in-ecc-ac8a8d5439e8">order</a>. This is akin to small <a href="https://link.springer.com/content/pdf/10.1007/BFb0052240.pdf">subgroup attacks</a> in the Diffie-Hellman setting over finite fields. In SIKE, there are several overlapping groups on the same curve, and input points with incorrect order should be detected.</p><p>The countermeasure, originally proposed by <a href="https://eprint.iacr.org/2016/413">Costello, et al.</a>, consists of verifying that the input points have the right full order. To do so, we check that an input point vanishes only when multiplied by its expected order, and not before, when multiplied by smaller scalars. By doing so, the hand-crafted invalid points will not pass this validation routine, which prevents edge cases from appearing during the algorithm execution. 
In practice, we observed around a 5-10% performance overhead on SIKE decapsulation. The <a href="https://github.com/cloudflare/circl/commit/10923e8d736009130170c06c1bdbd81bee4de56c">ciphertext validation</a> is already available in CIRCL as of version <a href="https://github.com/cloudflare/circl/releases/tag/v1.2.0">v1.2.0</a>. We strongly recommend updating your projects that depend on CIRCL to this version, so you can make sure that strict validation on SIKE is in place.</p>
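<p>The shape of this check can be illustrated in a toy setting. The sketch below works in the additive group Z/nZ as a stand-in for the curve’s point group (it is not CIRCL’s validation code): an element passes only if it vanishes at the full group order n and does not vanish at n/ℓ for any prime ℓ dividing n:</p>

```go
package main

import "fmt"

// hasFullOrder reports whether g has exact order n in the additive
// group Z/nZ, given the distinct primes dividing n. It mirrors the
// SIKE countermeasure: the element must vanish when multiplied by n,
// but must NOT vanish at n/l for any prime l dividing n (which would
// mean it lives in a proper subgroup).
func hasFullOrder(g, n int, primes []int) bool {
	if (g*n)%n != 0 { // trivially true here; on a curve this is [n]P = O
		return false
	}
	for _, l := range primes {
		if (g * (n / l)) % n == 0 {
			return false // vanished too early: proper subgroup
		}
	}
	return true
}

func main() {
	n := 12 // toy group order: 12 = 2^2 * 3
	primes := []int{2, 3}
	fmt.Println(hasFullOrder(5, n, primes)) // 5 generates Z/12Z: accepted
	fmt.Println(hasFullOrder(4, n, primes)) // 4 has order 3 only: rejected
}
```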
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/26b8CdTJXYhPhyjflDrQE4/b9b8798f17533efff225532b0ff76cf2/image14-2.png" />
            
            </figure>
    <div>
      <h3>Closing comments</h3>
      <a href="#closing-comments">
        
      </a>
    </div>
    <p>Hertzbleed shows that certain workloads can induce changes in the frequency scaling of the processor, making programs run faster or slower. In this setting, small differences in the bit pattern of data result in observable differences in execution time. This puts a spotlight on the state-of-the-art techniques used so far to protect against timing attacks, and makes us rethink the measures needed to produce constant-time code and secure implementations. Defending against features like DVFS seems to be something that programmers should start to consider too.</p><p>Although SIKE was the victim this time, it is possible that other cryptographic algorithms exhibit similar symptoms that can be leveraged by Hertzbleed. An investigation of other targets with this brand-new tool in the attacker’s portfolio remains an open problem.</p><p>Hertzbleed allowed us to learn more about how the machines in front of us work, and how the processor constantly monitors itself, optimizing the performance of the system. Hardware manufacturers have focused on the performance of processors by providing many optimizations; however, further study of the security of computations is also needed.</p><p>If you are excited about this project, at Cloudflare we are working on raising the bar on the production of code for cryptography. <a href="https://research.cloudflare.com/contact/">Reach out to us</a> if you are interested in high-assurance tools for developers, and don’t forget our <a href="https://research.cloudflare.com/outreach/academic-programs/">outreach programs</a> whether you are a student, a faculty member, or an independent researcher.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
    <div></div><p></p> ]]></content:encoded>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Hertzbleed]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">4qV3kmsxGVLixoEWEm8jBR</guid>
            <dc:creator>Yingchen Wang</dc:creator>
            <dc:creator>Armando Faz-Hernández</dc:creator>
        </item>
        <item>
            <title><![CDATA[Internship Experience: Software Development Intern]]></title>
            <link>https://blog.cloudflare.com/internship-experience-software-development-intern/</link>
            <pubDate>Wed, 18 May 2022 12:59:16 GMT</pubDate>
            <description><![CDATA[ Back in December 2021, I joined Cloudflare as a Software Development Intern on the Partnerships team to help improve the experience that Partners have when using the platform ]]></description>
            <content:encoded><![CDATA[ <p>Before we dive into my experience interning at Cloudflare, let me quickly introduce myself. I am currently a master’s student at the National University of Singapore (NUS) studying Computer Science. I am passionate about building software that improves people’s lives and making the Internet a better place for everyone. Back in December 2021, I joined Cloudflare as a Software Development Intern on the Partnerships team to help improve the experience that Partners have when using the platform. I was extremely excited about this opportunity and jumped at the prospect of working on serverless technology to build viable tools for our partners and customers. In this blog post, I detail my experience working at Cloudflare and the many highlights of my internship.</p>
    <div>
      <h3>Interview Experience</h3>
      <a href="#interview-experience">
        
      </a>
    </div>
    <p>The process began for me back when I was taking a software engineering module at NUS, where one of my classmates shared a job post for an internship at Cloudflare. I had known about Cloudflare’s DNS service prior to this and was really excited to learn more about the internship opportunity, because I really resonated with the company's mission to help build a better Internet.</p><p>I knew right away that this would be a great opportunity and submitted my application. Soon after, I heard back from the recruiting team and went through the interview process – the entire interview process was extremely accommodating and was definitely the most enjoyable interview experience I have had. Throughout the process, I was constantly asked about the kind of things I would like to work on and the relevance of the work that I would be doing. I felt that this thorough communication carried on throughout the internship and really was a cornerstone of my experience interning at Cloudflare.</p>
    <div>
      <h3>My Internship</h3>
      <a href="#my-internship">
        
      </a>
    </div>
    <p>My internship began with onboarding and training, after which I had discussions with my mentor, Ayush Verma, on the projects we aimed to complete during the internship and the order of objectives. The main issue we wanted to address was the manual process that our internal teams and partners go through when they want to duplicate the configuration settings of a zone, or when they want to compare one zone to other zones to ensure that there are no misconfigurations. As you can imagine, with the number of different configurations offered on the Cloudflare dashboard, it could take significant time to copy every setting and rule manually from one zone to another. Additionally, doing this manually poses a risk of misconfiguration due to human error. Furthermore, as more and more customers onboard different zones onto Cloudflare, there needs to be a more automated and streamlined way for them to make these configuration setups.</p><p>Initially, we discussed using Terraform, as Cloudflare already supports Terraform automation. However, this approach would only cater to customers and users with more technical resources and, in true Cloudflare spirit, we wanted to keep it simple so that it could be used by anyone and everyone. Therefore, we decided to leverage the publicly available Cloudflare APIs and create a browser-based application that interacts with these APIs to display configurations and make changes easily from a simple UI.</p><p>With the end goal of simplifying the experience for our partners and customers in duplicating zone configurations, we decided to build a Zone Copier web application built solely on Cloudflare Workers. 
This tool would, with the click of a button, automatically copy every setting that can be copied from one zone to another, significantly reducing the time and effort required to make the changes.</p><p>Alongside the Zone Copier, we built some auxiliary tools, such as a Zone Viewer and a Zone Comparison tool, so that a customer can easily see their full configuration on a single webpage and compare the different zones they use. These applications improve upon the existing methods through which Cloudflare users can view their zone configurations, and allow for direct comparison between different zones.</p><p>Importantly, these applications are not meant to replace the Cloudflare Dashboard, but to complement it – for deeper dives into a particular configuration setting, the Cloudflare Dashboard remains the way to go.</p><p>To begin building the web application, I spent the first few weeks diving into the publicly available APIs offered as part of Cloudflare’s v4 API, verifying the outputs of each endpoint and the type of data that would be sent in a response. This took much longer than expected, as certain endpoints provide different default responses for a zone with an empty setting – for example, no Firewall Rules created – or use a nested structure in their responses. These different potential responses have to be examined so that when the web application calls the respective API endpoint, the responses are handled appropriately. This process was quite manual, as each endpoint had to be verified individually to ensure the output would work seamlessly with the application.</p><p>Once I completed my research, I was able to start designing the web application. Building the web application was a very interesting experience, as the stack rested solely on Workers, a serverless application platform. 
My prior experience building web applications involved deploying a server built with Express and Node.js, whereas for my internship project I relied entirely on a backend built with the <i>itty-router</i> library on Workers to interface with the publicly available Cloudflare APIs. I found this extremely exciting, as building a serverless application requires less overhead than setting up and deploying a server, and Workers itself has many other benefits, such as zero cold starts. This introduction to serverless technology, and my experience deep-diving into the capabilities of Workers, really opened my eyes to the possibilities that Workers as a platform can offer. With Workers, you can deploy any application on Cloudflare’s global network, like I did!</p><p>For the frontend, I used React and the Chakra-UI library to build the user interface on which the Zone Viewer, Zone Comparison, and Zone Copier are based. Routing between pages is handled by React Router, and the application is deployed directly through Workers.</p><p>Here is a screenshot of the application:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16b1r6p12eWbtss72II7Qp/179b2d8d5e6f019927fb14f42d2fe8b9/unnamed--1--1.png" />
            
            </figure>
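<p>To make the API-verification work above concrete, here is a minimal sketch of how the differing v4 responses could be smoothed into one shape before rendering. The v4 API wraps each payload in a <code>{ success, errors, result }</code> envelope, but <code>result</code> may be absent, an array, or a nested object depending on the endpoint and whether the setting is empty; the helper name and the <code>{ ok, entries }</code> shape below are illustrative assumptions, not the project's actual code.</p>

```javascript
// Sketch: normalizing Cloudflare v4 API responses before display.
// The function name and the { ok, entries } return shape are
// illustrative assumptions, not the project's actual code.
function normalizeZoneSetting(apiResponse) {
  // v4 responses are wrapped as { success, errors, result }.
  if (!apiResponse || apiResponse.success === false) {
    return { ok: false, entries: [] };
  }
  const result = apiResponse.result;
  if (result == null) {
    return { ok: true, entries: [] }; // feature never configured on this zone
  }
  if (Array.isArray(result)) {
    return { ok: true, entries: result }; // list endpoints, e.g. firewall rules
  }
  if (Array.isArray(result.rules)) {
    return { ok: true, entries: result.rules }; // nested shape on some endpoints
  }
  return { ok: true, entries: [result] }; // single-object settings
}

// Example: an empty ruleset and a single setting both normalize cleanly.
const empty = normalizeZoneSetting({ success: true, errors: [], result: [] });
const single = normalizeZoneSetting({
  success: true,
  errors: [],
  result: { id: "ssl", value: "flexible" },
});
```

<p>With every endpoint funneled through one normalizer like this, the UI components only ever see a single response shape, whichever endpoint produced the data.</p>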
    <div>
      <h3>Presenting the prototype application</h3>
      <a href="#presenting-the-prototype-application">
        
      </a>
    </div>
    <p>As developers will know, the best way to obtain feedback on the tool you’re building is to have your customers use it directly and tell you what they think and which features they would like built on top of it. Therefore, once we had a prototype of the Zone Viewer and Zone Comparison complete, we presented the application to the Solutions Engineering team to hear their thoughts on the impact the tool would have on their work and the additional features they would like to see. I found this process very enriching, as they collectively mentioned how impactful the application would be for their work and the value this project adds for them.</p><p>Some interesting feedback and feature requests I received were:</p><ol><li><p>The Zone Copier would definitely be very useful for our partners who have to replicate the configuration of one zone to another regularly, and it's definitely going to help ensure there are fewer human errors in the process of configuring the setups.</p></li><li><p>Besides duplicating configurations from zone to zone, could we use this to replicate the configurations from a best-in-class setup for different use cases and allow partners to deploy this with a few clicks?</p></li><li><p>Can we use this tool to generate quarterly reports?</p></li><li><p>The Zone Viewer would be very helpful for us when we produce documentation on a particular zone's configuration as part of a POC report.</p></li><li><p>The Zone Viewer will also give us much deeper insight into current zone configurations, so we can better understand them and provide recommendations to improve them.</p></li></ol><p>It was also a very cool experience speaking to the broader Solutions Engineering team, as I found that many were very technically inclined and had many valid suggestions for improving the architecture and development of the applications. 
A special thanks to Edwin Wong for setting up the sharing session with the internal team, and many thanks to Xin Meng, AQ Jiao, Yonggil Choi, Steve Molloy, Kyouhei Hayama, Claire Lim and Jamal Boutkabout for their great insight and suggestions!</p>
    <div>
      <h3>Impact of Cloudflare outside of work</h3>
      <a href="#impact-of-cloudflare-outside-of-work">
        
      </a>
    </div>
    <p>While Cloudflare is known for its impeccable transparency throughout the company, and for the stellar products it provides in helping make the Internet better, I wanted to take this opportunity to talk about the company’s other endeavors too.</p><p>Cloudflare is part of <i>Pledge 1%</i>, through which the company dedicates 1% of its products and 1% of its employees’ time to giving back to the local communities, as well as all the communities we support online around the world.</p><p>I took part in one of these activities, where we spent a morning cleaning up parts of the East Coast Park beach, picking up trash and litter left behind by other park users. Here’s a picture of us from that morning:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/nWGrYqbAAsLJPc3WZJeY4/0d6307697440567dcc2c5bd721d9e1de/image1-50.png" />
            
            </figure><p>From day one, I have been thoroughly impressed by Cloudflare’s commitment to its culture and the effort everyone at Cloudflare puts in to make the company a great place to work and to have a positive impact on the surrounding community.</p><p>In addition to giving back to the community, other aspects of the company culture include a good team spirit and a safe working environment where you feel appreciated and taken care of. At Cloudflare, I have found that everyone is very understanding of outside commitments. I faced a few challenges during the internship where I had to spend additional time on university-related projects and work, and my manager was always supportive and understanding when I needed extra time to complete parts of the internship project.</p>
    <div>
      <h3>Concluding takeaways</h3>
      <a href="#concluding-takeaways">
        
      </a>
    </div>
    <p>My experience interning at Cloudflare has been extremely positive, and I have seen first-hand how transparent the company is with not only its employees but also its customers; it truly is a great place to work. Cloudflare’s collaborative culture gave me access to members of different teams, whose thoughts and assistance I could draw on for issues I faced from time to time. I could not have produced an impactful project without the help of the brilliant, motivated people I worked with over the course of the internship, and I am truly grateful for such a rewarding experience.</p><p>We are getting ready to open intern roles for this coming Fall, so we encourage you to visit our <a href="https://www.cloudflare.com/careers/jobs/">careers page</a> frequently to stay up to date on all the opportunities we have within our teams.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Dashboard]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Careers]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">cdNWCBRsKMUBhHTzG5xer</guid>
            <dc:creator>Ulysses Kee</dc:creator>
        </item>
        <item>
            <title><![CDATA[Kicking Off Cloudflare's Summer 2022 Internship Program]]></title>
            <link>https://blog.cloudflare.com/kicking-off-cloudflares-summer-2022-internship-program/</link>
            <pubDate>Fri, 15 Oct 2021 15:11:28 GMT</pubDate>
            <description><![CDATA[ This upcoming summer, we expect to have the largest intern class ever at Cloudflare. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Fall is my favorite season for numerous reasons: the change in temperature, pumpkin spice flavored...everything, and of course, the start of the university recruitment cycle. I am excited to announce Cloudflare has begun hiring for our Summer 2022 internship program. We just opened many of our internship roles on our <a href="https://www.cloudflare.com/careers/jobs/?department=University&amp;location=default">careers website</a> and will begin reviewing applications on a rolling basis. We are looking for Software Engineer, Product Management, Research, Data Science interns and more. Not only that, but we also have a host of virtual events and tech talks to engage prospective students throughout October and November. Find our event lineup below and RSVP through the attached links by clicking on the event titles.</p><table><tr><td><p><b>Session</b></p></td><td><p><b>Date</b></p></td><td><p><b>Time</b></p></td></tr><tr><td><p><a href="http://web.archive.org/web/20211203122955/https://jst.me/8kfh">Inside Look: Hiring Software Engineering Interns and New Grads</a></p></td><td><p>October 15, 2021</p></td><td><p>10:00-10:45 PT</p></td></tr><tr><td><p><a href="http://web.archive.org/web/20211203122955/https://jst.me/tjgu">Inside Look: Cloudflare’s Intern Hiring Process</a></p></td><td><p>October 19, 2021</p></td><td><p>11:15-12:00 PT</p></td></tr><tr><td><p><a href="http://web.archive.org/web/20211203122955/https://jst.me/55xq">Inside Look: Nativeflare</a></p></td><td><p>October 27, 2021</p></td><td><p>10:45-11:30 PT</p></td></tr><tr><td><p><a href="http://web.archive.org/web/20211203122955/https://jst.me/q92w">Inside Look: Cloudflare’s Intern Experiences</a></p></td><td><p>October 28, 2021</p></td><td><p>13:00-13:45 PT</p></td></tr><tr><td><p><a href="http://web.archive.org/web/20211203122955/https://jst.me/vs9y">Inside Look: Cloudflare’s Culture</a></p></td><td><p>November 11, 2021</p></td><td><p>13:00-13:30 
PT</p></td></tr></table><p>*<i>We have many more events coming up later in the fall and early spring 2022; join our community here for news and updates from us!</i></p><p>In September, Cloudflare kicked off our fall university recruitment efforts by participating in the virtual Grace Hopper Celebration 2021 (#vGHC21) with a team of hiring managers and recruiters from various teams such as Software Engineering, Security, Research, and Product Management. We have sponsored this conference for years, but this was our first experience participating virtually. We met and engaged with 1,000+ attendees through 1:1 conversations and tech talks in our virtual booth. Overall, we had an incredible experience with the virtual Grace Hopper Celebration this year and continue to be committed to hiring and providing opportunities for women at Cloudflare.</p><p>This upcoming summer, we expect to have the largest intern class ever at Cloudflare. We have more opportunities on more teams than in any previous summer. To learn more about our Engineering and Product internship opportunities, please check out <a href="https://cloudflare.tv/event/eIF4Y0z8KKExEg5ADJIjS">this video</a> from our hiring managers last fall. Many of our interns have written blog posts about their experiences working at Cloudflare. Check out Meyer Zinn’s <a href="/making-magic-transit-health-checks-faster-and-more-responsive/">blog post</a> from this past summer. Our intern opportunities are not limited to technical roles; we have a host of internships for MBA and J.D. students, as well as those in non-technical majors, such as our Communications Internship. 
We will be opening these opportunities later this fall and early spring 2022.</p><p>If you are interested in staying informed about our internships, virtual events and opportunities to engage with Cloudflare, feel free <a href="https://forms.gle/xBP19RxruBNRzPjR9">to sign up</a> for email updates or <a href="https://www.canvas.com/app/community/a0faf7de-f76d-4f77-9a2e-7470e2ed0df6">join our community</a>. We will be reviewing all applications on a rolling basis, so if you are waiting to hear back from us, please check out our <a href="/">blog</a> or catch a segment on <a href="https://cloudflare.tv/live">CloudflareTV</a> to learn more about Cloudflare in the meantime.</p> ]]></content:encoded>
            <category><![CDATA[Life at Cloudflare]]></category>
            <category><![CDATA[Careers]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <guid isPermaLink="false">KhImu57u9hgALrzVzMFwV</guid>
            <dc:creator>Ellie Jamison</dc:creator>
        </item>
    </channel>
</rss>