
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how Cloudflare products are built, the technologies behind them, and how to join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 07:15:08 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Behind the scenes with Stream Live, Cloudflare’s live streaming service]]></title>
            <link>https://blog.cloudflare.com/behind-the-scenes-with-stream-live-cloudflares-live-streaming-service/</link>
            <pubDate>Thu, 02 Jan 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Let’s talk about Stream Live’s design, and how it leverages the distributed nature of Cloudflare’s network, rather than centralized locations as many other live services do. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare announced Stream Live for open beta <a href="https://blog.cloudflare.com/stream-live"><u>in 2021</u></a>, and <a href="https://blog.cloudflare.com/stream-live-ga"><u>in 2022</u></a> we went GA. While we talked about the experience of using it and the value it delivers to customers, we didn’t talk about how we built it. So let’s talk about Stream Live’s design, and how it leverages the distributed nature of Cloudflare’s network, rather than centralized locations as many other live services do. Ultimately, our goals are to keep our content ingest as close to broadcasters as possible, our content delivery as close to viewers as possible, and to retain our ability to handle unexpected use cases.</p><p>At a high level, Stream Live accepts audio/video content from broadcasters and makes that content available to viewers around the world in real time through the <a href="https://www.cloudflare.com/network/"><u>Cloudflare network</u></a>, which reaches more than 330 cities in over 120 countries. Hence, there are two sides to this: ingesting data from broadcasters and delivering encoded content to viewers. Both sides are built on a combination of internal systems and Cloudflare products, including <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a>, <a href="https://www.cloudflare.com/application-services/products/cloudflare-spectrum/"><u>Spectrum</u></a>, and, of course, <a href="https://developers.cloudflare.com/cache/"><u>Cache</u></a>.</p><p>Let’s start on the ingest side.</p>
    <div>
      <h3>Ingesting a broadcast</h3>
      <a href="#ingesting-a-broadcast">
        
      </a>
    </div>
    <p>Broadcasters generate content in real time, as a series of video and audio frames, and it needs to be transmitted to Cloudflare over the Internet. To communicate with us, they choose a protocol such as <a href="https://en.wikipedia.org/wiki/Real-Time_Messaging_Protocol"><u>RTMPS</u></a> (Real Time Messaging Protocol Secure), <a href="https://www.ietf.org/archive/id/draft-sharabayko-srt-01.txt"><u>SRT</u></a> (Secure Reliable Transport), or <a href="https://www.ietf.org/archive/id/draft-ietf-wish-whip-01.html"><u>WHIP</u></a> (WebRTC-HTTP ingestion protocol) that defines how their content is packaged and transmitted. Each of these protocols is a way to transmit audio and video frames with various tradeoffs.</p><p>Regardless of the chosen protocol, the ingest lifecycle looks fairly similar. Broadcasters connect to an <a href="https://www.cloudflare.com/learning/cdn/glossary/anycast-network/"><u>Anycast</u></a> IP address using either a <a href="https://developers.cloudflare.com/stream/stream-live/custom-domains/"><u>custom ingest domain</u></a> or our default <code>live.cloudflare.com</code>. Both options direct to an IP address advertised from every Cloudflare data center around the world, minimizing the latency for broadcasters (both big and small) to our ingest points.</p><p>When the media content arrives at the Cloudflare server servicing the connection, it is first handled by a <a href="https://www.cloudflare.com/application-services/products/cloudflare-spectrum/"><u>Spectrum application</u></a>. 
Spectrum saves us time by terminating TLS for RTMPS and blocking potential <a href="https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/"><u>DDoS attacks</u></a>, and it provides a few other protocol-specific benefits, such as letting us support SRT broadcasters whose clients don't implement the <a href="https://github.com/Haivision/srt/blob/bcc2f21ec75fcb10a8185857f3bc323bb19619da/docs/features/access-control.md"><u>Stream ID</u></a> portion of the SRT protocol. Those broadcasters assume their connection can be fully identified by the port they connect to, which is a challenge for a multitenant service. We use Spectrum to expose many listening ports, each dedicated to a specific broadcaster; connections are wrapped in <a href="https://developers.cloudflare.com/spectrum/reference/simple-proxy-protocol-header/"><u>Simple Proxy Protocol</u></a> and sent to a single ingest service port. This matters because our SRT implementation spends a non-trivial amount of CPU per listening port, whereas Spectrum spends effectively nothing. In all cases, Spectrum forwards connections to the ingest service running on the same server.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Hgz9WilQMf7hiIWgWvzgN/35c46350835d1e9c6d219f1ca2507abf/image3.png" />
          </figure><p>Our ingest service handles receiving content from broadcasters, forwarding to <a href="https://developers.cloudflare.com/stream/stream-live/simulcasting/"><u>live outputs</u></a>, and recording for HLS/DASH viewing. The ingest service is written in Go and acts on configuration and broadcast state stored in our Live Config <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>. One Durable Object instance represents one live input; it contains both the customer configuration and the ongoing state of each broadcast.</p><p>We chose Durable Objects over a centralized database since they are not coupled to any specific data center, allowing them to be closer to each of our geographically distributed broadcasters while remaining highly available. In terms of customer configuration, we use Durable Objects to store everything defined <a href="https://developers.cloudflare.com/api/operations/stream-live-inputs-create-a-live-input#request-body"><u>when creating</u></a> or <a href="https://developers.cloudflare.com/api/operations/stream-live-inputs-create-a-new-output,-connected-to-a-live-input"><u>modifying the live input</u></a>.</p><p>We store the canonical state of the broadcast in the Durable Object. That includes timestamp metadata for keeping times monotonic, connection status, and which ingest service instance owns the broadcast. Any content received by the ingest service needs to be acknowledged and indexed by the Durable Object before it is made available to viewers. This splits our state into two types, ‘volatile’ and ‘committed’. Volatile state is content the ingest has received but not yet told the Durable Object about. 
Committed state is content the Durable Object has acknowledged; it is used as a sync point for any other ingest service instance that claims ownership of the broadcast.</p><p>This split between volatile and committed state is how we support broadcasters resuming a live broadcast after disconnecting and reconnecting for any reason, including a broadcaster-side network blip. That’s normally a relatively easy problem when you have centralized ingestion and state storage. Since broadcasters connect to an arbitrary server due to Anycast, we needed to get more creative to make sure that whichever server receives the reconnect has the data to continue the broadcast.</p><p>The ingest service itself is written as a relay, taking packets from one input stream and mapping them to multiple output streams. At the top level, the relay is implemented as two coupled ‘for’ loops that communicate over a Go channel, one for sending and one for receiving. When we receive data, we normalize it internally depending on the protocol. For example, some protocols send video packets delimited by start codes (<a href="https://gist.github.com/XueshiQiao/d68bea4a5406dd648664ce66933aeed9"><u>Annex B H.264 NALUs</u></a>), while other protocols and file formats expect length-prefixed packets (<a href="https://gist.github.com/XueshiQiao/d68bea4a5406dd648664ce66933aeed9"><u>AVCC H.264 NALUs</u></a>).</p>
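<p>As an illustrative sketch of that normalization step (hypothetical code, not Stream’s actual Go implementation), converting a delimiter-based Annex B H.264 stream into length-prefixed AVCC packets looks roughly like this:</p>

```javascript
// Illustrative sketch only: convert an Annex B H.264 byte stream
// (NAL units separated by 00 00 01 or 00 00 00 01 start codes) into
// AVCC framing (each NAL unit prefixed with its 4-byte big-endian length).
function annexBToAvcc(annexB) {
  // Record where each start code begins and where its NAL unit data starts.
  const nalStarts = [];
  for (let i = 0; i + 2 < annexB.length; i++) {
    if (annexB[i] === 0 && annexB[i + 1] === 0) {
      if (annexB[i + 2] === 1) {
        nalStarts.push({ codeStart: i, dataStart: i + 3 }); // 3-byte start code
        i += 2;
      } else if (annexB[i + 2] === 0 && annexB[i + 3] === 1) {
        nalStarts.push({ codeStart: i, dataStart: i + 4 }); // 4-byte start code
        i += 3;
      }
    }
  }
  const out = [];
  for (let n = 0; n < nalStarts.length; n++) {
    const start = nalStarts[n].dataStart;
    const end = n + 1 < nalStarts.length ? nalStarts[n + 1].codeStart : annexB.length;
    const len = end - start;
    // 4-byte big-endian length prefix, then the NAL unit payload.
    out.push((len >>> 24) & 0xff, (len >>> 16) & 0xff, (len >>> 8) & 0xff, len & 0xff);
    for (let i = start; i < end; i++) out.push(annexB[i]);
  }
  return Uint8Array.from(out);
}
```

The reverse direction is similar: read each 4-byte length, then emit a start code followed by the payload.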
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/903kTMKMnnHwms0EL9FNn/62e77098d23fa863c5ec1088367b4f43/image5.png" />
          </figure><p>A special case of relay outputs is our Live Playback and Recording functionality. When this is enabled, we copy and sanitize the incoming packets. Specifically, we issue monotonically increasing timestamps, which solves many issues we’ve encountered with various customer encoders. To keep everything aligned, we drop severely misaligned audio/video blocks, something typically seen at the start of broadcasts. The sanitized packets are packaged into fragmented <a href="https://en.wikipedia.org/wiki/MP4_file_format"><u>MP4s</u></a> on keyframe boundaries. We call these fragmented MP4s ‘original segments’. Our ingest service stores the original segments in <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> and lets the Live Config Durable Object know the segment exists and how long it is.</p><p>These original segments are considered the canonical copy of the content a customer has uploaded to us, and they are reused when transitioning from live to on-demand. Because they are required to serve live viewers, we don’t currently provide an option to decouple live playback from recording: supporting live playback automatically provides an on-demand recording, with some state management overhead.</p>
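<p>The timestamp sanitization can be sketched as follows (a simplified, hypothetical illustration in JavaScript; the real ingest service is written in Go and handles far more edge cases):</p>

```javascript
// Illustrative sketch only: re-stamp incoming timestamps so the recorded
// output is always monotonic, even when a customer encoder restarts and
// its clock jumps backward.
class TimestampSanitizer {
  constructor() {
    this.offset = 0;        // correction applied to incoming timestamps
    this.lastOutput = null; // last timestamp we emitted
  }
  next(incoming) {
    let out = incoming + this.offset;
    if (this.lastOutput !== null && out <= this.lastOutput) {
      // Clock went backward (or stalled): shift this and all subsequent
      // timestamps so this one lands just after the last emitted one.
      this.offset += this.lastOutput + 1 - out;
      out = this.lastOutput + 1;
    }
    this.lastOutput = out;
    return out;
  }
}
```

Re-stamping this way keeps the recorded timeline monotonic even when a broadcaster reconnects and its encoder clock restarts from zero.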
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5GLIyr66cFtbd4jajXYrE0/d098b1347a45e287e2c5ef03fffb264b/image4.png" />
          </figure>
    <div>
      <h3>Viewing the broadcast</h3>
      <a href="#viewing-the-broadcast">
        
      </a>
    </div>
    <p>At this point, we’ve ingested the content, cleaned it up, and durably stored it. Let’s talk about how viewers watch this content.</p><p>Most online video is delivered in short ‘segments’, multi-second chunks of content. Splitting the output this way allows content to be emitted progressively and is very cache-friendly. The protocols viewers use to request that content are typically <a href="https://datatracker.ietf.org/doc/html/rfc8216"><u>HLS (HTTP Live Streaming)</u></a> or <a href="https://www.mpeg.org/standards/MPEG-DASH/"><u>MPEG-DASH (Dynamic Adaptive Streaming over HTTP)</u></a>. <a href="https://www.ietf.org/archive/id/draft-murillo-whep-03.txt"><u>WHEP</u></a> (WebRTC-HTTP Egress Protocol) is a non-segmented viewing method available for real-time viewing in some cases. However, we’ll focus here on playback of segmented content with HLS or DASH, since that’s most of Stream Live’s usage today.</p><p>To start viewing, a video player requests an HLS or DASH playlist, which describes the attributes of the media content and acts as an index for the segments: it tells the player which segments map to which points on the video’s timeline. Those segments are downloaded into a playback buffer to be decoded and displayed by the client’s player. Examples here use HLS, a newline-delimited format; the typical alternative is DASH, which is XML-based.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/43zIdjDFagc4w8HgHNYBPD/1447cb964d9eace73711b31a5f963656/image6.png" />
          </figure><p>First, ‘primary’ playlists describe which renditions of the same content are available, as well as where to get their specific index files. Those renditions can vary in bitrate, codec, framerate, resolution, and so on. If you’ve ever picked ‘1080p’ from a video player’s quality menu, the player knew those qualities were available through this or a similar method. When selecting quality automatically, players choose the best rendition for the viewing machine’s capabilities (such as the ability to decode a certain codec) and network constraints. We use an internal representation of tracks (what kind of content, e.g. video), renditions (content parameters, e.g. 1080p), and muxings (where to find that content, e.g. in R2 bucket N or via OTFE call M) to generate both HLS and DASH, as the two formats contain nearly the same data, just organized differently. Here’s a simplified example of a ‘multi-variant’ or ‘primary’ HLS playlist that Stream generated, with some annotations to explain the components.</p>
            <pre><code>#EXTM3U
#EXT-X-VERSION:6
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="group_audio",NAME="original",LANGUAGE="en-0a76e0ad",DEFAULT=YES,AUTOSELECT=YES,URI="stream_2.m3u8" &lt;- Audio track description + URL path
...
#EXT-X-STREAM-INF:RESOLUTION=426x240,CODECS="avc1.42c015,mp4a.40.2",BANDWIDTH=149084,AVERAGE-BANDWIDTH=145135,SCORE=1.0,FRAME-RATE=30.000,AUDIO="group_audio" &lt;- description of variant contents
stream_1.m3u8 &lt;- URL path to fetch variant</code></pre>
            <p>Second, ‘variant’ or ‘stream’ playlists contain the list of segments that can be downloaded and viewed for each rendition. These are used for both live and on-demand viewing. The difference is that an on-demand playlist contains a flag indicating no more content will be added to the index, whereas a live playlist omits that flag so content can be appended in future requests. As a result, you’ll see video players download <a href="https://en.wikipedia.org/wiki/M3U#M3U8"><u>M3U8</u></a> (HLS) or <a href="https://ottverse.com/structure-of-an-mpeg-dash-mpd/"><u>MPD</u></a> (DASH) files approximately every 1–10 seconds when viewing a broadcast, looking for updated content.</p>
            <pre><code>#EXTM3U
#EXT-X-VERSION:6
#EXT-X-MEDIA-SEQUENCE:89281 &lt;- Indicates position of sliding window
#EXT-X-TARGETDURATION:6
#EXT-X-MAP:URI=".../init.mp4" &lt;- Initialization data
#EXTINF:4.000, &lt;- Segment duration in seconds
.../seg_89281.mp4 &lt;- Location of a segment
#EXTINF:4.000,
.../seg_89282.mp4
#EXTINF:4.000,
.../seg_89283.mp4</code></pre>
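<p>For contrast, once a broadcast has ended and the same segments are served on demand, the variant playlist is closed out with an <code>#EXT-X-ENDLIST</code> tag, which tells players to stop polling for new content. A sketch of such a playlist (illustrative, not Stream’s exact output):</p>

```
#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:6
#EXT-X-MAP:URI=".../init.mp4"
#EXTINF:4.000,
.../seg_89281.mp4
#EXTINF:4.000,
.../seg_89282.mp4
#EXT-X-ENDLIST
```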
            <p>To serve all requests from viewers, we use a <a href="https://workers.cloudflare.com/"><u>Cloudflare Worker</u></a> called ‘delivery-worker’. It handles all requests for Stream content, and it had its first commit in 2017, making it one of the earliest production-facing Workers at Cloudflare. Since it’s a Worker that executes as close to the viewer as possible, it frees us up to spend more time on content and metadata performance rather than on where our logic runs. It delivers content, renders playlists, and performs a variety of other functions. To render playlists, the worker transforms the broadcast state from the Durable Object, as mentioned earlier.</p><p>When clients ask for the encoded media content the playlist advertised to them, delivery-worker sends a subrequest to our OTFE (on-the-fly encoder) service that transits the Cloudflare network. That request describes the format of the content requested, e.g. the video stream of segment 89282 at 1280x720 resolution, encoded using AVC (a.k.a. H.264) capped CRF with a specified <a href="https://blog.cloudflare.com/making-video-intuitive-an-explainer/"><u>bitrate cap</u></a>. OTFE then encodes the original segment to produce the specified output.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2PNSBVrMZz3rgiQqzzQcHp/56cd69bc8b64034673d4f15c9e461a1f/image7.png" />
          </figure><p>On-the-fly encoding is more efficient than always-encoding, the approach most other platforms take. If no viewers are watching a specific quality level, or watching the broadcast at all, then we aren’t encoding it. That saves power, CPU time, RAM, network, and cache space. Doing nothing is always more efficient than doing something. This applies to a variety of customer use cases, since not all broadcasts have many viewers widely distributed across a range of connection qualities. For one case, consider live broadcasts that are primarily viewed on high-quality connections: encoding 240p or 360p variants would go to waste most of the time. For a more extreme case, there are situations where you definitely want recording enabled for live content but viewing is exceptional, such as security cameras or dashcams. Of course, we have many customers with active viewership for their broadcasts, and this architecture allows us to serve both kinds of use case.</p><p>On-the-fly encoding has a tradeoff: it is hard to implement correctly, for both media-correctness and performance reasons. Media-correctness is important for ensuring smooth playback; individual segments need to have exactly the right start time and duration, or you get stuttering playback, audio/video desync, or entirely unwatchable content. Perfecting our media output has required fine-tuning our encoding, deep-diving into specs, and adjusting fragmented MP4s, especially since most encoders aren’t designed for per-segment encoding. For performance, we hide encoding delay by aggressively prefetching segments from delivery-worker: when a viewer requests segment N, we initiate encoding of segment N+1.
Since that logic is implemented in a Worker, we can easily adjust or iterate on the prefetching however we want.</p><p>This encoding flow stands on top of the Cloudflare network, which also provides us with <a href="https://developers.cloudflare.com/cache/how-to/tiered-cache/"><u>tiered caching</u></a> and request coalescing. Request coalescing is the key to supporting many simultaneous viewers while encoding only once: for any number of viewers requesting the same encoded content, only one request makes it to our encoder origin, thanks to <a href="https://developers.cloudflare.com/cache/"><u>Cache</u></a>.</p>
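<p>In simplified form, coalescing plus prefetch can be sketched as a map of in-flight requests (a hypothetical illustration, not delivery-worker’s actual code; in production the coalescing is done by Cache, and <code>encodeSegment</code> stands in for the subrequest to OTFE):</p>

```javascript
// Hypothetical sketch of request coalescing + segment prefetch.
// `encodeSegment(n)` stands in for the expensive origin subrequest
// that asks the on-the-fly encoder for segment n.
function makeSegmentFetcher(encodeSegment) {
  const inFlight = new Map(); // segment number -> Promise of encoded content

  return function getSegment(n) {
    // Coalesce: any number of viewers asking for segment n share one
    // encode; only the first request reaches the encoder.
    if (!inFlight.has(n)) {
      inFlight.set(n, Promise.resolve().then(() => encodeSegment(n)));
    }
    // Prefetch: a request for segment n hints that segment n+1 will be
    // needed shortly, so start encoding it now to hide the delay.
    if (!inFlight.has(n + 1)) {
      inFlight.set(n + 1, Promise.resolve().then(() => encodeSegment(n + 1)));
    }
    return inFlight.get(n);
  };
}
```

Two viewers asking for segment 5 trigger a single encode, and their shared request also warms segment 6 before anyone asks for it.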
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6oLTqFzrQodmQqWrIb91qX/1f1f7eb02fd7efb1830f7d636f54edca/image2.png" />
          </figure><p>That’s how Stream Live works at a high level. We ingest content from users, send it to any desired live outputs, transcode it for viewing, and give viewers a choice of quality levels, with a lot of backend complexity hidden behind a friendly API. All of this is built on top of the Cloudflare network, with Cloudflare as Customer Zero for our own products and services, using the same ones available to our customers.</p><p>There’s a lot more we could write about the problems we’ve solved while building Stream Live over the last few years, but that will have to wait for a future blog post. If the mix of media and distributed-systems problems discussed here sounds interesting, the Cloudflare Media Platform <a href="https://job-boards.greenhouse.io/cloudflare/jobs/6449185?gh_jid=6449185"><u>is hiring</u></a> for <a href="https://job-boards.greenhouse.io/cloudflare/jobs/6297082?gh_jid=6297082"><u>several roles</u></a>. Apply and join us!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Live Streaming]]></category>
            <guid isPermaLink="false">chVlt8JAjFduvvOJ5c1yl</guid>
            <dc:creator>Kyle Boutette</dc:creator>
            <dc:creator>Jacob Curtis</dc:creator>
        </item>
        <item>
            <title><![CDATA[WebRTC live streaming to unlimited viewers, with sub-second latency]]></title>
            <link>https://blog.cloudflare.com/webrtc-whip-whep-cloudflare-stream/</link>
            <pubDate>Tue, 27 Sep 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Stream now supports live streaming over WebRTC to unlimited concurrent viewers, using open standards WHIP and WHEP, with zero dependencies or client SDKs necessary. ]]></description>
            <content:encoded><![CDATA[ <p>Creators and broadcasters expect to be able to go live from anywhere, on any device. Viewers expect “live” to mean “real-time”. The protocols that power most live streams are unable to meet these growing expectations.</p><p>In talking to developers building live streaming into their apps and websites, we’ve heard near universal frustration with the limitations of existing live streaming technologies. Developers in 2022 rightly expect to be able to deliver low latency to viewers, broadcast reliably, and use web standards rather than old protocols that date back to the era of Flash.</p><p>Today, we’re excited to announce in open beta that Cloudflare Stream now supports live video streaming over WebRTC, with sub-second latency, to unlimited concurrent viewers. This is a new feature of Cloudflare Stream, and you can start using it right now in the Cloudflare Dashboard — read the <a href="https://developers.cloudflare.com/stream/webrtc-beta/">docs</a> to get started.</p><p>WebRTC with Cloudflare Stream leapfrogs existing tools and protocols, exclusively uses open standards with zero dependency on a specific SDK, and empowers any developer to build both low latency live streaming and playback into their website or app.</p>
    <div>
      <h3>The status quo of streaming live video is broken</h3>
      <a href="#the-status-quo-of-streaming-live-video-is-broken">
        
      </a>
    </div>
    <p>The status quo of streaming live video has high latency, depends on archaic protocols and is incompatible with the way developers build apps and websites. A reasonable person’s expectations of what the Internet should be able to deliver in 2022 are simply unmet by the dominant set of protocols carried over from past eras.</p><p><b>Viewers</b> increasingly expect “live” to mean “real-time”. People want to place bets on sports broadcasts in real-time, interact and ask questions to presenters in real-time, and never feel behind their friends at a live event.</p><p>In practice, the HLS and DASH standards used to deliver video have 10+ seconds of latency. LL-HLS and LL-DASH bring this down to closer to 5 seconds, but only as a hack on top of the existing protocol that delivers segments of video in individual HTTP requests. Sending mini video clips over TCP simply cannot deliver video in real-time. HLS and DASH are here to stay, but aren’t the future of real-time live video.</p><p><b>Creators and broadcasters</b> expect to be able to go live from anywhere, on any device.</p><p>In practice, people creating live content are stuck with a limited set of native apps, and can’t go live using RTMP from a web browser. Because it’s built on top of TCP, the RTMP broadcasting protocol struggles under even the slightest network disruption, making it a poor or often unworkable option when broadcasting from mobile networks. 
RTMP, originally built for use with Adobe Flash Player, was <a href="https://rtmp.veriskope.com/pdf/rtmp_specification_1.0.pdf">last updated in 2012</a>, and while Stream supports the <a href="/magic-hdmi-cable/">newer SRT protocol</a>, creators need an option that works natively on the web and can more easily be integrated in native apps.</p><p><b>Developers</b> expect to be able to build using standard APIs that are built into web browsers and native apps.</p><p>In practice, RTMP can’t be used from a web browser, and creating a native app that supports RTMP broadcasting typically requires diving into lower-level programming languages like C and Rust. Only those with expertise in both live video protocols and these languages have full access to the tools needed to create novel live streaming client applications.</p>
    <div>
      <h3>We’re solving this by using new open WebRTC standards: WHIP and WHEP</h3>
      <a href="#were-solving-this-by-using-new-open-webrtc-standards-whip-and-whep">
        
      </a>
    </div>
    <p>WebRTC is the real-time communications protocol, supported across all web browsers, that powers video calling services like Zoom and Google Meet. Since inception it’s been designed for real-time, ultra low-latency communications.</p><p>While WebRTC is well established, for most of its history it’s lacked standards for:</p><ul><li><p><b>Ingestion</b> — how broadcasters should <b><i>send</i></b> media content (akin to RTMP today)</p></li><li><p><b>Egress</b> — how viewers request and <b><i>receive</i></b> media content (akin to DASH or HLS today)</p></li></ul><p>As a result, developers have had to implement this on their own, and client applications on both sides are often tightly coupled to provider-specific implementations. Developers we talk to often express frustration, having sunk months of engineering work into building around a specific vendor’s SDK, unable to switch without a significant rewrite of their client apps.</p><p>At Cloudflare, our mission is broader — we’re helping to build a better Internet. Today we’re launching not just a new feature of Cloudflare Stream, but a vote of confidence in new WebRTC standards for both ingestion and egress. We think you should be able to start using Stream without feeling locked into an SDK or implementation specific to Cloudflare, and we’re committed to using open standards whenever possible.</p><p>For ingestion, <a href="https://www.ietf.org/archive/id/draft-ietf-wish-whip-03.html">WHIP</a> is an IETF draft on the Standards Track, with many applications already successfully using it in production. For delivery (egress), <a href="https://www.ietf.org/id/draft-murillo-whep-00.html">WHEP</a> is an IETF draft with broad agreement. Combined, they provide a standardized end-to-end way to broadcast one-to-many over WebRTC at scale.</p><p><b>Cloudflare Stream is the first cloud service to let you both broadcast using WHIP and playback using WHEP — no vendor-specific SDK needed.</b> Here’s how it works:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/19Hq5GYMLCWmxGoKQBiifO/0cd900cb0c07cfac25f14c2486a3cb77/image2-44.png" />
            
            </figure><p>Cloudflare Stream is already built on top of the Cloudflare developer platform, using Workers and Durable Objects running on Cloudflare’s global network, within 50ms of 95% of the world’s Internet-connected population.</p><p>Our WebRTC implementation extends this to relay WebRTC video through our network. Broadcasters stream video using WHIP to the point of presence closest to their location, which tells the Durable Object where the live stream can be found. Viewers request streaming video from the point of presence closest to them, which asks the Durable Object where to find the stream, and video is routed through Cloudflare’s network, all with sub-second latency.</p><p>Using Durable Objects, we achieve this with zero centralized state. And just like the rest of Cloudflare Stream, you never have to think about regions, both in terms of pricing and product development.</p><p>While existing ultra low-latency streaming providers charge significantly more to stream over WebRTC, because Stream runs on Cloudflare’s global network, we’re able to offer WebRTC streaming at the same price as delivering video over HLS or DASH. We don’t think you should be penalized with higher pricing when choosing which technology to rely on to stream live video. Once generally available, WebRTC streaming will cost $1 per 1000 minutes of video delivered, just like the rest of Stream.</p>
    <div>
      <h3>What does sub-second latency let you build?</h3>
      <a href="#what-does-sub-second-latency-let-you-build">
        
      </a>
    </div>
    <p>Ultra low latency unlocks interactivity within your website or app, removing the time delay between creators, in-person attendees, and those watching remotely.</p><p>Developers we talk to are building everything from live sports betting, to live auctions, to live viewer Q&amp;A and even real-time collaboration in video post-production. Even streams without in-app interactivity can benefit from real-time — no sports fan wants to get a text from their friend at the game that ruins the moment, before they’ve had a chance to watch the final play. Whether you’re bringing an existing app or have a new idea in mind, we’re excited to see what you build.</p>
    <div>
      <h3>If you can write JavaScript, you can let your users go live from their browser</h3>
      <a href="#if-you-can-write-javascript-you-can-let-your-users-go-live-from-their-browser">
        
      </a>
    </div>
    <p>While hobbyist and professional creators might take the time to download and learn how to use an application like <a href="https://obsproject.com/">OBS Studio</a>, most Internet users won’t get past the friction of learning new tools and copying RTMP keys from one tool to another. To empower more people to go live, they need to be able to broadcast from within your website or app, just by granting access to their camera and microphone.</p><p>Cloudflare Stream with WebRTC lets you build live streaming into your app as a front-end developer, without any special knowledge of video protocols. And our approach, using the open WHIP and WHEP standards, means you can do this with zero dependencies, using 100% your own code that you control.</p>
    <div>
      <h3>Go live from a web browser with just a few lines of code</h3>
      <a href="#go-live-from-a-web-browser-with-just-a-few-lines-of-code">
        
      </a>
    </div>
    <p>You can go live right now, from your web browser, by creating a live input in the <a href="https://dash.cloudflare.com/?to=/:account/stream/inputs">Cloudflare Stream dashboard</a>, and pasting a URL into the example linked below.</p><p>Read the <a href="https://developers.cloudflare.com/stream/webrtc-beta/">docs</a> or <a href="https://workers.new/stream/webrtc">run the example code below in your browser using Stackblitz</a>.</p>
            <pre><code>&lt;video id="input-video" autoplay autoplay muted&gt;&lt;/video&gt;</code></pre>
            
            <pre><code>import WHIPClient from "./WHIPClient.js";

const url = "&lt;WEBRTC_URL_FROM_YOUR_LIVE_INPUT&gt;";
const videoElement = document.getElementById("input-video");
const client = new WHIPClient(url, videoElement);</code></pre>
            <p>This example uses an example WHIP client, written in just 100 lines of JavaScript, using APIs that are native to web browsers, with zero dependencies. But because WHIP is an open standard, you can use any WHIP client you choose. Support for WHIP is growing across the video streaming industry — it has recently been added to <a href="https://gstreamer.freedesktop.org/">GStreamer</a>, and one of the authors of the WHIP specification has written a <a href="https://github.com/medooze/whip-js">JavaScript client implementation</a>. We intend to support the full <a href="https://www.ietf.org/archive/id/draft-ietf-wish-whip-03.html">WHIP specification</a>, including support for <a href="https://www.rfc-editor.org/rfc/rfc8838">Trickle ICE</a> for fast NAT traversal.</p>
    <div>
      <h3>Play a live stream in a browser, with sub-second latency, no SDK required</h3>
      <a href="#play-a-live-stream-in-a-browser-with-sub-second-latency-no-sdk-required">
        
      </a>
    </div>
    <p>Once you’ve started streaming, copy the playback URL from the live input you just created, and paste it into the example linked below.</p><p>Read the <a href="https://developers.cloudflare.com/stream/webrtc-beta/">docs</a> or <a href="https://workers.new/stream/webrtc">run the example code below in your browser using Stackblitz</a>.</p>
            <pre><code>&lt;video id="playback" controls autoplay muted&gt;&lt;/video&gt;</code></pre>
            
            <pre><code>import WHEPClient from './WHEPClient.js';
const url = "&lt;WEBRTC_PLAYBACK_URL_FROM_YOUR_LIVE_INPUT&gt;";
const videoElement = document.getElementById("playback");
const client = new WHEPClient(url, videoElement);</code></pre>
            <p>Just like the WHIP example before, this one uses a sample WHEP client we’ve written that has zero dependencies. WHEP is at an earlier stage of IETF standardization than WHIP, with its first draft <a href="https://www.ietf.org/id/draft-murillo-whep-00.html">published in July of this year</a>, but adoption is moving quickly. People in the community have already written open-source client implementations in both <a href="https://github.com/medooze/whip-js/blob/main/whep.js">JavaScript</a> and <a href="https://github.com/meetecho/simple-whep-client">C</a>, with more to come.</p>
    <div>
      <h3>Start experimenting with real-time live video, in open beta today</h3>
      <a href="#start-experimenting-with-real-time-live-video-in-open-beta-today">
        
      </a>
    </div>
    <p>WebRTC streaming is in open beta today, ready for you to use as an integrated feature of <a href="https://www.cloudflare.com/products/cloudflare-stream/">Cloudflare Stream</a>. Once Generally Available, WebRTC streaming will be priced like the rest of Cloudflare Stream, based on minutes of video delivered and minutes stored.</p><p><a href="https://developers.cloudflare.com/stream/webrtc-beta/">Read the docs</a> to get started.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">5PQXX1PxT5vsDahi24H7Cn</guid>
            <dc:creator>Kyle Boutette</dc:creator>
            <dc:creator>Kenny Luong</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Jacob Curtis</dc:creator>
            <dc:creator>Rachel Chen</dc:creator>
            <dc:creator>Felipe Astroza Araya</dc:creator>
            <dc:creator>Renan Dincer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Stream Live is now Generally Available]]></title>
            <link>https://blog.cloudflare.com/stream-live-ga/</link>
            <pubDate>Wed, 21 Sep 2022 13:15:00 GMT</pubDate>
            <description><![CDATA[ Stream Live is now out of beta, available to everyone, and ready for production traffic at scale ]]></description>
            <content:encoded><![CDATA[ <p>Today, we’re excited to announce that Stream Live is out of beta, available to everyone, and ready for production traffic at scale. Stream Live is a feature of <a href="https://www.cloudflare.com/products/cloudflare-stream/">Cloudflare Stream</a> that allows developers to build live video features in websites and native apps.</p><p>Since its <a href="/stream-live/">beta launch</a>, developers have used Stream to broadcast live concerts from some of the world’s most popular artists directly to fans, build brand-new video creator platforms, operate a global 24/7 live OTT service, and more. While in beta, Stream has ingested millions of minutes of live video and delivered it to viewers all over the world.</p><p><b>Bring your big live events, ambitious new video subscription service, or the next mobile video app with millions of users — we’re ready for it.</b></p>
    <div>
      <h2>Streaming live video at scale is hard</h2>
      <a href="#streaming-live-video-at-scale-is-hard">
        
      </a>
    </div>
    <p><b>Live video uses a massive amount of bandwidth.</b> For example, a one-hour live stream at 1080p at 8Mbps is 3.6GB per viewer. At <a href="/aws-egregious-egress/">typical cloud provider egress prices</a>, delivering to even a modest audience can break the bank.</p><p><b>Live video must be encoded on-the-fly, in real-time.</b> People expect to be able to watch live video on their phone, while connected to mobile networks with less bandwidth, higher latency and frequently interrupted connections. To support this, live video must be re-encoded in real-time into multiple resolutions, allowing someone’s phone to drop to a lower resolution and continue playback. This can be complex (Which bitrates? Which codecs? How many?) and costly: running a fleet of virtual machines doesn’t come cheap.</p><p><b>Ingest location matters</b> — Live streaming protocols like RTMPS send video over TCP. If a single packet is dropped or lost, the entire connection is brought to a halt while the packet is found and re-transmitted. This is known as “head of line blocking”. The further away the broadcaster is from the ingest server, the more network hops, and the more likely packets will be dropped or lost, ultimately resulting in latency and buffering for viewers.</p><p><b>Delivery location matters</b> — Live video must be cached and served from points of presence as close to viewers as possible. The longer the network round trips, the more likely videos will buffer or drop to a lower quality.</p><p><b>Broadcasting protocols are in flux</b> — The most widely used protocol for streaming live video, RTMPS, has been abandoned since 2012, and dates back to the era of Flash video in the early 2000s. A new emerging standard, SRT, is not yet supported everywhere. And WebRTC has only recently evolved into an option for high definition one-to-many broadcasting at scale.</p><p>The old way to solve this has been to stitch together separate cloud services from different vendors. One vendor provides excellent content delivery, but no encoding. Another provides APIs or hardware to encode, but leaves you to fend for yourself and build your own storage layer. As a developer, you have to learn, manage and write a layer of glue code around the esoteric details of video streaming protocols, codecs, encoding settings and delivery pipelines.</p>
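<p>The bandwidth arithmetic above is easy to reproduce. The sketch below checks the 3.6GB figure and extrapolates a delivery bill; the viewer count and the $0.09/GB egress price are hypothetical illustrations, not figures from this post.</p>

```python
# Back-of-envelope egress math for a live stream.
# The $0.09/GB price and 10,000-viewer audience are hypothetical examples.

def stream_gigabytes(bitrate_mbps: float, duration_seconds: float) -> float:
    """Size of the delivered stream in GB (1 GB = 10^9 bytes)."""
    bits = bitrate_mbps * 1_000_000 * duration_seconds
    return bits / 8 / 1_000_000_000

gb_per_viewer = stream_gigabytes(8, 3600)  # one hour at 8 Mbps
print(f"{gb_per_viewer:.1f} GB per viewer")  # 3.6 GB, as in the post

viewers = 10_000
price_per_gb = 0.09  # hypothetical per-GB egress price
print(f"${gb_per_viewer * viewers * price_per_gb:,.0f} for {viewers:,} viewers")
```

<p>Multiply one stream's size by the audience and per-GB pricing scales linearly with both quality and popularity.</p>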
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7epyhfBFSlN7AMdqROyaNE/3a8b0ad09f7f60f6438ace8adf8d1141/image4-8.png" />
            
            </figure><p>We built Stream Live to make streaming live video easy, like adding an &lt;img&gt; tag to a website. Live video is now a fundamental building block of Internet content, and we think any developer should have the tools to add it to their website or native app.</p><p>With Stream, you or your users stream live video directly to Cloudflare, and Cloudflare delivers video directly to your viewers. You never have to manage internal encoding, storage, or delivery systems — it’s just live video in and live video out.</p>
    <div>
      <h2>Our network, our hardware = a solution only Cloudflare can provide</h2>
      <a href="#our-network-our-hardware-a-solution-only-cloudflare-can-provide">
        
      </a>
    </div>
    <p>We’re not the only ones building APIs for live video — but we are the only ones with our <b><i>own</i></b> global network and hardware that we control and optimize for video. That lets us do things that others can’t, like <a href="/magic-hdmi-cable/">sub-second glass-to-glass latency</a> using RTMPS and SRT playback at scale.</p><p>Newer video codecs require specialized hardware encoders, and while others are beholden to the hardware limitations of public cloud providers, we’re already hard at work installing the latest encoding hardware in our own racks, so that you can deliver high resolution video with even less bandwidth. Our goal is to make what is otherwise only available to video giants available directly to you — stay tuned for some exciting updates on this next week.</p><p>Most providers limit how many destinations you can restream or simulcast to. Because we operate our own network, we’ve never even worried about this, and let you <a href="https://developers.cloudflare.com/stream/stream-live/simulcasting/">restream to as many destinations as you need</a>.</p><p>Operating our own network lets us price Stream based on minutes of video delivered — unlike others, we don’t pay someone else for bandwidth and then pass along their costs to you at a markup. The status quo of charging for bandwidth or per-GB storage penalizes you for delivering or storing high resolution content. If you ask why a few times, most of the time you’ll discover that others are pushing their own cost structures on to you.</p><p>Encoding video is compute-intensive, delivering video is bandwidth intensive, and location matters when ingesting live video. When you use Stream, you don't need to worry about optimizing performance, finding a CDN, and/or tweaking configuration endlessly. Stream takes care of this for you.</p>
    <div>
      <h2>Free your live video from the business models of big platforms</h2>
      <a href="#free-your-live-video-from-the-business-models-of-big-platforms">
        
      </a>
    </div>
    <p>Nearly every business uses live video, whether to engage with customers, broadcast events or monetize live content. But few have the specialized engineering resources to deliver live video at scale on their own, and wire together multiple low level cloud services. To date, many of the largest content creators have been forced to depend on a shortlist of social media apps and streaming services to deliver live content at scale.</p><p>Unlike the status quo, who force you to put your live video in <i>their</i> apps and services and fit <i>their</i> business models, Stream gives you full control of your live video, on <i>your</i> website or app, on any device, at scale, without pushing your users to someone else’s service.</p>
    <div>
      <h2>Free encoding. Free ingestion. Free analytics. Simple per-minute pricing.</h2>
      <a href="#free-encoding-free-ingestion-free-analytics-simple-per-minute-pricing">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th></th>
    <th><span>Others</span></th>
    <th><span>Stream</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Encoding</span></td>
    <td><span>$ per minute</span></td>
    <td><span>Free</span></td>
  </tr>
  <tr>
    <td><span>Ingestion</span></td>
    <td><span>$ per GB</span></td>
    <td><span>Free</span></td>
  </tr>
  <tr>
    <td><span>Analytics</span></td>
    <td><span>Separate product</span></td>
    <td><span>Free</span></td>
  </tr>
  <tr>
    <td><span>Live recordings</span></td>
    <td><span>Minutes or hours later</span></td>
    <td><span>Instant</span></td>
  </tr>
  <tr>
    <td><span>Storage</span></td>
    <td><span>$ per GB</span></td>
    <td><span>$ per minute stored</span></td>
  </tr>
  <tr>
    <td><span>Delivery</span></td>
    <td><span>$ per GB</span></td>
    <td><span>$ per minute delivered</span></td>
  </tr>
</tbody>
</table><p>Other platforms charge for ingestion and encoding. Many even force you to consider where you’re streaming to and from, the bitrate and frames per second of your video, and even which of their datacenters you’re using.</p><p><b>With Stream, encoding and ingestion are free.</b> Other platforms charge for delivery based on bandwidth, penalizing you for delivering high quality video to your viewers. If you stream at a high resolution, you pay more.</p><p><b>With Stream, you don’t pay a penalty for delivering high resolution video.</b> Stream’s pricing is simple — minutes of video delivered and stored. Because you pay per minute, not per gigabyte, you can stream at the ideal resolution for your viewers without worrying about bandwidth costs.</p><p>Other platforms charge separately for analytics, requiring you to buy another product to get metrics about your live streams.</p><p><b>With Stream, analytics are free.</b> Stream provides an <a href="https://developers.cloudflare.com/stream/getting-analytics/fetching-bulk-analytics/">API</a> and <a href="https://dash.cloudflare.com/?to=/:account/stream/analytics">Dashboard</a> for both server-side and client-side analytics, that can be queried on a per-video, per-creator, per-country basis, and more. 
You can use analytics to identify which creators in your app have the most viewed live streams, inform how much to bill your customers for their own usage, identify where content is going viral, and more.</p><p>Other platforms tack on live recordings or DVR mode as a separate add-on feature, and recordings only become available minutes or even hours after a live stream ends.</p><p><b>With Stream, live recordings are a built-in feature, made available</b> <a href="https://developers.cloudflare.com/stream/stream-live/watch-live-stream/#replaying-recordings"><b>instantly after a live stream ends</b></a><b>.</b> Once a live stream is available, it works just like any other video uploaded to Stream, letting you seamlessly use the same APIs for managing both pre-recorded and live content.</p>
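<p>The difference between per-GB and per-minute billing described above can be sketched with a quick calculation. All prices here are hypothetical placeholders, not Stream's actual rates: the point is only that per-GB cost grows with bitrate while per-minute cost does not.</p>

```python
# Per-GB billing penalizes quality: the same 60 minutes of video costs 4x
# more at 8 Mbps than at 2 Mbps, while per-minute billing stays flat.
# PER_GB and PER_MINUTE are hypothetical prices for illustration only.

def delivered_gb(bitrate_mbps: float, minutes: float) -> float:
    """GB delivered to one viewer at a given bitrate (1 GB = 10^9 bytes)."""
    return bitrate_mbps * 1_000_000 * minutes * 60 / 8 / 1_000_000_000

PER_GB = 0.08       # hypothetical per-GB delivery price
PER_MINUTE = 0.001  # hypothetical per-minute delivery price

for mbps in (2, 8):
    gb = delivered_gb(mbps, 60)
    print(f"{mbps} Mbps: {gb:.1f} GB -> "
          f"per-GB: ${gb * PER_GB:.2f}, per-minute: ${60 * PER_MINUTE:.2f}")
```

<p>Under the per-minute column, streaming at a higher resolution costs the same as streaming at a lower one.</p>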
    <div>
      <h2>Build live video into your website or app in minutes</h2>
      <a href="#build-live-video-into-your-website-or-app-in-minutes">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/HKXLvVWCAI0VaDcIf4JN5/97f176769b109ffae59440f0f3073f8b/image1-26.png" />
            
            </figure><p>Cloudflare Stream enables you or your users to go live using the same protocols and tools that broadcasters big and small use to go live to YouTube or Twitch, but gives you full control over access and presentation of live streams.</p>
    <div>
      <h3>Step 1: Create a live input</h3>
      <a href="#step-1-create-a-live-input">
        
      </a>
    </div>
    <p><a href="https://dash.cloudflare.com/?to=/:account/stream/inputs/create">Create a new live input from the Stream Dashboard</a> or use the Stream API:</p><p><b>Request</b></p>
            <pre><code>curl -X POST \
-H "Authorization: Bearer &lt;YOUR_API_TOKEN&gt;" \
-d '{"recording": { "mode": "automatic" } }' \
https://api.cloudflare.com/client/v4/accounts/&lt;YOUR_CLOUDFLARE_ACCOUNT_ID&gt;/stream/live_inputs</code></pre>
            <p><b>Response</b></p>
            <pre><code>{
  "result": {
    "uid": "&lt;UID_OF_YOUR_LIVE_INPUT&gt;",
    "rtmps": {
      "url": "rtmps://live.cloudflare.com:443/live/",
      "streamKey": "&lt;PRIVATE_RTMPS_STREAM_KEY&gt;"
    },
    ...
  }
}</code></pre>
            
    <div>
      <h3>Step 2: Use the RTMPS key with any live broadcasting software, or in your own native app</h3>
      <a href="#step-2-use-the-rtmps-key-with-any-live-broadcasting-software-or-in-your-own-native-app">
        
      </a>
    </div>
    <p>Copy the RTMPS URL and key, and use them with your live streaming application. We recommend using <a href="https://obsproject.com/">Open Broadcaster Software (OBS)</a> to get started, but any RTMPS or SRT compatible software should be able to interoperate with Stream Live.</p><p>Enter the Stream RTMPS URL and the Stream Key from Step 1:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6s7amxQ54W9Ik3FqA8SttE/e6a73200ca353bb12b077d2bef2ad685/image5-7.png" />
            
            </figure>
    <div>
      <h3>Step 3: Preview your live stream in the Cloudflare Dashboard</h3>
      <a href="#step-3-preview-your-live-stream-in-the-cloudflare-dashboard">
        
      </a>
    </div>
    <p>In the Stream Dashboard, within seconds of going live, you will see a preview of what your viewers will see, along with the real-time connection status of your live stream.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4KwHTeXov4qVK1WDYiEBLS/c46f5276ac6bacdb8aa29a0bd1388903/Screen-Shot-2022-09-21-at-12.34.35-PM.png" />
            
            </figure>
    <div>
      <h3>Step 4: Add live video playback to your website or app</h3>
      <a href="#step-4-add-live-video-playback-to-your-website-or-app">
        
      </a>
    </div>
    <p>Stream your video using our <a href="https://developers.cloudflare.com/stream/viewing-videos/using-the-stream-player/">Stream Player embed code</a>, or use <a href="https://developers.cloudflare.com/stream/viewing-videos/using-own-player/">any video player that supports HLS or DASH</a> — live streams can be played in both websites or native iOS and Android apps.</p><p>For example, on iOS, all you need to do is provide AVPlayer with the URL to the HLS manifest for your live input, which you can find <a href="https://developers.cloudflare.com/stream/stream-live/watch-live-stream/">via the API</a> or in the <a href="https://dash.cloudflare.com/?to=/:account/stream">Stream Dashboard</a>.</p>
            <pre><code>import SwiftUI
import AVKit

struct MyView: View {
    // Change the url to the Cloudflare Stream HLS manifest URL
    private let player = AVPlayer(url: URL(string: "https://customer-9cbb9x7nxdw5hb57.cloudflarestream.com/8f92fe7d2c1c0983767649e065e691fc/manifest/video.m3u8")!)

    var body: some View {
        VideoPlayer(player: player)
            .onAppear() {
                player.play()
            }
    }
}

struct MyView_Previews: PreviewProvider {
    static var previews: some View {
        MyView()
    }
}</code></pre>
            <p>To run a complete example app in Xcode, follow <a href="https://developers.cloudflare.com/stream/examples/ios/">this guide</a> in the Stream Developer docs.</p>
    <div>
      <h2>Companies are building whole new video platforms on top of Stream</h2>
      <a href="#companies-are-building-whole-new-video-platforms-on-top-of-stream">
        
      </a>
    </div>
    <p>Developers want control, but most don’t have time to become video experts. And even video experts building innovative new platforms don’t want to manage live streaming infrastructure.</p><p>Switcher Studio's whole business is live video: their iOS app allows creators and businesses to produce their own branded, multi-camera live streams. Switcher uses Stream as an essential part of their live streaming infrastructure. In their own words:</p><blockquote><p><i>“Since 2014, Switcher has helped creators connect to audiences with livestreams. Now, our users create over 100,000 streams per month. As we grew, we needed a scalable content delivery solution. Cloudflare offers secure, fast delivery, and even allowed us to offer new features, like multistreaming. Trusting Cloudflare Stream lets our team focus on the live production tools that make Switcher unique."</i></p></blockquote><p>While Stream Live has been in beta, we’ve worked with many customers like Switcher, where live video isn’t just one product feature, it <b><i>is</i></b> the core of their product. Even as experts in live video, they choose to use Stream, so that they can focus on the unique value they create for their customers, leaving the infrastructure of ingesting, encoding, recording and delivering live video to Cloudflare.</p>
    <div>
      <h2>Start building live video into your website or app today</h2>
      <a href="#start-building-live-video-into-your-website-or-app-today">
        
      </a>
    </div>
    <p>It takes just a few minutes to sign up and start your first live stream using the Cloudflare Dashboard. No code is required to get started, and <a href="https://developers.cloudflare.com/stream/">APIs</a> are available when you’re ready to build your own live video features. <a href="https://dash.cloudflare.com/?to=/:account/stream">Give it a try</a> — we’re ready for you, no matter the size of your audience.</p>
            <category><![CDATA[GA Week]]></category>
            <category><![CDATA[General Availability]]></category>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">2E56cHHAs6xh0x5bjlqC9U</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Kyle Boutette</dc:creator>
            <dc:creator>Mickie Betz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making Video Intuitive: An Explainer]]></title>
            <link>https://blog.cloudflare.com/making-video-intuitive-an-explainer/</link>
            <pubDate>Tue, 12 May 2020 11:00:00 GMT</pubDate>
            <description><![CDATA[ At Cloudflare, we work to provide a great viewing experience while keeping our service affordable. In this post, let's have some fun to see changes between different versions of a video. ]]></description>
            <content:encoded><![CDATA[ <p>On the Stream team at Cloudflare, we work to provide a great viewing experience while keeping our service affordable. That involves a lot of small tweaks to our video pipeline that can be difficult for most people to discern. And that makes the results of those tweaks less intuitive.</p><p>In this post, let's have some fun. Instead of fine-grained optimization work, we’ll do the opposite. Today we’ll make it easy to see changes between different versions of a video: we’ll start with a high-quality video and ruin it. Instead of aiming for perfection, let’s see the impact of various video coding settings. We’ll go on a deep dive on how to make some victim video look gloriously bad and learn on the way.</p><p>Everyone agrees that video on the Internet should look good, start playing fast, and never re-buffer, regardless of the device it’s played on. People can prefer one version of a video over another and say it looks better. Most people, though, would have difficulty elaborating on what ‘better’ means. That’s not an issue when you’re just consuming video. However, when you’re storing, encoding, and distributing it, how that video looks determines how happy your viewers are.</p><p>To determine what looks better, video engineers can use a variety of techniques. The most accessible is the most obvious: compare two versions of a video by having people look at them — a subjective comparison. We’ll apply eyeballs here.</p><p>So, who’s our sacrificial video? We’re going to use a classic video for the demonstration here — perhaps too classic for people who work with video — <a href="https://en.wikipedia.org/wiki/Big_Buck_Bunny">Big Buck Bunny</a>. This is an open-source film by Sacha Goedegebure available under the permissive <a href="https://creativecommons.org/licenses/by/3.0/us/">Creative Commons Attribution 3.0 license</a>. We’re only going to work with 17 seconds of it to save some time. This is what the video looks like when downloaded from <a href="https://peach.blender.org/download/">https://peach.blender.org/download/</a>. Take a moment to savor the quality since we’re only getting worse from here.</p>

<br /><p>For brevity, we'll evaluate our results by two properties: smooth motion and looking ‘crisp’. The video shouldn’t stutter, and its important features should be distinguishable.</p><p>It’s worth mentioning that video is a hack of your brain. Every video is just an optimized series of pictures — a very sophisticated flip book. Display those pictures quickly enough, and you can fool the brain into interpreting motion. If you show enough points of light close together, they meld into a continuous image. Then, change the color of those lights frequently enough, and you end up with smooth motion.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7EMewvsE1wQR8Zb7UuuY7f/5618cf12979c1fe4806202d17c190115/flipbook_2x.png" />
            
            </figure>
    <div>
      <h3>Frame rate</h3>
      <a href="#frame-rate">
        
      </a>
    </div>
    <p>Not stuttering is covered by frame rate, measured in frames-per-second (fps). Fps is the number of individual pictures displayed in a single second; many videos are encoded at somewhere between 24 and 30fps. One way to describe fps is in terms of how long a frame is shown for — commonly called the frame time. At 24fps, each frame is shown for about 41 milliseconds. At 2fps, that jumps to 500ms. As fps drops, each frame lingers on screen longer and longer. Smooth motion mostly comes down to the single knob of fps. Mucking about with frame rate isn’t a sporting way to achieve our goal: it’s extremely easy to tank the frame rate and ruin the experience, and humans have a low tolerance for janky motion. To get the idea, here’s what our original clip reduced to 2fps looks like; 500ms per-frame is a long time.</p><p><code>ffmpeg -v info -y -hide_banner -i source.mp4 -r 2 -c:v h264 -c:a copy 2fps.mp4</code></p><div>
  
</div>
<br />
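<p>The frame-time numbers quoted above follow directly from the definition; a quick sketch:</p>

```python
# Frame time: how long each frame stays on screen at a given frame rate.
def frame_time_ms(fps: float) -> float:
    """Milliseconds per frame for a given frames-per-second rate."""
    return 1000 / fps

print(f"{frame_time_ms(24):.1f} ms at 24fps")  # ~41.7 ms
print(f"{frame_time_ms(2):.1f} ms at 2fps")    # 500.0 ms
```
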
    <div>
      <h3>Resolution</h3>
      <a href="#resolution">
        
      </a>
    </div>
    <p>Making tiny features distinguishable has many more knobs. Choices you can make include what codec, level, profile, bitrate, resolution, color space, or key frame frequency, to name a few. Each of these also influences factors apart from perceived quality, such as how large the resulting file is plus what devices it is compatible with. There’s no universal right answer for what parameters to encode a video with. For the best experience while not wasting resources, the same video intended for a modern 4k display should be tailored differently for a 2007 iPod Nano. We’ll spend our time here focusing on what impacts a video’s crispness since that’s what largely determines the experience.</p><p>We’re going to use <a href="https://www.ffmpeg.org/">FFmpeg</a> to make this happen. This is the sonic screwdriver of the video world; a near-universal command-line tool for converting and manipulating media. FFmpeg is almost two decades old, has hundreds of contributors, and can do essentially any digital video-related task. Its flexibility also makes it rather complex to work with. For each version of the video, we’ll show the command used to generate it as we go.</p><p>Let’s figure out exactly what we want to change about the video to make it a bad experience.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5MjHU6o5w4TzXr7hmENL1a/695ffd6e6fdf7158b3e48ced2795d241/gradient-bucket-_2x.png" />
            
            </figure><p>You may have heard about resolution and bitrate. To explain them, let’s use an analogy. Resolution provides pixels. Pixels are buckets for information. Bitrate is the information that fills those buckets. How full a given bucket is determines how well a pixel can represent content. With too few bits of information for a bucket, the pixel will get less and less accurate to the original source. In practice, their numerical relationship is complicated. These are what we’ll be varying.</p><p>The decision of which bucket should get how many bits of information is determined by software called a video encoder. The job of the encoder is to use the bits budgeted for it as efficiently as possible to display the best quality video. We’ll be changing the bitrate budget to influence the resulting bitrate. Like people with money, budgeting is a good idea for our encoder. Uncompressed video can use a byte, or more, per-pixel for each of the red, green, and blue (RGB) channels. For a 1080p video, that means 1920x1080 pixels multiplied by 3 bytes to get 6.2MB per frame. We’ll talk about frames later, but 6.2MB is a lot — at this rate, a standard 4.7GB DVD would only fit about 30 seconds of 24fps video.</p><p>With our variables chosen, we’re good to go. For every variation we encode, we’ll show a comparison to this table. Our source video is encoded in H.264 at 24fps with a variety of other settings; those features will not change. Expect these numbers to get significantly smaller as we poke around to see what changes.</p><table><tr><td><p><b></b></p></td><td><p><b>Resolution</b></p></td><td><p><b>Bitrate</b></p></td><td><p><b>File Size</b></p></td></tr><tr><td><p>Source</p></td><td><p>1280x720</p></td><td><p>7.5Mbps</p></td><td><p>16MB</p></td></tr></table><p>To start, let’s change just resolution and see what impact that has. The lowest resolution most people are exposed to is usually 140p, so let’s re-encode our source video targeting that. Since many video platforms have this as an option, we’re not expecting an unwatchable experience quite yet.</p><p><code>ffmpeg -v info -y -hide_banner -i source.mp4 -vf scale=-2:140 -c:v h264 -b:v 6000k -c:a copy scaled-140.mp4</code></p>

<br /><table><tr><td><p></p></td><td><p>Resolution</p></td><td><p>Bitrate</p></td><td><p>File Size</p></td></tr><tr><td><p>Source</p></td><td><p>1280x720</p></td><td><p>7.5Mbps</p></td><td><p>16MB</p></td></tr><tr><td><p>Scaled to 140p</p></td><td><p><b>248x140</b></p></td><td><p>2.9Mbps</p></td><td><p>6.1MB</p></td></tr></table><p>By the numbers, we find some curious results. We asked for 6Mbps, not far below the source’s 7.5Mbps, but our encoder gave us roughly 2.9Mbps. Given that the number of pixels was dramatically reduced, the encoder had far fewer buckets to put our bitrate’s information in. Despite its best attempt at using the entire budget provided to it, the encoder filled every bucket we gave it and still had bits left over. What did it do with the leftover information? Since it isn’t in the video, it tossed it.</p><p>This would probably be an acceptable experience on a 4in phone screen. You wouldn’t notice the sort-of grainy result on a small display. On a 40in TV, it’d be blocky and unpleasant. At 40in, 140 rows of pixels become individually distinguishable, which doesn’t fool the brain and ruins the magic.</p>
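<p>The bucket analogy can be made concrete by counting pixels. A quick sketch of how many buckets the 140p rescale removed:</p>

```python
# How many "buckets" (pixels per frame) did scaling from 720p to 140p remove?
source_pixels = 1280 * 720  # source resolution from the table
scaled_pixels = 248 * 140   # 140p resolution from the table

print(source_pixels, scaled_pixels)  # 921600 34720
print(f"{source_pixels / scaled_pixels:.0f}x fewer buckets after scaling")
```

<p>With roughly 27x fewer pixels per frame, it is little surprise the encoder could not spend the full 6Mbps budget we handed it.</p>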
    <div>
      <h3>Bitrate</h3>
      <a href="#bitrate">
        
      </a>
    </div>
    <p>Bitrate is the density of information for a given period of time, typically a second. This interacts with frame rate to give us a per frame bitrate budget. Our source having a bitrate of 7.5Mbps (millions of bits-per-second) and frame rate of 24fps means we have an average of 7500Kbps / 24fps = 312.5Kb of information per frame.</p>
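<p>That per-frame budget is just division, and it makes the later bitrate experiment easy to anticipate; a quick sketch:</p>

```python
# Per-frame bitrate budget: the bitrate spread across the frames shown each second.
def kbits_per_frame(bitrate_kbps: float, fps: float) -> float:
    """Average kilobits available for each frame."""
    return bitrate_kbps / fps

print(kbits_per_frame(7500, 24))  # 312.5 Kb per frame, matching the source video
print(kbits_per_frame(100, 24))   # ~4.2 Kb per frame at a 100Kbps target
```
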
    <div>
      <h3>Different kinds of frames</h3>
      <a href="#different-kinds-of-frames">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2fYct9qhwwZwDraTn29gB7/219e49f4c51ed757ae259b3925e20c9e/pasted-image-0--1-.png" />
            
            </figure><p>There are different ways a frame can be encoded. It doesn’t make sense to use the same technique for a sequence of frames of a single color as for most of the sequences in Big Buck Bunny. There’s differing information density and distribution between those sequences, and different ways of representing frames take advantage of those differing patterns. As a result, the 312.5Kb average for each frame is both lower than the size of the largest frames and greater than the size of the smallest frames. Some frames contain just changes relative to other frames – these are P or B frames – and those can be far smaller than 312.5Kb. However, some frames contain full images – these are I frames – and tend to be far larger than 312.5Kb. Since we’re viewing the video holistically, over multiple seconds, we don’t need to worry about individual frame sizes; we’re concerned with the overall experience. Knowing about frames is useful for understanding their impact on bitrate for different types of content, which we’ll discuss later.</p><p>Our starting bitrate is extremely large and has more information than we actually need. Let’s be aggressive and cut it down to 1/75th while maintaining the source’s resolution.</p><p><code>ffmpeg -v info -y -hide_banner -i source.mp4 -c:v h264 -b:v 100k -c:a copy bitrate-100k.mp4</code></p>

<br /><table><tr><td><p></p></td><td><p><b>Resolution</b></p></td><td><p><b>Bitrate</b></p></td><td><p><b>File Size</b></p></td></tr><tr><td><p>Source</p></td><td><p>1280x720</p></td><td><p>7.5Mbps</p></td><td><p>16MB</p></td></tr><tr><td><p>Scaled to 140p</p></td><td><p><b>248x140</b></p></td><td><p>2.9Mbps</p></td><td><p>6.1MB</p></td></tr><tr><td><p>Targeted to 100Kbps</p></td><td><p>1280x720</p></td><td><p><b>102Kbps</b></p></td><td><p>217KB</p></td></tr></table><p>When you take a look at the video, fur and grass become blobs. There’s just not enough information to accurately represent the fine details.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4SgHZxPmzWOK6cXxmfwK6D/cec712f07e0d7d18c51197a64d2f856f/VFLXlL9wI-cBWPlqs7dIfTOnSotYY7asU1lneWr3CW4gdF6S2fY9zVkgnOddmXKgbaYcpt4qhik-8PHogyj1MYAM9S8dag-ww76rRUCR7dVaJCiznUNDbGVYrBiS.png" />
            
            </figure><p>Source Video</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2c67nPJ71w1tnhfcz1sMsQ/e0cc89170bea7e8b307c09a5e960e98e/KBvYhdIEKc49weV-tPJkCt2BxVvUVccQpNx6zIjU2I5vPnmnBO6ORBBjrwSy4a2JBbq0o_RQ79s3DhGYL9YIbVPBmnKtQNfojO-WBY_OTd0uPlKbU-oT5lK-a7HW.png" />
            
            </figure><p>100 Kbps budget</p><p>We provided a bitrate budget of 100Kbps, but the encoder doesn’t seem to have quite hit it. When we changed the resolution, we got a lower bitrate than we asked for; here we have a higher one. Why would that be the case?</p><p>We have so many pixel buckets that there’s some minimum amount of information the encoder wants to put in each. Since it’s allowed to flex the bitrate, it ends up favoring slightly overfull buckets – that’s the easier path. This is somewhat the reverse of why our previous experiment had a lower bitrate than expected.</p><p>We can influence how the encoder budgets bitrate using rate control modes. We’re going to stick with the default ‘Average-Bitrate’ mode to keep things easy. This mode is suboptimal since it lets the encoder spend a large chunk of the budget up front, to its detriment later. However, it’s easy to reason about.</p>
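One rough way to double-check the achieved average bitrate is from the file size and duration alone. This is only an estimate – it ignores container overhead and the copied audio track – but it lines up well with the 217KB file the 100Kbps trial produced from our roughly 17-second clip:

```python
# Rough average bitrate implied by a file's size and duration.
def avg_bitrate_kbps(file_size_bytes: int, duration_seconds: float) -> float:
    return file_size_bytes * 8 / duration_seconds / 1000

# 217KB over ~17 seconds works out to about 102Kbps, matching the table.
print(round(avg_bitrate_kbps(217_000, 17)))  # 102
```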
    <div>
      <h3>Resolution + Bitrate</h3>
      <a href="#resolution-bitrate">
        
      </a>
    </div>
    <p>Targeting a bitrate of 100Kbps got us an unpleasant video, but not something completely unwatchable. We haven’t quite ruined our video yet. We might as well take the bitrate down to an even more extreme 20Kbps while keeping the resolution constant.</p><p><code>ffmpeg -v info -y -hide_banner -i source.mp4 -c:v h264 -b:v 20k -c:a copy bitrate-20k.mp4</code></p>
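For perspective on how aggressive this is, the same per-frame arithmetic as before (illustrative only) shows the encoder now has under a kilobit per frame to work with:

```python
# 20Kbps at 24fps leaves less than a kilobit per frame on average.
target_kbps = 20
fps = 24
print(round(target_kbps / fps, 2))  # ~0.83 Kb per frame
```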

<br /><table><tr><td><p></p></td><td><p><b>Resolution</b></p></td><td><p><b>Bitrate</b></p></td><td><p><b>File Size</b></p></td></tr><tr><td><p>Source</p></td><td><p>1280x720</p></td><td><p>7.5Mbps</p></td><td><p>16MB</p></td></tr><tr><td><p>Scaled to 140p</p></td><td><p><b>248x140</b></p></td><td><p>2.9Mbps</p></td><td><p>6.1MB</p></td></tr><tr><td><p>Targeted to 100Kbps</p></td><td><p>1280x720</p></td><td><p><b>102Kbps</b></p></td><td><p>217KB</p></td></tr><tr><td><p>Targeted to 20Kbps</p></td><td><p>1280x720</p></td><td><p><b>35Kbps</b></p></td><td><p>81KB</p></td></tr></table><p>Now, this is truly unwatchable! There’s occasionally color, but the video mostly devolves into grayscale rectangles roughly approximating the silhouettes of what we’re expecting. At roughly a third of the bitrate of the previous trial, this looks like it carries far less than a third of the information.</p><p>As before, we overshot our bitrate target, and for the same reason: otherwise, our pixel buckets would be insufficiently filled with information. Somewhere between 102 and 35Kbps, the encoder had to start making hard decisions; most of the color and the comprehensibility of the scene were sacrificed.</p><p>We’ll discuss why there are moving grayscale rectangles and patches of color in a bit. They’re giving us a hint about how the encoder works under the hood.</p><p>What if we go just one step further and combine our tiny resolution with the absurdly low bitrate? That should be an even worse experience, right?</p><p><code>ffmpeg -v info -y -hide_banner -i source.mp4 -vf scale=-2:140 -c:v h264 -b:v 20k -c:a copy scaled-140_bitrate-20k.mp4</code></p>

<br /><table><tr><td><p></p></td><td><p><b>Resolution</b></p></td><td><p><b>Bitrate</b></p></td><td><p><b>File Size</b></p></td></tr><tr><td><p>Source</p></td><td><p>1280x720</p></td><td><p>7.5Mbps</p></td><td><p>16MB</p></td></tr><tr><td><p>Scaled to 140p</p></td><td><p><b>248x140</b></p></td><td><p>2.9Mbps</p></td><td><p>6.1MB</p></td></tr><tr><td><p>Targeted to 100Kbps</p></td><td><p>1280x720</p></td><td><p><b>102Kbps</b></p></td><td><p>217KB</p></td></tr><tr><td><p>Targeted to 20Kbps</p></td><td><p>1280x720</p></td><td><p><b>35Kbps</b></p></td><td><p>81KB</p></td></tr><tr><td><p>Scaled to 140p and Targeted to 20Kbps</p></td><td><p><b>248x140</b></p></td><td><p><b>19Kbps</b></p></td><td><p>48KB</p></td></tr></table><p>Wait a minute, that’s actually not too bad at all. It’s almost like a tinier version of 1280 by 720 at 100Kbps. Why doesn’t this look terrible? Having a lower bitrate means there’s less information, which implies that the video should look worse. A lower resolution means the image should be less detailed. The numbers got smaller, so the video shouldn’t look better!</p><p>Thinking back to buckets and information, we now have less information but also fewer discrete places for that information to live. This specific combination of low bitrate and low resolution means the buckets are nicely filled. The encoder came in right at our target bitrate, which is a reasonable indicator that it was at least somewhat satisfied with the final result.</p><p>This isn’t going to be a fun experience on a 4k display, but it is fine enough for an iPod Nano from 2007. A 3rd generation iPod Nano has a 320x240 display spread across a 2in screen. Our 140p video will be nearly indistinguishable from a much higher quality one. What’s more, 48KB for 17 seconds of video makes fantastic use of the limited storage – 4GB on some models. In a resource-constrained environment, this low video quality can be a large quality of experience improvement.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6lffrqtLboX9DPq5hTlmwa/e93dde9c71e8c7f48a6789ae5c3d8f77/iPod-Nano-3rd-Gen.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0">CC BY 2.0</a> - <a href="https://www.flickr.com/photos/nez/1346068786/in/photostream/">image</a> by <a href="https://www.flickr.com/photos/nez/">nez</a></p><p>We should now have a decent intuition for the relationship between bitrate and resolution, and for the tradeoffs involved. There’s a lingering question, though: do we need to make tradeoffs at all? There has to be some ratio of bitrate to pixel count that gets the best quality for a given resolution at a minimal file size.</p><p>In fact, there are such perfect ratios. In ruining the video, we ended up testing a few candidates of this ratio for our source video.</p><table><tr><td><p></p></td><td><p><b>Resolution</b></p></td><td><p><b>Bitrate</b></p></td><td><p><b>File Size</b></p></td><td><p><b>Bits/Pixel</b></p></td></tr><tr><td><p>Source</p></td><td><p>1280x720</p></td><td><p>7.5Mbps</p></td><td><p>16MB</p></td><td><p>8.10</p></td></tr><tr><td><p>Scaled to 140p</p></td><td><p><b>248x140</b></p></td><td><p>2.9Mbps</p></td><td><p>6.1MB</p></td><td><p>83.5</p></td></tr><tr><td><p>Targeted to 100Kbps</p></td><td><p>1280x720</p></td><td><p><b>102Kbps</b></p></td><td><p>217KB</p></td><td><p>0.11</p></td></tr><tr><td><p>Targeted to 20Kbps</p></td><td><p>1280x720</p></td><td><p><b>35Kbps</b></p></td><td><p>81KB</p></td><td><p>0.03</p></td></tr><tr><td><p>Scaled to 140p and Targeted to 20Kbps</p></td><td><p><b>248x140</b></p></td><td><p><b>19Kbps</b></p></td><td><p>48KB</p></td><td><p>0.55</p></td></tr></table><p>However, there are some complications.</p><p>The biggest caveat is that the optimal ratio depends on your source video. Each video requires a different amount of information to be displayed. There are a couple of reasons for that.</p><p>If a frame has many details, it takes more information to represent. 
Frames in chronological order that visually differ significantly (think of an action movie) take more information than a set of visually similar frames (like a security camera outside a quiet warehouse). The former can’t use as many B or P frames, which occupy less space. Animated content with flat colors requires fewer of the trade-offs that cause visual degradation than live-action does.</p><p>Thinking back to the settings that resulted in grayscale rectangles and patches of color, we can learn a bit more. We saw that the rectangles and color seem to move, as though the encoder were playing a shell game with tiny boxes of pictures.</p><p>What’s happening is that the encoder recognizes repeated patterns within and between frames. It can then reference those patterns and move them around without needing to actually duplicate them. The P and B frames mentioned earlier are mainly composed of these shifted patterns. This is similar, at least in spirit, to other compression algorithms that use dictionaries to refer back to previous content. In most video codecs, the bits of picture that can be shifted are called ‘macroblocks’, which subdivide each frame into NxN squares of pixels. The less stingy the bitrate, the less obvious the macroblock shell game.</p><p>To see this effect more clearly, we can ask FFmpeg to show us the decisions it makes. Specifically, <a href="https://trac.ffmpeg.org/wiki/Debug/MacroblocksAndMotionVectors">it can show us what it decides is ‘motion’ moving the macroblocks</a>. The video here is 140p so the motion vector arrows are easier to see.</p><p><code>ffmpeg -v info -y -hide_banner -flags2 +export_mvs -i source.mp4 -vf scale=-2:140,codecview=mv=pf+bf+bb -c:v h264 -b:v 6000k -c:a copy motion-vector.mp4</code></p>
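For a sense of scale, here’s a rough macroblock count, assuming the classic 16x16 macroblock used by H.264 (newer codecs use variable block sizes, so treat this as a sketch):

```python
import math

# Number of 16x16 macroblocks needed to tile a frame.
# Partial blocks at the right/bottom edges still cost a whole block,
# so we round each dimension up.
def macroblock_count(width: int, height: int, block_size: int = 16) -> int:
    return math.ceil(width / block_size) * math.ceil(height / block_size)

print(macroblock_count(1280, 720))  # 3600 blocks per 720p frame
print(macroblock_count(248, 140))   # 144 blocks per 140p frame
```

With only 144 blocks to shuffle at 140p, it’s little surprise the shell game is easier to hide there than across the 3600 blocks of a starved 720p frame.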

<br /><p>Even worse, flat color and noisy detail might each appear in different scenes of the same video. That forces you to either waste your bitrate budget in one scene or look terrible in the other. We give the encoder a bitrate budget it can use; how it uses it is the result of a feedback loop during encoding.</p><p>Yet another caveat is that your resulting bitrate is influenced by all those knobs listed earlier, the most impactful being codec choice followed by bitrate budget. We explored the relationship between bitrate and resolution, but every knob has an impact on quality, and a single knob frequently interacts with others.</p><p>So far, we’ve taken a look at some of the knobs and settings that affect visual quality in a video. Every day, video engineers and encoders make tough decisions to optimize for the human eye while keeping file sizes at a minimum. Modern encoding schemes use techniques such as <a href="https://netflixtechblog.com/per-title-encode-optimization-7e99442b62a2">per-title encoding</a> to narrow down the best resolution-bitrate combinations. Those schemes look somewhat similar to what we’ve done here: test various settings and see what gives the desired result.</p><p>With every example, we’ve included an FFmpeg command you can use to replicate the output and experiment with your own videos. We encourage you to try improving video quality while reducing file sizes on your own, and to find other levers that will help you on this journey!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Video]]></category>
            <guid isPermaLink="false">5QRTNbThmCHIa9w2TlJisg</guid>
            <dc:creator>Kyle Boutette</dc:creator>
        </item>
    </channel>
</rss>