
Background
In media streaming, origin egress costs are typically high because responses are large, and without proper mechanisms in place they can become substantial. Origin egress traffic is usually significantly more expensive than CDN traffic.
Imagine starting a live event with 10,000 viewers who have a good internet connection. The live stream is going to be 60 minutes long, in 1080p at 30 FPS. If we assume that every minute of the stream is approximately 130 MB in size, then every viewer will consume 7.8 GB of data in 60 minutes. Therefore, the total data consumption for all 10,000 viewers would be approximately 78 TB.
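As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python. This is only a sketch: it assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB), and the constant and function names are illustrative.

```python
# Back-of-the-envelope egress estimate using the figures from the example.
# Assumption: decimal units (1 GB = 1,000 MB, 1 TB = 1,000,000 MB).
MB_PER_MINUTE = 130    # approximate size of one minute of 1080p / 30 FPS video
STREAM_MINUTES = 60
VIEWERS = 10_000


def per_viewer_gb(mb_per_minute: int, minutes: int) -> float:
    """Data a single viewer downloads over the whole stream, in GB."""
    return mb_per_minute * minutes / 1_000


def total_tb(mb_per_minute: int, minutes: int, viewers: int) -> float:
    """Data all viewers download over the whole stream, in TB."""
    return mb_per_minute * minutes * viewers / 1_000_000


print(per_viewer_gb(MB_PER_MINUTE, STREAM_MINUTES))      # 7.8
print(total_tb(MB_PER_MINUTE, STREAM_MINUTES, VIEWERS))  # 78.0
```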
However, live streaming has become more popular over the past few years, and real-world numbers are much higher. Multiply this figure by the number of events an OTT platform streams in an average week and by the size of its audience, and you can imagine the scale of the cost.
Of course, the real world does not work like this 😃. Streaming platforms use different approaches and system architectures, such as CDNs, Origin Shield, and edge compute, to minimize origin egress costs.

A CDN plays a major role in reducing origin egress costs, improving the user experience, and making the system scalable. At ElevenSports, we have been using various CDN providers depending on region and purpose. Our primary CDN is Fastly because it offers the features we need, such as Origin Shield, global capacity, and Varnish Configuration Language (VCL).
Observation
We have always made sure that our technology expenses are under control and reviewed regularly. Poor measurement can hurt our SaaS customers, since they pay for what they use. One day, we started looking for ways to significantly reduce our origin egress costs by exploring various solutions and brainstorming with the team. A better cache hit ratio would also improve the user experience, since viewers would not have to wait for the CDN to fetch content from the origin servers and cache it (I'm talking about a millisecond-level improvement). In addition, as a SaaS platform, we are expected to look after our customers both financially and technically. Put simply, a cache miss means paying for both origin and CDN traffic, while a cache hit means paying only for CDN traffic.
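To make the hit-versus-miss trade-off concrete, here is a minimal cost sketch. The per-GB prices are made-up placeholders rather than real Fastly or cloud prices, and the model assumes the hit ratio is measured in bytes rather than requests.

```python
# Hypothetical per-GB prices, for illustration only (not real price lists).
CDN_PRICE_PER_GB = 0.02
ORIGIN_PRICE_PER_GB = 0.09


def delivery_cost(total_gb, hit_ratio,
                  cdn_price=CDN_PRICE_PER_GB,
                  origin_price=ORIGIN_PRICE_PER_GB):
    """Estimate delivery cost for a given byte-level cache hit ratio.

    Every byte is billed as CDN traffic; only the bytes that miss the
    cache are also billed as origin egress.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    cdn_cost = total_gb * cdn_price
    origin_cost = total_gb * (1.0 - hit_ratio) * origin_price
    return cdn_cost + origin_cost


# The 78 TB example event (78,000 GB) at two different hit ratios.
for ratio in (0.5, 0.9):
    print(f"hit ratio {ratio:.0%}: {delivery_cost(78_000, ratio):.2f}")
```

With these placeholder prices, the origin part of the bill shrinks in direct proportion to the miss ratio, while the CDN part stays the same.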
Investigation
Once we were sure there was room for improvement, we started the investigation by collecting and reviewing metrics and some of our existing logic. Fastly offers an advanced observability dashboard, which we used at the time to retrieve historical and real-time metrics.
Vary Header
Our first suspect was the Vary header. This header is frequently used incorrectly, which can lead to abysmal hit ratios. It describes the parts of the request message, aside from the method and URL, that influenced the content of the response it occurs in. Most often, it is used to create a cache key when content negotiation is in use [1]. We reviewed our configuration and found nothing wrong with it.
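To see why a badly chosen Vary value hurts the hit ratio, here is a simplified model of a cache key. It is not how Fastly or Varnish implement Vary internally (real caches store variants under one object and do not handle `Vary: *` this way); the function name, header values, and segment path are all illustrative.

```python
def cache_key(url, request_headers, vary):
    """Build a simplified cache key from the URL and the headers named in Vary."""
    # Header names are case-insensitive, so normalise them to lowercase.
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    return (url,) + tuple((name, headers.get(name, "")) for name in varied)


chrome = {"Accept-Encoding": "gzip", "User-Agent": "Chrome/120"}
safari = {"Accept-Encoding": "gzip", "User-Agent": "Safari/17"}
url = "/live/segment_001.ts"  # hypothetical segment path

# Varying on Accept-Encoding: both viewers share one cached object.
print(cache_key(url, chrome, "Accept-Encoding")
      == cache_key(url, safari, "Accept-Encoding"))  # True

# Varying on User-Agent: every distinct browser string gets its own entry.
print(cache_key(url, chrome, "User-Agent")
      == cache_key(url, safari, "User-Agent"))  # False
```

The more distinct values a varied header can take, the more copies of the same segment the cache has to fetch from the origin.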
Cache Hit Ratio
A cache hit describes the situation where your content is successfully served from the cache and not from original storage (origin server).
The next thing we noticed was that the number of cache misses was too high. The cache hit ratio averaged approximately 40–60%, which wasn't good enough to protect the origin servers: roughly 5 out of 10 requests had to go to the origin. However, expectations about cache hit ratios vary by content and audience type. For example, Live and VOD streams each have a different request pattern within a time window. Live streams might have a lower cache hit ratio if viewers are distributed worldwide rather than concentrated in a specific region. The number of viewers during a live event also affects the hit ratio. Understanding the pattern is crucial because it helps to set a proper TTL for static assets such as video segments.
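For reference, the hit ratio is simply hits divided by all cacheable lookups. The sketch below computes it from a list of per-request cache states; the "HIT"/"MISS" labels and the sample data are assumptions for illustration, not an export from the Fastly dashboard.

```python
from collections import Counter


def cache_hit_ratio(cache_states):
    """Return hits / (hits + misses) for an iterable of 'HIT'/'MISS' labels."""
    counts = Counter(state.upper() for state in cache_states)
    hits, misses = counts["HIT"], counts["MISS"]
    total = hits + misses
    # Avoid dividing by zero when there is no traffic at all.
    return hits / total if total else 0.0


# Made-up sample: 5 hits and 5 misses, i.e. 5 out of 10 requests reach the origin.
sample = ["HIT", "MISS", "MISS", "HIT", "MISS", "HIT", "HIT", "MISS", "MISS", "HIT"]
print(cache_hit_ratio(sample))  # 0.5
```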

Cache TTL
CDNs use a time to live (TTL) to control how long content is cached. For example, a TTL of 3600 seconds indicates that content should be cached for 1 hour before it is revalidated. Shorter TTLs make content expire sooner. Operators choose a TTL based on how cacheable the content is: dynamic content gets a lower TTL, while static assets get a higher one. An appropriate cache lifetime improves performance.
We had set the cache TTL to 3600s because we believed that viewers mostly watch during the live stream and that far fewer watch an event afterwards as VOD (post-live). Under that assumption, traffic on the platform should be low after the event. Another reason for the lower TTL was that it made deleting an event simpler: it was acceptable for media segments to remain available for an hour after deletion, so we didn't have to deal with the complexity of purging caches. However, this assumption turned out to be wrong: many viewers were watching events as VOD or post-live, and in different time windows!
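The expiry behaviour can be illustrated with a tiny in-memory cache. This is a toy model with an explicit clock passed in as `now`, not how a CDN node works internally (a real cache would typically revalidate with the origin rather than just drop the object); the class and key names are hypothetical.

```python
class TTLCache:
    """Toy cache in which every entry expires ttl_seconds after it was stored."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def put(self, key, value, now):
        self._store[key] = (value, now)

    def get(self, key, now):
        entry = self._store.get(key)
        if entry is None:
            return None  # never cached: a miss
        value, stored_at = entry
        if now - stored_at >= self.ttl:
            del self._store[key]  # expired: treat as a miss
            return None
        return value


cache = TTLCache(ttl_seconds=3600)
cache.put("segment_001.ts", b"video-bytes", now=0)

print(cache.get("segment_001.ts", now=3599) is not None)  # True  (still fresh)
print(cache.get("segment_001.ts", now=3600) is not None)  # False (expired)
```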
Root cause
After a thorough investigation, we discovered that our caching mechanism was not working as we expected, which caused a low cache hit ratio and, in turn, higher origin network egress costs! All the evidence we had pointed to an inadequate cache TTL.
The change!
Not much to say! Based on the clues, we decided to increase the TTL from 3600s (1h) to 1296000s (15d). We were aware of the consequences and handled them carefully to make sure the experience of our customers and viewers wouldn't be affected.
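A small simulation shows why the longer TTL matters for post-live viewing. It models one cache node and one video segment, with a synthetic request pattern (one request every two hours for three days) that is an assumption for illustration, not our real traffic.

```python
def simulate_hit_ratio(request_times, ttl_seconds):
    """Replay requests for a single segment against a single cache node."""
    cached_at = None  # time of the last origin fetch, if any
    hits = 0
    total = 0
    for now in sorted(request_times):
        total += 1
        if cached_at is not None and now - cached_at < ttl_seconds:
            hits += 1          # still fresh: served from cache
        else:
            cached_at = now    # miss: fetch from origin and restart the TTL
    return hits / total if total else 0.0


OLD_TTL = 3600              # 1 hour
NEW_TTL = 15 * 24 * 3600    # 15 days = 1296000 seconds

# Synthetic VOD pattern: one request every 2 hours for 3 days (36 requests).
vod_requests = range(0, 3 * 24 * 3600, 2 * 3600)

print(simulate_hit_ratio(vod_requests, OLD_TTL))            # 0.0
print(round(simulate_hit_ratio(vod_requests, NEW_TTL), 3))  # 0.972
```

With the one-hour TTL, every request in this pattern arrives after the object has expired, so each one goes to the origin. With the 15-day TTL, only the first request does.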

Results
After implementing this change, we immediately noticed a positive impact on the Fastly observability dashboard. The cache hit ratio increased to 85–98%! 🔥 Roughly 1 out of 10 requests now needed to go to the origin servers, which was quite acceptable.
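To put the before and after figures side by side, the sketch below compares how many requests out of 100 reach the origin at a 50% hit ratio (the middle of the earlier 40–60% range) and at 90% (roughly 1 miss in 10). Both inputs are rounded, illustrative values rather than exact measurements.

```python
def origin_requests_per_100(hit_ratio_percent):
    """Requests out of 100 that miss the cache and go to the origin."""
    return 100 - hit_ratio_percent


def origin_reduction(before_percent, after_percent):
    """Fractional drop in origin requests when the hit ratio improves."""
    before = origin_requests_per_100(before_percent)
    after = origin_requests_per_100(after_percent)
    if before == 0:
        raise ValueError("no origin traffic to reduce")
    return (before - after) / before


print(origin_requests_per_100(50))  # 50
print(origin_requests_per_100(90))  # 10
print(origin_reduction(50, 90))     # 0.8
```

Under these rounded figures, origin requests drop by about 80%.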

In addition, we saw far fewer cache misses, which means the CDN had to reach the origin servers far less often.

Conclusion
It is important to have a clear understanding of the request patterns of your audience, the type of content you offer, and how traffic is distributed across the world. This knowledge can help you configure your CDN in the most efficient way possible, allowing you to save a significant amount of money each year by removing load from origin servers. One of the fees that can quickly add up is the network egress fee, which is always present and can be quite expensive. However, with proper configuration, this fee can be significantly reduced.

Fastly's Origin Shield helped us go even further and remove more load from our origin. It's an interesting feature that every platform should consider based on its use cases. It helps to build a scalable platform that can serve content to millions of concurrent viewers.

Thanks for reading! Follow me for more ❤
References
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary