公司:Netflix
Netflix(/ˈnɛtflɪks/)(官方中文译名网飞,非官方中文译名奈飞)是起源于美国、在世界各地提供网络视频点播的OTT服务公司,并同时在美国经营单一费率邮寄影像光盘出租服务,后者是使用回邮信封寄送DVD和Blu-ray出租光盘至消费者指定的收件地址。公司由里德·哈斯廷斯和马克·兰多夫在1997年8月29日成立,总部位于加利福尼亚州的洛斯盖图,1999年开始推出订阅制的服务。2009年,Netflix已可提供超过10万部电影DVD,订阅者数超过1000万人。另一方面,截至2022年6月的数据,Netflix的流服务已经在全球拥有2.20亿个订阅用户,在美国的订户已达到7330万。其主要的竞争对手有Disney+、Hulu、HBO Max、Amazon Prime Video、YouTube Premium及Apple TV+等。
Netflix在多个排行榜上均榜上有名:2017年6月6日,《2017年BrandZ最具价值全球品牌100强》公布,Netflix名列第92位。2018年10月,《财富》未来公司50强排行榜发布,Netflix排名第八。2018年12月,世界品牌实验室编制的《2018世界品牌500强》揭晓,排名第88。在《财富》2018年世界500大排名261位,并连年增长。2019年10月,位列2019福布斯全球数字经济100强榜第46名。2019年10月,Interbrand发布的全球品牌百强榜排名65。2020年1月22日,名列2020年《财富》全球最受赞赏公司榜单第16位。2022年2月,按市值计算,Netflix为全球第二大的媒体娱乐公司。2019年,Netflix加入美国电影协会(MPA)。另外,Netflix也被部分媒体列为科技巨擘之一。
Scaling Media Machine Learning at Netflix
We tackle some of the unique challenges of scaling multimodal machine learning models that operate on media assets (video, audio, and text).
Discovering Creative Insights in Promotional Artwork
When members are shown a title on Netflix, the displayed artwork, trailers, and synopses are personalized. That means members see the assets that are most likely to help them make an informed choice. These assets are a critical source of information for the member to make a decision to watch, or not watch, a title. The stories on Netflix are multidimensional and there are many ways that a single story could appeal to different members. We want to show members the images, trailers, and synopses that are most helpful to them for making a watch decision.
Scalable Annotation Service — Marken
In Marken (Scalable Annotation Service at Netflix), an annotation is a piece of metadata which can be attached to an object from any domain.
Ready-to-go sample data pipelines with Dataflow
This post is for all data practitioners, who are interested in learning about bootstrapping, standardization and automation of batch data pipelines at Netflix.
You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow. That article was a deep dive into one of the more technical aspects of Dataflow and didn’t properly introduce this tool in the first place. This time we’ll try to give justice to the intro and then we will focus on one of the very first features Dataflow came with. That feature is called sample workflows, but before we start in let’s have a quick look at Dataflow in general.
For your eyes only: improving Netflix video quality with neural networks
When you are binge-watching the latest season of Stranger Things or Ozark, we strive to deliver the best possible video quality to your eyes. To do so, we continuously push the boundaries of streaming video quality and leverage the best video technologies. For example, we invest in next-generation, royalty-free codecs and sophisticated video encoding optimizations. Recently, we added another powerful tool to our arsenal: neural networks for video downscaling. In this tech blog, we describe how we improved Netflix video quality with neural networks, the challenges we faced and what lies ahead.
Match Cutting at Netflix: Finding Cuts with Smooth Visual Transitions
In film, a match cut is a transition between two shots that uses similar visual framing, composition, or action to fluidly bring the viewer from one scene to the next. It is a powerful visual storytelling tool used to create a connection between two scenes.
Embracing the Differences : Inside the Netflix API Redesign
The key driver for this redesigned API is the fact that there are a range of differences across the 800+ device types that we support. Most APIs (including the REST API that Netflix has been using since 2008) treat these devices the same, in a generic way, to make the server-side implementations more efficient. And there is good reason for this approach. Providing an OSFA API allows the API team to maintain a solid contract with a wide range of API consumers because the API team is setting the rules for everyone to follow.
While effective, the problem with the OSFA approach is that its emphasis is to make it convenient for the API provider, not the API consumer. Accordingly, OSFA is ignoring the differences of these devices; the differences that allow us to more optimally take advantage of the rich features offered on each.
Seeing through hardware counters: a journey to threefold performance increase
In one of our previous blogposts, A Microscope on Microservices we outlined three broad domains of observability (or “levels of magnification,” as we referred to them) — Fleet-wide, Microservice and Instance. We described the tools and techniques we use to gain insight within each domain. There is, however, a class of problems that requires an even stronger level of magnification going deeper down the stack to introspect CPU microarchitecture. In this blogpost we describe one such problem and the tools we used to solve it.
Consistent caching mechanism in Titus Gateway
Titus is the Netflix cloud container runtime that runs and manages containers at scale. In the time since it was first presented as an advanced Mesos framework, Titus has transparently evolved from being built on top of Mesos to Kubernetes, handling an ever-increasing volume of containers. As the number of Titus users increased over the years, the load and pressure on the system increased substantially. The original assumptions and architectural choices were no longer viable. This blog post presents how our current iteration of Titus deals with high API call volumes by scaling out horizontally.
We introduce a caching mechanism in the API gateway layer, allowing us to offload processing from singleton leader elected controllers without giving up strict data consistency and guarantees clients observe. Titus API clients always see the latest (not stale) version of the data regardless of which gateway node serves their request, and in which order.
Orchestrating Data/ML Workflows at Scale With Netflix Maestro
At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations. A large number of batch workflows run daily to serve various business needs. These include ETL pipelines, ML model training workflows, batch jobs, etc. As Big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestrating ecosystem have increasingly become more important for our data scientists and the company.
In this blog post, we introduce and share learnings on Maestro, a workflow orchestrator that can schedule and manage workflows at a massive scale.
How Product Teams Can Build Empathy Through Experimentation
A conversation between Travis Brooks, Netflix Product Manager for Experimentation Platform, and George Khachatryan, OfferFit CEO.
Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support for Non-Parallelizable Workloads
Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of our media encoding platform, Cosmos. Over the past 2.5 years, its usage has increased, and Timestone is now also the priority queueing engine backing our general-purpose workflow orchestration engine (Conductor), and the scheduler for large-scale data pipelines (BDP Scheduler). All in all, millions of critical workflows within Netflix now flow through Timestone on a daily basis.
Timestone clients can create queues, enqueue messages with user-defined deadlines and metadata, then dequeue these messages in an earliest-deadline-first (EDF) fashion. Filtering for EDF messages with criteria (e.g. “messages that belong to queue X and have metadata Y”) is also supported.
One of the things that make Timestone different from other priority queues is its support for a construct we call exclusive queues — this is a means to mark chunks of work as non-parallelizable, without requiring any locking or coordination on the consumer side; everything is taken care of by the exclusive queue in the background. We explain the concept in detail in the sections that follow.
Virtual Production — A Validation Framework For Unreal Engine
The use of Virtual Production and real time technologies has markedly accelerated in the past few years. At Netflix, we are always thrilled to see technology enable new ways of telling stories, and the use of these techniques on some of our shows like 1899 and Super Giant Robot Brothers has given us a front row seat to this exciting evolution in filmmaking. Each production that deploys these methods is an opportunity for the crew, tech manufacturers and us–the Netflix Production Innovation team–to learn, innovate and collaborate towards a common goal: universally accessible workflows that will enable creative opportunities and technical success for all filmmakers regardless of the size, location or scope of their project.
Data Mesh — A Data Movement and Processing Platform @ Netflix
Realtime processing technologies (A.K.A stream processing) is one of the key factors that enable Netflix to maintain its leading position in the competition of entertaining our users. Our previous generation of streaming pipeline solution Keystone has a proven track record of serving multiple of our key business needs. However, as we expand our offerings and try out new ideas, there’s a growing need to unlock other emerging use cases that were not yet covered by Keystone. After evaluating the options, the team has decided to create Data Mesh as our next generation data pipeline solution.
Last year we wrote a blog post about how Data Mesh helped our Studio team enable data movement use cases. A year has passed, Data Mesh has reached its first major milestone and its scope keeps increasing. As a growing number of use cases on board to it, we have a lot more to share. We will deliver a series of articles that cover different aspects of Data Mesh and what we have learned from our journey. This article gives an overview of the system. The following ones will dive deeper into different aspects of it.
Formulating ‘Out of Memory Kill’ Prediction on the Netflix App as a Machine Learning Problem
The purpose of this article is to give insights into analyzing and predicting “out of memory” or OOM kills on the Netflix App. Unlike strong compute devices, TVs and set top boxes usually have stronger memory constraints. More importantly, the low resource availability or “out of memory” scenario is one of the common reasons for crashes/kills. We at Netflix, as a streaming service running on millions of devices, have a tremendous amount of data about device capabilities/characteristics and runtime data in our big data platform. With large data, comes the opportunity to leverage the data for predictive and classification based analysis. Specifically, if we are able to predict or analyze the Out of Memory kills, we can take device specific actions to pre-emptively lower the performance in favor of not crashing — aiming to give the user the ultimate Netflix Experience within the “performance vs pre-emptive action” tradeoff limitations. A major advantage of prediction and taking pre-emptive action, is the fact that we can take actions to better the user experience.
This is done by first elaborating on the dataset curation stage — specially focussing on device capabilities and OOM kill related memory readings. We also highlight steps and guidelines for exploratory analysis and prediction to understand Out of Memory kills on a sample set of devices. Since memory management is not something one usually associates with classification problems, this blog focuses on formulating the problem as an ML problem and the data engineering that goes along with it. We also explore graphical analysis of the labeled dataset and suggest some feature engineering and accuracy measures for future exploration.
How Netflix Content Engineering makes a federated graph searchable (Part 2)
In a previous post, we described the indexing architecture of Studio Search and how we scaled the architecture by building a config-driven self-service platform that allowed teams in Content Engineering to spin up search indices easily.
This post will discuss how Studio Search supports querying the data available in these indices.