Going FaaSter, Functions as a Service at Netflix
1. 130 million customers
in over 190 countries
streaming 140 million hrs/day
2.
3.
4.
5.
6. We use a data-driven approach via A/B testing for most changes to our
product, ensuring every change delights our customers
source: https://www.optimizely.com/optimization-glossary/ab-testing/
7. 1000s of A/B tests a year
8.
9.
10.
11. Netflix API
http://api.netflix.com
12. The Netflix API decouples clients from the backend services, providing an
integration point for both services and clients
[Diagram: Clients (TV, iOS, Android, Windows, Browsers), Client API, Edge API, Remote Service Layer, Backend Services (Search, MAP, GPS, …, Playback)]
13. The Netflix API uses the BFF (backend for frontend) pattern, where the
BFF is tightly coupled to each device — making it easier to define and
adapt the UI, and streamlining releases
[Diagram: Clients (TV, iOS, Android, Windows, Browsers), Client API with a BFF, Edge API, Remote Service Layer, Backend Services (Search, MAP, GPS, …, Playback)]
14. These BFFs are maintained by the UI teams, since they are tightly coupled to
their UIs
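As a rough illustration of the BFF pattern described above, the sketch below shapes one backend payload differently for two devices. The function names, field names, and sample data are all hypothetical and are not taken from the Netflix API.

```javascript
// Hypothetical backend payload shared by every device.
const backendMovie = {
  id: 42,
  title: 'Example Title',
  synopsis: 'A long synopsis that a small screen may not want in full.',
  artwork: { small: 'small.jpg', large: 'large.jpg' }
};

// Hypothetical BFF for a TV UI: large artwork, full synopsis.
function tvBff(movie) {
  return {
    id: movie.id,
    title: movie.title,
    synopsis: movie.synopsis,
    image: movie.artwork.large
  };
}

// Hypothetical BFF for a phone UI: small artwork, synopsis cut to 20 characters.
function phoneBff(movie) {
  return {
    id: movie.id,
    title: movie.title,
    synopsis: movie.synopsis.slice(0, 20),
    image: movie.artwork.small
  };
}
```

Each UI team can change its own shaping function without touching the backend payload or the other device's response.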
15. Netflix API requirements
Velocity
Ergonomic
Reliability
No Operations
16. Going FaaSter: Function as a Service at
Netflix
@
Yunong Xiao,
Principal Software Engineer, Netflix
17. FaaS Evolution
[Diagram: the stack of λ (function), Application, Services, and Platform shown for four stages (Pre-Cloud / On Prem, IaaS, PaaS, FaaS), with each layer marked as either "You Manage" or "Others Manage"]
18. Build or buy?
Pros                         Cons
No-ops                       Homogeneous architecture
Accessible                   Monitoring & debugging
Velocity                     Netflix stack integration
Reliable service platform    Limits: latency, memory, execution time
19. We’ll cover:
Runtime platform architecture
Developer experience
Management & operations
20. [Diagram: the stack of λ (function), Application, Services, and Platform shown for four stages (Pre-Cloud / On Prem, IaaS, PaaS, FaaS), with each layer marked as either "You Manage" or "Others Manage"]
21. We are almost completely hosted in the cloud using AWS
22. EC2 makes up the foundation of infrastructure at Netflix
23. VMs or Containers?
24. We chose containers as the foundation of our FaaS platform because they
let us build a platform that is ergonomic and efficient, with high
deployment velocity:
Lightweight & fast deployments
Portability across environments
Efficient bin packing
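To make "efficient bin packing" concrete, here is a minimal sketch of the classic first-fit decreasing heuristic, which packs container CPU requests onto as few hosts as it can find. It illustrates the general idea only and is not Titus's scheduling algorithm; the function name and the numbers are hypothetical.

```javascript
// First-fit decreasing: place the largest requests first, each into the
// first host with enough spare capacity; open a new host when none fits.
function packContainers(cpuRequests, hostCapacity) {
  const hosts = []; // each host: { free, containers }
  const sorted = [...cpuRequests].sort((a, b) => b - a);
  for (const cpu of sorted) {
    if (cpu > hostCapacity) {
      throw new Error(`request ${cpu} exceeds host capacity ${hostCapacity}`);
    }
    let host = hosts.find((h) => h.free >= cpu);
    if (!host) {
      host = { free: hostCapacity, containers: [] };
      hosts.push(host);
    }
    host.containers.push(cpu);
    host.free -= cpu;
  }
  return hosts.map((h) => h.containers);
}
```

Because containers can be sized far smaller than a whole VM, many small requests can share one host, which is where the efficiency comes from.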
25. We built Titus — our own container management platform — capable of
launching millions of containers a day
26. [Diagram: the stack of λ (function), Application, Services, and Platform shown for four stages (Pre-Cloud / On Prem, IaaS, PaaS, FaaS), with each layer marked as either "You Manage" or "Others Manage"]
28. We have created a reliable, open source services platform
Service Discovery: Eureka https://github.com/Netflix/eureka
RPC: Ribbon (HTTP), gRPC https://github.com/Netflix/ribbon
Configuration: Archaius https://github.com/Netflix/archaius
Metrics: Atlas https://github.com/Netflix/atlas
Fault tolerance: Hystrix https://github.com/Netflix/hystrix
External LB: Zuul https://github.com/Netflix/zuul
Tracing: Mantis, Salp
…
29. Assembling these components yourself is time-consuming, difficult, and
error-prone
31. You always have to keep components updated to the latest versions
yourself
32. You have to ensure that metrics and dashboards are created for your
service
33. You’re on the hook for managing and operating the infrastructure
34. You shouldn’t have to set everything up from scratch every time when all
you care about is the business logic
35. [Diagram: the stack of λ (function), Application, Services, and Platform shown for four stages (Pre-Cloud / On Prem, IaaS, PaaS, FaaS), with each layer marked as either "You Manage" or "Others Manage"]
36. We set out to build a runtime FaaS platform that solves these issues
No assembly required
Automatic updates
Observable metrics
Managed operations
37. The platform is a services container that has been pre-assembled with all
of the components needed for a production ready service
[Diagram: platform container with Server, Auth, Throttling, Service Registration, Service Discovery Daemon, Metrics, Metrics Daemon, Stream Processing, Configuration, Log rotation, RPC Clients]
38. All that’s needed is for customers to insert their business logic
[Diagram: the same platform container (Server, Auth, Throttling, Service Registration, Service Discovery Daemon, Metrics, Metrics Daemon, Stream Processing, Configuration, Log rotation, RPC Clients) with customer routes added: Route /foo, Route /bar, …]
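A minimal sketch of the "insert your business logic" idea: a pre-assembled wrapper owns the cross-cutting steps, and the customer supplies only a map of routes. The names createPlatform and handle, the req.user check standing in for auth, and the response shape are all hypothetical.

```javascript
// Hypothetical pre-assembled platform: cross-cutting steps live here,
// and customers only provide the routes map.
function createPlatform(routes) {
  return {
    handle(path, req) {
      // Stand-in for the platform's auth component.
      if (!req.user) {
        return { status: 401, body: 'unauthorized' };
      }
      const fn = routes[path];
      if (!fn) {
        return { status: 404, body: 'not found' };
      }
      // Only this call is customer-owned business logic.
      return { status: 200, body: fn(req) };
    }
  };
}

// Customer code: nothing but routes.
const platform = createPlatform({
  '/foo': (req) => `hello ${req.user}`,
  '/bar': () => 'bar'
});
```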
39. We package and version the platform as a single entity, so we can
upgrade and test the components once and ensure everyone receives the
upgrade
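Versioning the platform as one unit lets a function ask for a range of platform versions, such as the "^6.0.0" that appears in the configuration slide. The sketch below assumes the caret follows the npm semver convention (same major version, at or above the stated floor); the talk does not confirm this, the helper names are hypothetical, and the sketch deliberately ignores 0.x versions and pre-release tags.

```javascript
// Parse a plain "x.y.z" string into [major, minor, patch].
function parseVersion(text) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) throw new Error(`not a plain x.y.z version: ${text}`);
  return match.slice(1).map(Number);
}

// Return -1, 0, or 1 like a sort comparator.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Simplified caret match: same major, and not below the floor.
function satisfiesCaret(version, range) {
  if (!range.startsWith('^')) throw new Error('sketch only handles caret ranges');
  const floor = parseVersion(range.slice(1));
  if (floor[0] === 0) throw new Error('sketch only handles major versions >= 1');
  const candidate = parseVersion(version);
  return candidate[0] === floor[0] && compareVersions(candidate, floor) >= 0;
}

// Pick the newest available platform version inside the range, or null.
function pickPlatformVersion(available, range) {
  const matching = available.filter((v) => satisfiesCaret(v, range));
  matching.sort((a, b) => compareVersions(parseVersion(a), parseVersion(b)));
  return matching.length ? matching[matching.length - 1] : null;
}
```

With a range like this, a new minor or patch release of the platform can reach every function without each team editing its configuration.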
40. Because we control the runtime, the platform can emit a consistent set of
application, RPC, and systems metrics for every function
[Diagram: the same platform container (Server, Auth, Throttling, Service Registration, Service Discovery Daemon, Metrics, Metrics Daemon, Stream Processing, Configuration, Log rotation, RPC Clients) with customer routes: Route /foo, Route /bar, …]
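One way a runtime can emit the same metrics for every function is to wrap each handler before registering it. The sketch below is a hypothetical illustration: withMetrics, the metrics array, and the record shape are assumptions, not the platform's Atlas integration.

```javascript
// Wrap a Connect-style handler so every call records a name, an outcome,
// and a latency. `metrics` is a plain array standing in for a metrics
// client; `now` is injectable so the clock can be faked in tests.
function withMetrics(name, handler, metrics, now = Date.now) {
  return function instrumented(req, res, next) {
    const start = now();
    handler(req, res, function onDone(err) {
      metrics.push({ name, ok: !err, latencyMs: now() - start });
      return next(err);
    });
  };
}
```

Because the wrapper is applied by the platform, function authors get these measurements without writing any instrumentation themselves.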
41. We set out to build a runtime FaaS platform that solves these issues
No assembly required
Automatic updates
Observable metrics
Managed operations
42. We’ll cover:
Runtime platform architecture
Developer experience
Management & operations
43.
44. Functions are managed via a configuration API, where most fields are
optional.
{
    "service": {
        "org": "iosui",
        "name": "iphone"
    },
    "platformVersion": "^6.0.0",
    "routes": {
        "routes": {
            "movies": {
                "get": {
                    "source": "./lib/endpoints/movies.js"
                }
            },
            "profile": {
                "post": {
                    "source": "./lib/endpoints/profile.js"
                }
            }
        }
    },
    "sources": ["./lib"],
    "propertiesPath": "./etc",
    "startupHooks": [
        "./hooks/startupHook.js"
    ]
}
Callouts: "service" = Service name; "platformVersion" = FaaS platform version; "routes" = Function declarations; "sources" = Additional source code; "propertiesPath" = Configuration; "startupHooks" = Lifecycle management
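A sketch of how a loader might flatten the route declarations in a configuration like the one above into a simple route table. The function name buildRouteTable and the assumption that a route named "movies" is served at "/movies" are hypothetical; the talk does not describe how the platform maps names to paths.

```javascript
// Same shape as the "routes" section of the configuration slide.
const config = {
  routes: {
    routes: {
      movies: { get: { source: './lib/endpoints/movies.js' } },
      profile: { post: { source: './lib/endpoints/profile.js' } }
    }
  }
};

// Flatten { name: { method: { source } } } into one row per method.
// Assumption: the route name doubles as the URL path segment.
function buildRouteTable(cfg) {
  const table = [];
  const declared = (cfg.routes && cfg.routes.routes) || {};
  for (const [name, methods] of Object.entries(declared)) {
    for (const [method, definition] of Object.entries(methods)) {
      table.push({
        method: method.toUpperCase(),
        path: `/${name}`,
        source: definition.source
      });
    }
  }
  return table;
}
```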
45. Business logic can be implemented as Node.js “Connect”-style
middleware, which handles requests.
// req: HTTP Request object, res: HTTP Response, next: callback
module.exports = function(req, res, next) {
    res.send(200, req.query);
    return next();
};
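For readers unfamiliar with the Connect style, the sketch below shows how a chain of (req, res, next) handlers can be driven: each handler calls next() to pass control on, or next(err) to stop the chain. runChain is a hypothetical helper written for illustration, not part of the platform.

```javascript
// Run Connect-style handlers in order. A handler continues the chain by
// calling next(); calling next(err) skips the rest and reports the error.
function runChain(handlers, req, res, done) {
  let index = 0;
  function next(err) {
    if (err) return done(err);
    const handler = handlers[index++];
    if (!handler) return done();
    return handler(req, res, next);
  }
  return next();
}

// Example chain: the first handler annotates the request, the second reads it.
const exampleChain = [
  function tag(req, res, next) {
    req.tagged = true;
    return next();
  },
  function respond(req, res, next) {
    res.body = req.tagged ? 'tagged' : 'untagged';
    return next();
  }
];
```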
46. Platform components such as metrics, loggers, or RPC clients are
available via the “req” object, providing a full runtime API for
developers
module.exports = function ping(req, res, next) {
    req.log.info('Hello World!');
    req.getRequestContext(); // request context
    req.getAtlas();          // metrics client
    req.getDNAClient();      // RPC client
    req.getProperties();     // configuration client
    req.getEdgar();          // tracing
    req.getMantis();         // stream processing client
    req.getGeo();            // geolocation
    req.getPassport();       // auth
    return next();
};
47. Long-lived third-party libraries can be managed via startup and shutdown
lifecycle hooks.
"startupHooks": [
"./hooks/startupHook.js"
],
"shutdownHooks": [
"./hooks/shutdownHook.js"
]
48. Hooks are initiated before the platform starts, have access to all platform
components, and allow third-party libraries to be made available on
the request object
// executed before platform starts
module.exports = function startuphook(opts, cb) {
    // access to all platform components
    opts.atlas;
    opts.infrastructureInfo;
    opts.log;
    // ...
    opts.properties;
    opts.serviceInfo;
    // return an object that will be made available
    // to all functions
    return cb(null, { foo: 'bar' });
};
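A sketch of how a runtime could execute startup hooks with the (opts, cb) signature shown above and collect what they return. The runner name runStartupHooks, the merge-by-key behaviour, and the sample opts are assumptions; the talk only states that the returned object is made available to all functions.

```javascript
// Run hooks one after another; merge each hook's result object into a
// single shared object, and stop at the first error.
function runStartupHooks(hooks, opts, done) {
  const shared = {};
  let index = 0;
  function runNext(err, result) {
    if (err) return done(err);
    Object.assign(shared, result); // ignores undefined results
    const hook = hooks[index++];
    if (!hook) return done(null, shared);
    return hook(opts, runNext);
  }
  return runNext(null, {});
}
```

The platform would call something like this once, before it starts accepting requests, and then attach the merged object to every request.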
49. External dependencies can be imported from
50. Our goal is to create a local function development experience that
improves the software development life cycle for developers
51. We created a developer workflow tool called NEWT (Netflix Workflow
Toolkit) which simplifies and facilitates common developer tasks
Development
Debugging
Testing
Publishing
Deployment
52.
53. One-click setup for a consistent development environment. Installs
dependencies and keeps them updated
54. We created a development FaaS platform for local development —
enabling engineers to interactively test functions in seconds —
reducing friction and increasing velocity
[Diagram: Dev FaaS platform with live reload of local functions; components: Service Registration, Service Discovery Daemon, Server, Metrics, Auth, Stream Processing, Metrics Daemon, Throttling, Configuration, RPC Clients, Log rotation]
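A minimal sketch of one way live reload can work in Node.js: evict a module from require.cache and require it again, so the next request sees the edited file. Whether the dev platform works this way is not stated in the talk, and loadFresh is a hypothetical helper.

```javascript
// Reload a CommonJS module from disk, bypassing Node's module cache.
function loadFresh(modulePath) {
  const resolved = require.resolve(modulePath);
  delete require.cache[resolved];
  return require(resolved);
}
```

A dev server could call a helper like this whenever a file watcher reports a change, then swap the route's handler for the freshly loaded one.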
55. Local debugging further increases velocity and reduces friction of the
SDLC
[Diagram: Dev FaaS platform (same components as the previous slide) running local testing, with an attached debugger and logs]
56. The local FaaS platform can be integrated and routed within the Netflix
cloud, enabling seamless end to end testing
[Diagram: Device, Zuul (Auth, SSL, …), Backend services, and Local functions running on the local FaaS platform]
57. Teams also want to test functions in isolation without having to connect
to or depend on upstream and downstream services
[Diagram: Device, Zuul (Auth, SSL, …), Backend services, and isolated local functions running on the local FaaS platform]
58. The FaaS platform provides mocks and unit test APIs, which allow teams
to test functions in isolation without having to connect to or depend on
upstream and downstream services
// Runtime API requires downstream services to be available
module.exports = function ping(req, res, next) {
    req.log.info('Hello World!');
    req.getRequestContext(); // request context
    req.getAtlas();          // metrics client
    req.getDNAClient();      // RPC client
    req.getProperties();     // configuration client
    req.getEdgar();          // tracing
    req.getMantis();         // stream processing client
    req.getGeo();            // geolocation
    req.getPassport();       // auth
    return next();
};
59. The FaaS platform provides mocks and unit test APIs, which allow teams
to test functions in isolation without having to connect to or depend on
upstream and downstream services
// Unit test: mocks are available from the unit test API
it('should create all mocks', function(done) {
    mocks.create(function(err, allMocks) {
        assert.isObject(allMocks);
        assert.isObject(allMocks.log);
        assert.isObject(allMocks.properties);
        // ...
        assert.isObject(allMocks.req);
        assert.isObject(allMocks.res);
        return done();
    });
});
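To show what testing in isolation buys, the sketch below exercises a handler against hand-rolled fakes for req and res, so no downstream service is needed. These fakes are written for illustration and are not the platform's mocks API; createFakes and the recorded call shapes are hypothetical.

```javascript
// Handler under test, in the same style as the earlier slides.
function ping(req, res, next) {
  req.log.info('Hello World!');
  res.send(200, req.query);
  return next();
}

// Hand-rolled fakes that record how the handler used them.
function createFakes(query) {
  const calls = { logged: [], sent: [] };
  const req = {
    query,
    log: { info: (message) => calls.logged.push(message) }
  };
  const res = {
    send: (status, body) => calls.sent.push({ status, body })
  };
  return { req, res, calls };
}
```

A test then calls ping(fakes.req, fakes.res, callback) and asserts on fakes.calls, without starting the platform or reaching any backend.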
60. This development platform can also be easily deployed to Jenkins using
NEWT, unlocking CI/CD tests for both the FaaS platform and functions
themselves
61. We’ll cover:
Runtime platform architecture
Developer experience
Management & operations
62. Publish
Deploy
Operate
63. Functions are published using our NEWT tool, and are immutably
versioned and saved in a central registry
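"Immutably versioned" can be pictured as a registry that refuses to overwrite a name and version pair once it has been published. The sketch below is an in-memory stand-in with hypothetical names; it does not describe the real registry or NEWT's publish command.

```javascript
// In-memory registry where a published name@version can never be replaced.
function createRegistry() {
  const published = new Map();
  return {
    publish(name, version, artifact) {
      const key = `${name}@${version}`;
      if (published.has(key)) {
        throw new Error(`${key} is already published and cannot be overwritten`);
      }
      // Freeze the record so the stored entry cannot be mutated later.
      published.set(key, Object.freeze({ name, version, artifact }));
      return key;
    },
    get(name, version) {
      return published.get(`${name}@${version}`);
    }
  };
}
```

Any change to a function therefore has to ship as a new version, which keeps every deployed version reproducible.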
64. Under the hood, a Docker image is created at publish time by
combining the functions and the platform into one image, achieving
immutability
[Diagram: FaaS base platform image + Customer Functions (myrepo/config.json, myrepo/foo.js, myrepo/bar.js, placed under /etc/functions) = Customer function image]
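A sketch of what the publish step could generate: a Dockerfile that starts FROM the platform base image and COPYs the customer's files into the functions directory. The base image name is hypothetical, renderDockerfile is an illustrative helper, and the talk does not show the actual build definition.

```javascript
// Render a Dockerfile that layers customer function files on top of a
// platform base image. FROM and COPY are standard Dockerfile instructions.
function renderDockerfile(baseImage, functionFiles, targetDir) {
  const lines = [`FROM ${baseImage}`];
  for (const file of functionFiles) {
    lines.push(`COPY ${file} ${targetDir}/`);
  }
  return `${lines.join('\n')}\n`;
}

// Hypothetical base image name; file paths are the ones on the slide.
const dockerfile = renderDockerfile(
  'faas-platform:6.0.0',
  ['myrepo/config.json', 'myrepo/foo.js', 'myrepo/bar.js'],
  '/etc/functions'
);
```

Because the platform and the functions are baked into a single image, the artifact that was tested is exactly the artifact that is deployed.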
65. The centralized function registry can be used to manage published
functions
66. These published functions can be deployed to the cloud via the NEWT
deploy commands
67. Functions are deployed using Titus, with most functions scheduled in under
a few minutes
[Diagram: Registry, Titus Container Scheduler, and the scheduled function containers]
68. Canary deployment and analysis can be used as part of deployment,
minimizing outages and increasing availability
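Canary analysis boils down to comparing a small canary deployment against the current baseline before rolling out further. The sketch below compares error rates only; the function name, the input shape, and the threshold are hypothetical, and real canary analysis typically weighs many more metrics.

```javascript
// Pass the canary if its error rate is not more than
// `maxErrorRateIncrease` above the baseline's error rate.
function analyzeCanary(baseline, canary, maxErrorRateIncrease) {
  const rate = (stats) => (stats.requests === 0 ? 0 : stats.errors / stats.requests);
  const baselineRate = rate(baseline);
  const canaryRate = rate(canary);
  return {
    baselineRate,
    canaryRate,
    pass: canaryRate - baselineRate <= maxErrorRateIncrease
  };
}
```

A failing result would stop the rollout while only the canary's small share of traffic has been exposed to the new version.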
70. Each deployed function version can be managed via the control plane,
with access to detailed runtime information
71. Detailed historical deployment and management activity is available to aid
debugging
72. Autoscaling is used to automatically scale the infrastructure for each
function, saving costs and increasing availability. We require an initial
baseline configuration for each function
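A sketch of a proportional scaling rule with a baseline configuration: scale the instance count by the ratio of observed load to target load, then clamp to the configured minimum and maximum. This is a common target-tracking approach used here purely for illustration; the talk does not describe the actual policy, and all names and numbers are hypothetical.

```javascript
// Proportional scaling: keep per-instance load near the target, within
// the min/max bounds from the function's baseline configuration.
function desiredInstances(current, observedLoad, targetLoad, min, max) {
  const raw = Math.ceil(current * (observedLoad / targetLoad));
  return Math.min(max, Math.max(min, raw));
}
```

For example, 4 instances at 90% load against a 60% target scale out to 6, while a quiet function never drops below its configured minimum.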
73. Metrics and dashboards are automatically generated for each function
74. Alerts are automatically generated based on metrics
75. Real time and historical logs are available
76. Profiling and post mortem debugging tools are made available
77. The infrastructure and operations of the platform and the application itself
are handled by the centralized API platform team. UI teams are only
responsible for managing their individual functions
78. Netflix FaaS Platform
Runtime platform architecture
Developer experience
Management & operations
79.
80.
81.
82.
83.
84.
85. @yunongx
yunong@netflix.com
@yunongx
linkedin.com/in/yunongxiao/
Questions?