Going FaaSter, Functions as a Service at Netflix

1. 130 million customers in over 190 countries streaming 140 million hrs/day
2.
3.
4.
5.
6. We use a data-driven approach via A/B testing for most changes to our product, ensuring every change delights our customers. Source: https://www.optimizely.com/optimization-glossary/ab-testing/
7. 1000s of A/B tests a year
8.
9.
10.
11. Netflix API http://api.netflix.com
12. The Netflix API decouples clients from the backend services, providing an integration point for both services and clients. [Diagram: Clients (TV, iOS, Android, Windows, Browsers); Client API; Edge API (Remote Service Layer); Backend Services (Search, MAP, GPS, …, Playback)]
13. The Netflix API uses the BFF (backend for frontend) pattern, where each BFF is tightly coupled to one device. This makes it easier to define and adapt the UI, and streamlines releases. [Diagram: Clients (TV, iOS, Android, Windows, Browsers); Client API (BFF); Edge API (Remote Service Layer); Backend Services (Search, MAP, GPS, …, Playback)]
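As a rough illustration of the BFF idea, the sketch below shows one device-specific endpoint that calls two backends in parallel and returns only the fields its UI needs. The `searchService` and `playbackService` objects, their methods, and the response shape are hypothetical stand-ins, not Netflix APIs.

```javascript
// Hypothetical backend clients; in production these would be RPC calls.
const searchService = {
  topTitles: async () => [
    { id: 1, title: 'Title A', synopsis: 'Long synopsis A', boxArt: 'a.jpg' },
    { id: 2, title: 'Title B', synopsis: 'Long synopsis B', boxArt: 'b.jpg' },
  ],
};
const playbackService = {
  resumePoints: async () => ({ 1: 120 }), // seconds watched, keyed by title id
};

// A BFF endpoint for one device: it calls the backends in parallel and
// returns only the fields that this device's UI renders.
async function tvHomeScreen() {
  const [titles, resume] = await Promise.all([
    searchService.topTitles(),
    playbackService.resumePoints(),
  ]);
  return titles.map((t) => ({
    id: t.id,
    title: t.title,
    boxArt: t.boxArt,
    resumeAt: resume[t.id] || 0,
  }));
}
```

Because the response shape lives in the BFF, a UI team could change what the TV screen receives without touching the backend services.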
14. These BFFs are maintained by the UI teams, since each BFF is tightly coupled to its UI
15. Netflix API requirements: Velocity, Ergonomic, Reliability, No Operations
16. Going FaaSter: Function as a Service at Netflix. Yunong Xiao, Principal Software Engineer, Netflix
17. FaaS Evolution. [Diagram: the stack of λ (function), Application, and Services Platform across Pre-Cloud/On Prem, IaaS, PaaS, and FaaS, split into the layers "You Manage" and the layers "Others Manage"]
18. Build or buy? Pros and cons: No-ops; Homogenous architecture; Accessible monitoring & debugging; Velocity; Netflix stack integration; Reliable service platform; Limits: latency, memory, execution time
19. We’ll cover: runtime platform architecture, developer experience, management & operations
20. [Diagram: the stack of λ (function), Application, and Services Platform across Pre-Cloud/On Prem, IaaS, PaaS, and FaaS, split into the layers "You Manage" and the layers "Others Manage"]
21. We are almost completely hosted in the cloud using AWS
22. EC2 makes up the foundation of infrastructure at Netflix
23. VMs or Containers?
24. We chose containers as the foundation of our FaaS platform because they let us build a platform that is ergonomic and efficient, with high deployment velocity. Advantages: lightweight and fast deployments, portability across environments, efficient bin packing
25. We built Titus — our own container management platform — capable of launching millions of containers a day
26. [Diagram: the stack of λ (function), Application, and Services Platform across Pre-Cloud/On Prem, IaaS, PaaS, and FaaS, split into the layers "You Manage" and the layers "Others Manage"]
27. We have created a reliable, open source services platform
28. We have created a reliable, open source services platform. Service Discovery: Eureka (https://github.com/Netflix/eureka); RPC: Ribbon (HTTP), gRPC (https://github.com/Netflix/ribbon); Configuration: Archaius (https://github.com/Netflix/archaius); Metrics: Atlas (https://github.com/Netflix/atlas); Fault tolerance: Hystrix (https://github.com/Netflix/hystrix); External LB: Zuul (https://github.com/Netflix/zuul); Tracing: Mantis, Salp …
29. Assembling these components yourself is time-consuming, difficult, and error-prone
31. You always have to keep components updated to the latest versions yourself
32. You have to ensure that metrics and dashboards are created for your service
33. You’re on the hook for managing and operating the infrastructure
34. You shouldn’t have to set everything up from scratch every time when all you care about is the business logic
35. [Diagram: the stack of λ (function), Application, and Services Platform across Pre-Cloud/On Prem, IaaS, PaaS, and FaaS, split into the layers "You Manage" and the layers "Others Manage"]
36. We set out to build a runtime FaaS platform that solves these issues: no assembly required, automatic updates, observable metrics, managed operations
37. The platform is a services container that has been pre-assembled with all of the components needed for a production-ready service. [Diagram: Service Registration, Service Discovery Daemon, Metrics, Metrics Daemon, Stream Processing, Configuration, Log rotation, Server, Auth, Throttling, RPC Clients]
38. All that’s needed is for customers to insert their business logic. [Diagram: the same pre-assembled container (Service Registration, Service Discovery Daemon, Metrics, Metrics Daemon, Stream Processing, Configuration, Log rotation, Server, Auth, Throttling, RPC Clients) with customer routes inserted: Route /foo, Route /bar, …]
39. We package and version the platform as a single entity, so we can upgrade and test the components once and ensure that everyone receives the upgrade
40. Because we control the runtime, the platform can emit a consistent set of application, RPC, and systems metrics for every function. [Diagram: platform container (Service Registration, Service Discovery Daemon, Metrics, Metrics Daemon, Stream Processing, Configuration, Log rotation, Server, Auth, Throttling, RPC Clients) hosting Route /foo, Route /bar, …]
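One way a platform that owns the runtime could emit uniform metrics is to wrap every handler before registering it. The sketch below assumes Connect-style handlers; `withMetrics` and the `record` sink are hypothetical names and do not represent the Atlas client API.

```javascript
// Wrap a Connect-style handler so that every function reports the same
// metrics. `record` is a hypothetical sink, not the Atlas client API.
function withMetrics(name, handler, record) {
  return function instrumented(req, res, next) {
    const start = process.hrtime.bigint();
    return handler(req, res, function done(err) {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      record({ name, status: err ? 'error' : 'ok', elapsedMs });
      return next(err);
    });
  };
}

// Usage: the handler contains only business logic.
const samples = [];
const ping = withMetrics('ping', (req, res, next) => next(), (m) => samples.push(m));
ping({}, {}, () => {});
```

The function author writes no metrics code; the wrapper supplies the name, outcome, and latency for every call.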
41. We set out to build a runtime FaaS platform that solves these issues: no assembly required, automatic updates, observable metrics, managed operations
42. We’ll cover: runtime platform architecture, developer experience, management & operations
43.
44. Functions are managed via a configuration API, where most fields are optional. The example declares, in order: the service name ("service"), the FaaS platform version ("platformVersion"), the function declarations ("routes"), additional source code ("sources"), configuration ("propertiesPath"), and lifecycle management ("startupHooks").

    {
      "service": { "org": "iosui", "name": "iphone" },
      "platformVersion": "^6.0.0",
      "routes": {
        "routes": {
          "movies": { "get": { "source": "./lib/endpoints/movies.js" } },
          "profile": { "post": { "source": "./lib/endpoints/profile.js" } }
        }
      },
      "sources": ["./lib"],
      "propertiesPath": "./etc",
      "startupHooks": ["./hooks/startupHook.js"]
    }
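To make the function declarations concrete, the sketch below flattens the nested "routes" object into a route table. The field layout follows the example configuration; the `listRoutes` helper and the mapping of a route name to a `/name` path are assumptions for illustration, not the platform's documented behavior.

```javascript
// Flatten the nested "routes" object of a function configuration into a
// list of { method, path, source } entries. Mapping the route name to
// "/name" is an assumption made for this sketch.
function listRoutes(config) {
  const routes = (config.routes && config.routes.routes) || {};
  const table = [];
  for (const [name, methods] of Object.entries(routes)) {
    for (const [method, def] of Object.entries(methods)) {
      table.push({ method: method.toUpperCase(), path: '/' + name, source: def.source });
    }
  }
  return table;
}

const exampleConfig = {
  service: { org: 'iosui', name: 'iphone' },
  platformVersion: '^6.0.0',
  routes: {
    routes: {
      movies: { get: { source: './lib/endpoints/movies.js' } },
      profile: { post: { source: './lib/endpoints/profile.js' } },
    },
  },
};

console.log(listRoutes(exampleConfig));
```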
45. Business logic can be implemented using the popular Node.js “Connect” style middleware, which handles requests.

    // req: HTTP request object, res: HTTP response, next: callback
    module.exports = function (req, res, next) {
      res.send(200, req.query);
      return next();
    };
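For readers new to the Connect style, the sketch below shows how a chain of such handlers can be driven: each handler calls `next()` to pass control on, or `next(err)` to stop. `runChain` is a hypothetical helper written for this example, not part of Connect or of the Netflix platform.

```javascript
// Run Connect-style handlers in order. Each handler receives (req, res, next)
// and calls next() to continue, or next(err) to stop the chain.
function runChain(handlers, req, res, done) {
  let i = 0;
  function next(err) {
    if (err || i === handlers.length) return done(err || null);
    const handler = handlers[i++];
    return handler(req, res, next);
  }
  return next();
}

const order = [];
const res = {};
runChain(
  [
    (req, res, next) => { order.push('auth'); return next(); },
    (req, res, next) => { order.push('movies'); res.body = req.query; return next(); },
  ],
  { query: { id: '42' } },
  res,
  (err) => order.push(err ? 'failed' : 'done')
);
```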
46. Platform components such as metrics, loggers, or RPC clients are available via the “req” object, providing a full runtime API for developers.

    module.exports = function ping(req, res, next) {
      req.log.info('Hello World!');
      req.getRequestContext(); // request context
      req.getAtlas();          // metrics client
      req.getDNAClient();      // RPC client
      req.getProperties();     // configuration client
      req.getEdgar();          // tracing
      req.getMantis();         // stream processing client
      req.getGeo();            // geolocation
      req.getPassport();       // auth
      return next();
    };
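A minimal sketch of how a runtime might expose components through the request object follows. The accessor names `getAtlas` and `getProperties` mirror the slide, but `createRequest`, the component objects, and their methods (`increment`, `get`) are hypothetical stand-ins rather than the real client APIs.

```javascript
// Build a request object exposing platform components through accessors.
// The accessor names mirror the slide; the component objects are stand-ins.
function createRequest(components, query) {
  return {
    query,
    log: components.log,
    getAtlas: () => components.atlas,
    getProperties: () => components.properties,
  };
}

const counters = {};
const logLines = [];
const components = {
  log: { info: (msg) => logLines.push(msg) },
  atlas: { increment: (name) => { counters[name] = (counters[name] || 0) + 1; } },
  properties: { get: (key) => ({ greeting: 'Hello World!' })[key] },
};

// Business logic only touches the request object, never the wiring.
function ping(req, res, next) {
  req.log.info(req.getProperties().get('greeting'));
  req.getAtlas().increment('ping.calls');
  return next();
}

ping(createRequest(components, {}), {}, () => {});
```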
47. Long-lived third-party libraries can be managed via startup and shutdown lifecycle hooks.

    "startupHooks": ["./hooks/startupHook.js"],
    "shutdownHooks": ["./hooks/shutdownHook.js"]
48. Hooks run before the platform starts, have access to all platform components, and allow third-party libraries to be made available on the request object.

    // executed before platform starts
    module.exports = function startuphook(opts, cb) {
      // access to all platform components
      opts.atlas;
      opts.infrastructureInfo;
      opts.log;
      // ...
      opts.properties;
      opts.serviceInfo;
      // return an object that will be made available to all functions
      return cb(null, { foo: 'bar' });
    };
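The hook contract above can be illustrated with a small runner that calls each hook in turn and merges the returned objects. This is a sketch under the assumption that hooks run sequentially and that their results are merged into one shared object; `runStartupHooks` is a hypothetical helper, not the platform's implementation.

```javascript
// Run startup hooks one after another, then merge whatever each hook
// returns into a single object that would be attached to every request.
function runStartupHooks(hooks, opts, done) {
  const shared = {};
  let i = 0;
  function runNext(err, result) {
    if (err) return done(err);
    Object.assign(shared, result);
    if (i === hooks.length) return done(null, shared);
    return hooks[i++](opts, runNext);
  }
  return runNext(null, {});
}

const hooks = [
  (opts, cb) => cb(null, { foo: 'bar' }),
  (opts, cb) => cb(null, { startedBy: opts.serviceInfo.name }),
];

let sharedObjects;
runStartupHooks(hooks, { serviceInfo: { name: 'iphone' } }, (err, shared) => {
  if (err) throw err;
  sharedObjects = shared;
});
```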
49. External dependencies can be imported from
50. Our goal is to create a local function development experience that improves the software development life cycle for developers
51. We created a developer workflow tool called NEWT (Netflix Workflow Toolkit), which simplifies and facilitates common developer tasks: development, debugging, testing, publishing, deployment
52.
53. One-click setup for a consistent development environment. Installs dependencies and keeps them updated
54. We created a development FaaS platform for local development. It lets engineers interactively test functions in seconds, reducing friction and increasing velocity. [Diagram: Dev FaaS platform with live reload of local functions; components: Service Registration, Service Discovery Daemon, Server, Metrics, Auth, Stream Processing, Metrics Daemon, Throttling, Configuration, RPC Clients, Log rotation]
55. Local debugging further increases velocity and reduces friction in the SDLC. [Diagram: Dev FaaS platform (Service Registration, Server, Metrics, Service Discovery Daemon, Auth, Stream Processing, Metrics Daemon, Throttling, Configuration, Log rotation, RPC Clients) used for local testing, with an attached debugger and logs]
56. The local FaaS platform can be integrated and routed within the Netflix cloud, enabling seamless end-to-end testing. [Diagram: Device; Zuul (Auth, SSL, …); Backend services; Local functions]
57. Teams also want to test functions in isolation without having to connect to or depend on upstream and downstream services. [Diagram: Device; Zuul (Auth, SSL, …); Backend services; Isolated local functions]
58. The FaaS platform provides mocks and unit test APIs, which allow teams to test functions in isolation without having to connect to or depend on upstream and downstream services. The runtime API, by contrast, requires downstream services to be available:

    module.exports = function ping(req, res, next) {
      req.log.info('Hello World!');
      req.getRequestContext(); // request context
      req.getAtlas();          // metrics client
      req.getDNAClient();      // RPC client
      req.getProperties();     // configuration client
      req.getEdgar();          // tracing
      req.getMantis();         // stream processing client
      req.getGeo();            // geolocation
      req.getPassport();       // auth
      return next();
    };
59. Mocks are available from the unit test API, so functions can be tested in isolation without connecting to upstream or downstream services:

    // Unit test
    it('should create all mocks', function (done) {
      mocks.create(function (err, allMocks) {
        assert.isObject(allMocks);
        assert.isObject(allMocks.log);
        assert.isObject(allMocks.properties);
        // ...
        assert.isObject(allMocks.req);
        assert.isObject(allMocks.res);
        return done();
      });
    });
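To show why mocks make isolated testing possible, the sketch below builds stand-in `req` and `res` objects that record calls instead of contacting real services, then exercises a Connect-style function against them. `createMocks` is a hypothetical factory written for this example and is not the platform's `mocks.create` API.

```javascript
// Hypothetical mock factory: builds stand-ins for req/res that record calls
// instead of touching real services. This is not the platform's mocks API.
function createMocks() {
  const logged = [];
  const sent = [];
  return {
    logged,
    sent,
    req: { query: { id: '42' }, log: { info: (msg) => logged.push(msg) } },
    res: { send: (status, body) => sent.push({ status, body }) },
  };
}

// Function under test, written in the Connect style.
function movies(req, res, next) {
  req.log.info('movies called');
  res.send(200, req.query);
  return next();
}

const m = createMocks();
let nextCalled = false;
movies(m.req, m.res, () => { nextCalled = true; });
```

The test can then assert on `m.logged`, `m.sent`, and `nextCalled` without any network access.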
60. This development platform can also be easily deployed to Jenkins using NEWT, unlocking CI/CD tests for both the FaaS platform and functions themselves
61. We’ll cover: runtime platform architecture, developer experience, management & operations
62. Publish, Deploy, Operate
63. Functions are published using our NEWT tool, and are immutably versioned and saved in a central registry
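Immutable versioning can be sketched as a registry that refuses to overwrite an existing name and version pair. The in-memory `FunctionRegistry` class below is a hypothetical illustration; the talk does not describe how the real registry is implemented.

```javascript
// In-memory registry that refuses to overwrite a published version.
class FunctionRegistry {
  constructor() {
    this.published = new Map();
  }

  publish(name, version, artifact) {
    const key = `${name}@${version}`;
    if (this.published.has(key)) {
      throw new Error(`${key} is already published and cannot be changed`);
    }
    // Freeze the record so callers cannot mutate it after publishing.
    this.published.set(key, Object.freeze({ name, version, artifact }));
    return key;
  }

  get(name, version) {
    return this.published.get(`${name}@${version}`);
  }
}

const registry = new FunctionRegistry();
registry.publish('iosui/iphone', '1.0.0', 'image-digest-a');
```

A fix is shipped by publishing a new version, so any deployed version always refers to the same artifact.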
64. Under the hood, a Docker image is created at publish time by combining the functions and the platform into one image, achieving immutability. [Diagram labels: FaaS base platform image; Customer Functions (myrepo/config.json, myrepo/foo.js, myrepo/bar.js); /etc/functions; Customer function image]
65. The centralized function registry can be used to manage published functions
66. These published functions can be deployed to the cloud via the NEWT deploy commands
67. Functions are deployed using Titus, with most functions scheduled in under a few minutes. [Diagram: Registry; Titus Container Scheduler; function containers …]
68. Canary deployment and analysis can be used as part of deployment, minimizing outages and increasing availability
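Canary analysis can be illustrated by comparing the error rate of a small canary fleet against the current baseline before continuing a rollout. The sketch below is a deliberately simplified assumption: the `canaryVerdict` function, the single error-rate metric, the 1.5x tolerance, and the 0.1% floor are example choices, not the analysis Netflix runs.

```javascript
// Compare error rates of a canary against the baseline and decide whether
// to continue the rollout. The 1.5x tolerance is an arbitrary example value.
function canaryVerdict(baseline, canary, tolerance = 1.5) {
  const rate = (s) => (s.requests === 0 ? 0 : s.errors / s.requests);
  const baselineRate = rate(baseline);
  const canaryRate = rate(canary);
  // Small floor so that a baseline with zero errors does not turn a single
  // canary error into a failure.
  const allowed = Math.max(baselineRate, 0.001) * tolerance;
  return { baselineRate, canaryRate, pass: canaryRate <= allowed };
}

const healthy = canaryVerdict({ requests: 10000, errors: 20 }, { requests: 1000, errors: 2 });
const regressed = canaryVerdict({ requests: 10000, errors: 20 }, { requests: 1000, errors: 30 });
```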
70. Each deployed function version can be managed via the control plane, with access to detailed runtime information
71. A detailed history of deployment and management activity is available to aid debugging
72. Autoscaling is used to automatically scale the infrastructure for each function, saving costs and increasing availability. We require an initial baseline configuration for each function
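The role of the baseline configuration can be sketched with a target-tracking style calculation: the fleet is resized so that average utilization moves toward a target, within configured bounds. The `desiredInstances` function and the `min`, `max`, and `target` fields are assumptions for illustration, not the platform's actual policy format.

```javascript
// Target-tracking style calculation: size the fleet so that average
// utilization moves toward the target, within min/max bounds.
function desiredInstances(current, utilization, policy) {
  const raw = Math.ceil(current * (utilization / policy.target));
  return Math.min(policy.max, Math.max(policy.min, raw));
}

// Hypothetical baseline configuration for one function.
const baselinePolicy = { min: 2, max: 20, target: 0.5 };

const scaledOut = desiredInstances(4, 0.75, baselinePolicy); // busy fleet grows
const scaledIn = desiredInstances(4, 0.1, baselinePolicy);   // idle fleet shrinks to the minimum
const capped = desiredInstances(10, 1.5, baselinePolicy);    // growth is capped at the maximum
```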
73. Metrics and dashboards are automatically generated for each function
74. Alerts are automatically generated based on metrics
75. Real time and historical logs are available
76. Profiling and post mortem debugging tools are made available
77. The infrastructure and operations of the platform and the application itself are handled by the centralized API platform team. UI teams are only responsible for managing their individual functions
78. Netflix FaaS Platform: runtime platform architecture, developer experience, management & operations
79.
80.
81.
82.
83.
84.
85. Questions? @yunongx, yunong@netflix.com, linkedin.com/in/yunongxiao/
