Auto-scaling your API: Insights and Tips from the Zalando Team
1. Auto-scaling your API
Insights and Tips from the Zalando Team
Sean Patrick Floyd - @oldJavaGuy
JBCNConf 2015
Barcelona, Spain
Luis Mineiro - @voidmaze
2. ONE of EUROPE’S LARGEST ONLINE FASHION RETAILERS
15 countries
3 fulfilment centers
15+ million active customers
2.2+ billion € revenue 2014
130+ million visits per month
8.000+ employees
Tech hubs in Berlin, Dublin, Dortmund
and Helsinki
Visit us: tech.zalando.com
3. Our Scale
2 datacenters
thousands of production instances
serving 15 countries
Based on https://flic.kr/p/bX5E4c
4. Zalando stack
Credits to our colleague Kolja Wilcke
5. Credits to our colleague Kolja Wilcke
6.
7. Credits to our colleague Kolja Wilcke
8. The Shop
Monolith
9. We call it “Jimmy”
http://blog.codinghorror.com/new-programming-jargon/
10. Thousands of Java classes, undocumented features
Business logic on all layers (including the database)
https://flic.kr/p/nBhvfy
11. It’s 2013…
Zalando wants to
sponsor hackathons…
We need a REST API!
https://flic.kr/p/tW4sus
12.
13. Hackathon
API
14. First Step
Let’s create some REST-(ish) Spring
controllers inside our monolithic web
application
Deploy a couple of Jimmy instances just to
serve requests for this “API”
15. Pros
The infrastructure is already there!
A few days of coding and we’re all set
16. Cons
This “API” cannot be deployed independently
of Jimmy.
Jimmy is infested with business logic
everywhere.
17. New
Requirements
18.
19. New Requirements
We also needed this new API for our existing
frontends (very high traffic).
Some third-party apps also wanted to use this
API as their backend.
20. Shop Public API
We decided to build a “real” API as a separate
standalone project.
Of course, there were some dependencies…
21. Data Sources
Shop Public API depends on:
1. Catalog - SOLr
2. Stock - Memcached
3. Reviews - REST(-ish) API
4. Recommendations - RPC
5. Database - PostgreSQL
22. All of them in the same data centers.
This should be no problem…
23. Tech
Company
24. Paradigm Shift
Until 2014
We’re a Fashion Company that has a lot
of tech knowledge
2015
Suddenly we’re a Tech Company, providing
“Fashion as a Service”
25. We established
5 principles
26. API FIRST
27. API First
Document and peer review your API in a
format like Swagger before writing a single
line of code.
Ideally, generate either your server
interfaces or your test data (or both) from the
Swagger spec
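As a rough illustration of generating server interfaces from a spec, the sketch below walks a tiny Swagger 2.0 style document and emits one handler stub per operation. The `/articles` paths, the `operationId` values and the `generate_stubs` helper are hypothetical and not taken from the Zalando API; real projects would more likely use an existing code generator.

```python
# Hypothetical, hand-written Swagger 2.0 fragment (not the real Zalando spec).
SPEC = {
    "swagger": "2.0",
    "paths": {
        "/articles": {
            "get": {"operationId": "list_articles"},
        },
        "/articles/{article_id}": {
            "get": {"operationId": "get_article"},
        },
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch"}


def generate_stubs(spec):
    """Return Python source with one stub function per operation in the spec."""
    lines = []
    for path in sorted(spec["paths"]):
        for method, operation in sorted(spec["paths"][path].items()):
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            lines.append(f"def {operation['operationId']}(request):")
            lines.append(f'    """{method.upper()} {path}"""')
            lines.append("    raise NotImplementedError")
            lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_stubs(SPEC))
```

Because the stubs come from the reviewed spec, the server code cannot drift from the documented API without someone noticing.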
28. REST
29. REST
Manipulate resources, don't call methods
Expose nouns, not verbs
Use HTTP verbs
GET, PUT, POST, DELETE, PATCH
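A minimal sketch of "nouns, not verbs": the path only names a resource, and the HTTP verb selects what happens to it. The `articles` resource, the in-memory store and the `handle` function are assumptions made for illustration, not part of the Shop Public API.

```python
import itertools

# In-memory store standing in for a real backend (illustrative only).
ARTICLES = {}
_next_id = itertools.count(1)


def handle(method, path, body=None):
    """Dispatch on the HTTP verb; the path only names the resource (a noun)."""
    parts = path.strip("/").split("/")
    if parts[0] != "articles" or len(parts) > 2:
        return 404, None  # e.g. /getArticle is not a resource
    article_id = parts[1] if len(parts) == 2 else None

    if article_id is None:  # the collection: /articles
        if method == "GET":
            return 200, list(ARTICLES.values())
        if method == "POST":
            new_id = str(next(_next_id))
            ARTICLES[new_id] = dict(body, id=new_id)
            return 201, ARTICLES[new_id]
        return 405, None

    # a single resource: /articles/<id>
    if method == "PUT":  # create or replace
        ARTICLES[article_id] = dict(body, id=article_id)
        return 200, ARTICLES[article_id]
    if article_id not in ARTICLES:
        return 404, None
    if method == "GET":
        return 200, ARTICLES[article_id]
    if method == "PATCH":  # partial update
        ARTICLES[article_id].update(body)
        return 200, ARTICLES[article_id]
    if method == "DELETE":
        del ARTICLES[article_id]
        return 204, None
    return 405, None
```

For example, `handle("PUT", "/articles/42", {"name": "sneaker"})` stores the article and `handle("DELETE", "/articles/42")` removes it; there is no `/deleteArticle` endpoint.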
30. SAAS
31. Identity Management
Our backend services don’t expose APIs yet
Company-wide IAM strategy is not ready yet
Can only expose read-only features for now
32. MICRO-
SERVICES
33. Microservices
We already have a Service-Oriented
Architecture
It was mostly SOAP, though…
And definitely not micro
34. CLOUD
35. Road to AWS
Some challenges ahead…
1. Backend services not available yet
2. SOLr also not available
3. We can’t just move the databases there
36. How we
did it
37. Challenges
Catalog and Stock data sources are latency-critical
and they’re owned by different teams!
38. Can we solve that?
39. Step 1 - move critical datasources to AWS
40. “Move” Data Sources to AWS
We can’t just move them!
Jimmy also needs them in the data centers,
you insensitive clod!
Ok… we just replicate them.
41. and then…?
42. Replicating Data Sources
SOLr has its own replication mechanism
over HTTP
memcached should be easy …
43. You wish!
44. Step 2 - Build memcached replication
45. Stock Relay
[Diagram: stock replication from the datacenter to three AWS regions]
Datacenter: Memcached #1, Memcached #2, Memcached #3, … (GET), Stock Relay(s) (SET / DELETE)
SNS Topic
us-east-1: SQS Queue, Stock Repeater(s), Memcached ElastiCache, Shop Public API
eu-west-1: SQS Queue, Stock Repeater(s), Memcached ElastiCache, Shop Public API
eu-central-1: SQS Queue, Stock Repeater(s), Memcached ElastiCache, Shop Public API
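The slide only names the components, so the following is a sketch under an assumed data flow: a relay publishes SET / DELETE changes to one topic, the topic fans out to one queue per region, and a repeater in each region applies the changes to that region's cache. `Topic`, `StockRepeater` and `relay` are in-memory stand-ins invented for this example; no AWS APIs are called.

```python
from collections import deque


class Topic:
    """Stand-in for an SNS topic: copies each message to every subscribed queue."""

    def __init__(self):
        self.queues = []

    def subscribe(self, queue):
        self.queues.append(queue)

    def publish(self, message):
        for queue in self.queues:
            queue.append(message)


class StockRepeater:
    """Drains one regional queue and applies the changes to that region's cache."""

    def __init__(self, queue, cache):
        self.queue = queue
        self.cache = cache

    def drain(self):
        while self.queue:
            op, key, value = self.queue.popleft()
            if op == "SET":
                self.cache[key] = value
            elif op == "DELETE":
                self.cache.pop(key, None)


def relay(topic, op, key, value=None):
    """Assumed relay behaviour: forward one cache change to the topic."""
    topic.publish((op, key, value))


if __name__ == "__main__":
    topic = Topic()
    repeaters = {}
    for region in ("us-east-1", "eu-west-1", "eu-central-1"):
        queue = deque()          # stand-in for the region's SQS queue
        topic.subscribe(queue)
        repeaters[region] = StockRepeater(queue, cache={})

    relay(topic, "SET", "sku-1", 5)
    relay(topic, "SET", "sku-2", 0)
    relay(topic, "DELETE", "sku-2")

    for region, repeater in repeaters.items():
        repeater.drain()
        print(region, repeater.cache)
```

One queue per region means a slow or unavailable region only delays its own cache, not the others.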
46. and then…?
47. AWS SOLr Repeater
[Diagram: SOLr replication from the datacenter to three AWS regions]
Datacenter: Master
eu-west-1: AWS Master Repeater, SOLr Slaves, Shop Public API
us-east-1: Region Repeater, SOLr Slaves, Shop Public API
eu-central-1: Region Repeater, SOLr Slaves, Shop Public API
48. Autoscaling SOLr and API
[Diagram labels: Region repeater; Autoscaling Group (Slave #1, Slave #2, … Slave n); Autoscaling Group (API Node #1, Node #2, … Node n); https://api.zalando.com]
http://www.docstoc.com/docs/109290533/Lucid-Imagination# by Erick Erickson
49. Scaling Limitations
[Diagram: a single Region repeater serving an Autoscaling Group with an ever-growing number of slaves (Slave #1, Slave #2, Slave #3, … Slave #42815)]
Edited from https://flic.kr/p/zuBXM
50. Y U no scale!
51. S3 Bucket for Replication
[Diagram: SOLr replication through an S3 bucket]
Sources: Datacenter Master, Master Repeater
us-east-1: SOLr Slaves, Shop Public API
eu-west-1: SOLr Slaves, Shop Public API
eu-central-1: SOLr Slaves, Shop Public API
52. I see…
53. “Onion layers” for Replication
• Start with a single repeater layer - layer0.solr
• Set up a Route53 CNAME repeater that links to it - repeater.solr
• Entire slave fleet in the ASG is always configured to replicate from repeater.solr
[Diagram labels: Autoscaling Group slave.solr (Slave #1, Slave #2, … Slave n); CNAME repeater; layer0-repeater.solr]
54. “Onion layers” for Replication
• Set up CloudWatch alarms for relevant metrics in the repeater layer. Give them some slack space.
• Configure them to send notifications to the onion-layers-topic.
• Bring up your instance of onion-layers.
[Diagram labels: Autoscaling Group slave.solr (Slave #1, Slave #2, … Slave n); CNAME repeater; layer0-repeater.solr; CloudWatch]
55. “Onion layers” for Replication
• onion-layers creates a new ASG layer. Calculates currentLayer = currentLayer + 1
• The new layer starts replicating from layer<currentLayer-1>-repeater
• Adds a new Route53 recordset layer<currentLayer>-repeater.
[Diagram labels: Autoscaling Group layer1-repeater.solr; Autoscaling Group layer0-repeater.solr; slave.solr; CNAME repeater; CloudWatch; onion-layers]
56. “Onion layers” for Replication
• After replication, the CNAME repeater is updated to link to layer<currentLayer>-repeater.
• onion-layers acts on alarms for the connection between current layer and previous layer.
• You can still have your own AS alarms for the connection between slaves and current layer.
[Diagram labels: Autoscaling Group layer1-repeater.solr; Autoscaling Group layer0-repeater.solr; Autoscaling Group slave.solr; CNAME repeater; CloudWatch]
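The layer-adding steps from the last four slides can be sketched as a small state machine. This is a toy model, not the onion-layers tool itself: DNS is a plain dict, the `OnionLayers` class and its `replicate` callback are hypothetical, and nothing here talks to Route53, CloudWatch or an ASG.

```python
def layer_name(n):
    """Record name for repeater layer n, following the naming on the slides."""
    return f"layer{n}-repeater.solr"


class OnionLayers:
    """Toy model: add a repeater layer per alarm, then repoint the shared CNAME."""

    def __init__(self):
        self.current_layer = 0
        self.records = {layer_name(0)}                 # stand-in for Route53 recordsets
        self.cname = {"repeater.solr": layer_name(0)}  # what the slave fleet follows
        self.upstream = {}                             # layer -> layer it replicates from

    def on_alarm(self, replicate=lambda new, source: True):
        """Handle one alarm notification by adding a new layer.

        `replicate` is an assumed callback that returns True once the new
        layer has finished replicating from its source.
        """
        self.current_layer += 1
        new = layer_name(self.current_layer)
        source = layer_name(self.current_layer - 1)
        self.upstream[new] = source   # new layer replicates from the previous layer
        self.records.add(new)         # new recordset for the new layer
        if replicate(new, source):    # only switch the CNAME after replication
            self.cname["repeater.solr"] = new
        return new

    def slave_source(self):
        """Where the slave fleet currently replicates from."""
        return self.cname["repeater.solr"]


if __name__ == "__main__":
    onion = OnionLayers()
    print(onion.slave_source())
    onion.on_alarm()
    print(onion.slave_source())
```

The slaves never need reconfiguring: they always replicate from `repeater.solr`, and only the CNAME target moves as layers are added.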
57. Yes. These tools will be open-sourced soon
58. Thank you for listening
Check out our blog https://tech.zalando.com
Our many open source products https://github.com/zalando
The STUPS stack https://stups.io
Got more questions? You can reach us on Twitter
@ZalandoTech
We’re hiring!
Special thanks to Jessie Dude.
No Continuum Transfunctioners were harmed during the production of these slides.