Why Zalando trusts in PostgreSQL
如果无法正常显示,请先停止浏览器的去广告插件。
相关话题:
#zalando
1.
2. Why Zalando
trusts in PostgreSQL
A developer’s view on using the most advanced
open-source database
Henning Jacobs - Technical Lead Platform/Software
Zalando GmbH
Valentine Gogichashvili - Technical Lead Platform/Database
Zalando GmbH
GOTO Berlin 2013, 2013-10-18
3. Who we are
Henning Jacobs <henning.jacobs@zalando.de>
with Zalando since 2010
NO DBA, NO PostgreSQL Expert
Valentine Gogichashvili <valentine.gogichashvili@zalando.de>
with Zalando since 2010
DBA, PostgreSQL Expert
3/37
4. About Zalando
·
·
·
·
·
14 countries
> 1 billion € revenue 2012
> 150,000 products
3 warehouses
Europe's largest online fashion retailer
4/37
5. Our ZEOS Platform
5/37
6. Our Main Stack
Java → PostgreSQL
Main/Production
CXF-WS
SProcWrapper
JDBC
Tomcat
API Schema
Data Schemas
Postgres
Java 7, Tomcat 7, PostgreSQL 9.0–9.3
6/37
7. Our Main Stacks
Different Use Cases — same Database
Main/Production
CRUD/JPA
Scripting/Python
CXF-WS CXF-WS HTTP/REST
SProcWrapper JPA SQLAlchemy
JDBC
Tomcat JDBC Tomcat psycopg2 CherryPy
Postgres Data Schemas Postgres Data Schemas Postgres
API Schema
Data Schemas
Java 7, Tomcat 7, Python 2.7, PostgreSQL 9.0–9.3
7/37
8. Some Numbers
·
·
·
·
·
·
> 90 deployment units (WARs)
> 800 production Tomcat instances
> 50 different databases
> 90 database master instances
> 5 TB of PostgreSQL data
> 200 developers, 8 DBAs
8/37
9. Stored Procedures
SET search_path ....
SELECT register_customer(...)
API Schemas
stored procedures,
custom types, ...
z_api_v13_42
register_customer(..)
find_orders(..)
...
register_customer(..)
find_orders(..)
...
Data Schema(s)
tables, views, ...
z_data
customer
order
order_address
...
9/37
10. Stored Procedures
1:1 Mapping using SProcWrapper
@SProcService
public interface CustomerSProcService {
@SProcCall
int registerCustomer(@SProcParam String email, @SProcParam Gender gender);
}
CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender)
RETURNS int AS
$$
INSERT INTO z_data.customer (c_email, c_gender)
VALUES (p_email, p_gender)
RETURNING c_id
$$
LANGUAGE 'sql' SECURITY DEFINER;
JAVA
SQL
10/37
11. Stored Procedures
Complex Types
@SProcCall
List<Order> findOrders(@SProcParam String email);
CREATE FUNCTION find_orders(p_email text,
OUT order_id int,
OUT order_date timestamp,
OUT shipping_address order_address) RETURNS SETOF RECORD AS
$$
SELECT o_id, o_date, ROW(oa_street, oa_city, oa_country)::order_address
FROM z_data.order
JOIN z_data.order_address ON oa_order_id = o_id
JOIN z_data.customer ON c_id = o_customer_id
WHERE c_email = p_email
$$
LANGUAGE 'sql' SECURITY DEFINER;
JAVA
SQL
11/37
12. Stored Procedures
Experience
·
·
·
·
·
·
·
Performance benefits
Easy to change live behavior
Validation close to data
Simple transaction scope
Makes moving to new software version easy
Cross language API layer
CRUD ⇒ JPA
12/37
13. Stored Procedures
Rolling out database changes
· API versioning
- Automatic roll-out during deployment
- Java backend selects "right" API version
· Modularization
- shared SQL gets own Maven artifacts
- feature/library bundles of Java+SQL
· DB-Diffs
- SQL scripts for database changes
- Review process
- Tooling support
13/37
14. Stored Procedures & Constraints
Protect Your Most Valuable Asset: Your Data
14/37
15. Constraints
Ensuring Data Quality
Simple SQL expressions:
SQL
CREATE TABLE z_data.host (
h_hostname
varchar(63) NOT NULL UNIQUE CHECK (h_hostname ~ '^[a-z][a-z0-9-]*$'),
UNIQUE CHECK (masklen(h_ip) = 32),
h_ip
inet
h_memory_bytes bigint
CHECK (h_memory_bytes > 8*1024*1024)
);
COMMENT ON COLUMN z_data.host.h_memory_bytes IS 'Main memory (RAM) of host in Bytes';
Combining check constraints and stored procedures:
CREATE TABLE z_data.customer (
NOT NULL UNIQUE CHECK (is_valid_email(c_email)),
c_email
text
..
);
SQL
15/37
16. Constraints
What about MySQL?
The MySQL manual says:
The CHECK clause is parsed but ignored
by all storage engines.
Open Bug ticket since 2004: http://bugs.mysql.com/bug.php?id=3464
http://dev.mysql.com/doc/refman/5.7/en/create-table.html
16/37
17. Constraints
Custom Type Validation using Triggers
config_value config_type
cv_id : serial
cv_key : text
cv_value : json
cv_type_id : int ct_id : serial
ct_name : text
ct_type : base_type
ct_regex : text
...
key:
recipientEmail
value:
"john.doe@example.org"
type:
EMAIL_ADDRESS
17/37
18. Custom Type Validation using Triggers
CREATE FUNCTION validate_value_trigger() RETURNS trigger AS
$$
BEGIN
PERFORM validate_value(NEW.cv_value, NEW.cv_type_id);
END;
$$
LANGUAGE 'plpgsql'; SQL
CREATE FUNCTION validate_value(p_value json, p_type_id int) RETURNS void AS
$$
import json
import re
# ... Python code, see next slide
$$
LANGUAGE 'plpythonu'; SQL
18/37
19. Custom Type Validation with PL/Python
PYTHON
class TypeValidator(object):
@staticmethod
def load_by_id(id):
tdef = plpy.execute('SELECT * FROM config_type WHERE ct_id = %d' % int(id))[0]
return _get_validator(tdef)
def check(self, condition, msg):
if not condition:
raise ValueError('Value does not match the type "%s". Details: %s' %
(self.type_name, msg))
class BooleanValidator(TypeValidator):
def validate(self, value):
self.check(type(value) is bool, 'Boolean expected')
validator = TypeValidator.load_by_id(p_type_id)
validator.validate(json.loads(p_value))
19/37
20. Scaling?
20/37
21. Sharding
Scaling Horizontally
App
SProcWrapper
Shard 1
Cust. A-F
Shard 2
Cust. G-K
Shard N
Cust. Y-Z
21/37
22. Sharding
SProcWrapper Support
Sharding customers by ID:
@SProcCall
int registerCustomer(@SProcParam @ShardKey int customerId,
@SProcParam String email, @SProcParam Gender gender);
JAVA
Sharding articles by SKU (uses MD5 hash):
@SProcCall
Article getArticle(@SProcParam @ShardKey Sku sku);
JAVA
Collecting information from all shards concurrently:
@SProcCall(runOnAllShards = true, parallel = true)
List<Order> findOrders(@SProcParam String email);
JAVA
22/37
23. Sharding
Auto Partitioning Collections
@SProcCall(parallel = true)
void updateStockItems(@SProcParam @ShardKey List<StockItem> items);
JAVA
23/37
24. Sharding
Bitmap Data Source Providers
24/37
25. Sharding
Experience
·
·
·
·
Scalability without sacrificing any SQL features
Start with a reasonable number of shards (8)
Some tooling required
Avoid caching if you can and scale horizontally
25/37
26. Fail Safety
Replication
· All databases use streaming replication
· Every database has a (hot) standby and a delayed slave
WAL
Service IP
App
WAL
Slave
with 1 h delay
Hot Standby
for readonly queries
26/37
27. Fail Safety
Failover
· Service IP for switching/failover
· Monitoring with Java and custom web frontend
· Failover is manual task
27/37
28. Fail Safety
General Remarks
· Good hardware
- G8 servers from HP
- ECC RAM, RAID
· No downtimes allowed
- Major PostgreSQL version upgrades?
· Two data centers
· Dedicated 24x7 team
· Maintenance
- Concurrent index rebuild, table compactor
28/37
29. Monitoring
You need it...
· Nagios/Icinga
· PGObserver
· pg_view
29/37
30. Monitoring
PGObserver
30/37
31. Monitoring
PGObserver
· Locate hot spots
- Frequent stored
procedure calls
- Long running stored
procedures
- I/O patterns
· Helps tuning DB performance
· Creating the right indices
often a silver bullet
31/37
32. Monitoring
pg_view
· Top-like command
line utility
· DBA's daily tool
· Analyze live
problems
· Monitor data
migrations
32/37
33. NoSQL
Relational is dead?
If your project is not particularly vital and/or your team is
not particularly good, use relational databases!
Martin Fowler, GOTO Aarhus 2012
People vs. NoSQL, GOTO Aarhus 2012
33/37
34. NoSQL
Relational is dead?
The key goals of F1's design are:
3. Consistency: The system must provide ACID
transactions, [..]
4. Usability: The system must provide full SQL query
support [..]
Features like indexes and ad hoc query are not just nice
to have, but absolute requirements for our business.
F1: A Distributed SQL Database That Scales
34/37
35. NoSQL – Comparison
· Aggregate oriented?
Schemaless
- SProcWrapper
Aggregate-Oriented
- Changes?
Document
Column-Family
· Schemaless?
Key-Value
- ⇒ Implicit Schema
- HStore, JSON
· Scaling?
· Auth*?
· Use cases? ⇒ "Polyglot Persistence"
Graph
Introduction to NoSQL, GOTO Aarhus 2012
35/37
36. YeSQL
PostgreSQL at Zalando
· Fast, stable and well documented code base
· Performs as well as (and outperforms some) NoSQL
databases while retaining deeper exibility
· Scaling horizontally can be easy
· Know your tools — whatever they are!
PostgreSQL and NoSQL
36/37
37. Thank You!
Please Visit also
· SProcWrapper – Java library for stored procedure access
github.com/zalando/java-sproc-wrapper
· PGObserver – monitoring web tool for PostgreSQL
github.com/zalando/PGObserver
· pg_view – top-like command line activity monitor
github.com/zalando/pg_view
tech.zalando.com
Please rate this session!
37/37