From Engineer to Architect - Engineering Disciplines for High-Quality and Secure Code
1. From Engineer to Architect - Engineering Disciplines for High-Quality and Secure Code
Shinming Liu
Chief Architect, Xcalibyte
3. Who Am I
Shin-Ming Liu (刘新铭)
Chief Architect & Co-founder, Xcalibyte
• Compiler Scientist
• Former Director, Intel IoT Research Lab, China
• Former Director, HP Compiler Technology Lab
• 10+ patents granted in program analysis and compiler optimization
4. Agenda
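• Case: Database Refactor Project
• Analyze and Think with First Principles
• Use Performance Data to Validate, Decide, and Control
• Use Built-in Program Instrumentation to Monitor Architecture Degradation
• Match the Programming Language to the Algorithm
• Refactor Promptly and Continuously
• Self-Improvement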
5. CASE – DATABASE REFACTOR PROJECT
Pain Points
• Service Does Not Complete
• Service Timeout
• Random Service Restarts
• UI Response Time Exceeds 25 min in Extreme Cases
6. Xcalscan was a Monolithic Java Application
[Architecture diagram: Xcalscan before the refactor]
Clients:
• CI/CD: GitLab, Jenkins, Travis CI, ...
• IDEs: Eclipse, VS Code
• SCM: Git
• Comm Tools: Email
• Command Line Processes
• User / Browser
API Gateway
Server (single Java application):
• Project / Config Management
• Report Dashboard
• Issue Browse
• Issue Management
• User Management
• License Management
• Data Cache Management
Storage and Scanning: RDB, FILE Interface, Scan Service, Scan Engine
7. PERFORMANCE DATA COLLECTED
Measurements Made:
• Input Data Set Size
• DB Table Size Collected for the Input
• Memory Footprint and Elapsed Time:
  - Data Ingestion into DB for the Input
  - IssueChangeAnalysis Function for One Data Set
• Response Time for Dashboard Display
8. PERFORMANCE DATA COLLECTED
Measurements Made for MySQL 5.7
• Input Data Set Size – 2.2 million LOC, 565MB Issue File
• DB Table Size Collected for the Input – 2GB
• Memory Footprint and Elapsed Time:
  - Data Ingestion into DB for the Input – 22GB
  - IssueChangeAnalysis Function for One Data Set – 12GB
• Response Time for Dashboard Display – 180 sec
9. ISSUES IDENTIFIED
• Direct Correlation: Input Data Set Size, Memory Footprint, Elapsed Time
• DB Table Size Increases Cumulatively
  - This Results in Slow DB Query Response Time
• Inefficient Query Functions Used
  - APIs Involve Full Table Scans, Joins, Selects
• Interpreted Language Used in the IssueChangeAnalysis Algorithm
10. DESIGN CHANGES MADE
• Redesign the Data Format for Data Produced by Scan Engines
  - Use a String Table: Eliminate Duplicate Strings
  - Use a Proprietary Data Format: Internal Use Only
• Database Table Changes:
  - Record Source Code Issue Introduction Time (Git Commit ID)
  - Ingest Only Newly Introduced Issues, Minimizing DB Size Growth
• Remove Read-Only Data from the RDB
  - Precompute Time-Consuming, Frequent Queries
• Use C++ for Complex Algorithms
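A minimal C++ sketch of the string-table idea, assuming scan-engine output repeats the same file paths and rule names across many issue records. The class `StringTable` and its methods are hypothetical, not the actual Xcalscan format: each distinct string gets one entry, and records store a 32-bit index instead of a copy.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string table: each distinct string is assigned one entry and
// is referenced elsewhere by a compact 32-bit index.
class StringTable {
public:
    // Returns the index of `s`, adding it only if it has not been seen before.
    std::uint32_t intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        auto id = static_cast<std::uint32_t>(strings_.size());
        strings_.push_back(s);
        index_.emplace(s, id);
        return id;
    }

    // Looks up the original string for an index produced by intern().
    const std::string& at(std::uint32_t id) const {
        assert(id < strings_.size());
        return strings_[id];
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;                      // id -> string
    std::unordered_map<std::string, std::uint32_t> index_;  // string -> id
};
```

In this sketch, a file with thousands of issues in the same source file serializes the path once in the table, and every issue record carries only its index.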
11. Xcalscan Architecture After Refactor
[Architecture diagram: Xcalscan after the refactor]
Clients:
• CI/CD: GitLab, Jenkins, Travis CI, ...
• IDEs: Eclipse, VS Code
• SCM: Git
• Comm Tools: Email
• Command Line Processes
• User / GUI
Preprocess Service
API Gateway
Server (split into services):
• User / License Mgmt Service
• Project / Config Mgmt Service
• Issue Mgmt Service
• Report / Dashboard
• Issue Browse
• Scan Service
• Scan Result Process
• FILE Service
Storage and Scanning: RDB, Scan Engine
12. PERFORMANCE IMPROVEMENTS
• Precompute Summary Data to Reduce UI Dashboard Load Time
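One way to precompute summary data, sketched here under the assumption that the dashboard mainly shows per-project issue counts by severity: update counters when issues are ingested or fixed, so a dashboard request reads a few integers instead of aggregating the issue table. `DashboardSummary` and the three severity levels are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical severity levels; the real product's categories may differ.
enum class Severity { High = 0, Medium = 1, Low = 2 };

// Per-project counters maintained at ingestion time.
class DashboardSummary {
public:
    using Counts = std::array<std::size_t, 3>;

    // Called once per newly ingested issue.
    void on_issue_ingested(const std::string& project, Severity sev) {
        counts_[project][static_cast<std::size_t>(sev)] += 1;
    }

    // Called when a later scan reports the issue as fixed.
    void on_issue_fixed(const std::string& project, Severity sev) {
        auto& c = counts_[project][static_cast<std::size_t>(sev)];
        assert(c > 0);
        c -= 1;
    }

    // Dashboard query: a map lookup, no scan over individual issues.
    Counts counts_for(const std::string& project) const {
        auto it = counts_.find(project);
        return it == counts_.end() ? Counts{} : it->second;
    }

private:
    std::map<std::string, Counts> counts_;  // project -> counts by severity
};
```

The design trades a small amount of work on every ingestion for constant-cost reads, which pays off when the dashboard is loaded far more often than scans are ingested.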
13. PERFORMANCE IMPROVEMENTS (cont)
• Identify and Skip Redundant Data, Enabling Scans of Large Projects Such as MySQL 5.7
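The slide does not say what counted as redundant data. A common technique, assumed here purely for illustration, is to skip files whose contents have not changed since the previous scan by comparing content hashes. `changed_files`, `FileSet`, and `HashIndex` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical input: file path -> file contents for the current scan.
using FileSet = std::unordered_map<std::string, std::string>;
// Hypothetical record kept from the previous scan: file path -> content hash.
using HashIndex = std::unordered_map<std::string, std::size_t>;

// Returns the paths that are new or whose contents changed since the last
// scan, and replaces `previous` with the hashes of the current scan.
std::vector<std::string> changed_files(const FileSet& current, HashIndex& previous) {
    std::vector<std::string> changed;
    HashIndex next;
    for (const auto& [path, contents] : current) {
        std::size_t h = std::hash<std::string>{}(contents);
        next.emplace(path, h);
        auto it = previous.find(path);
        if (it == previous.end() || it->second != h) changed.push_back(path);
    }
    previous = std::move(next);
    return changed;
}
```

`std::hash` is not collision-resistant; a production version would more likely store a cryptographic digest, but the skip logic stays the same.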
14. PERFORMANCE IMPROVEMENTS (cont)
• ICA (IssueChangeAnalysis): Rewrite the Complex Algorithm in C++, Saving Both Time and Space
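The deck does not show the ICA algorithm itself. As a hedged sketch, assuming each issue can be identified by a stable fingerprint string, comparing two scans reduces to sorted set operations from the STL: introduced issues, fixed issues, and unchanged issues. `IssueChange` and `compare_scans` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Result of comparing two scans of the same project.
struct IssueChange {
    std::vector<std::string> introduced;  // present now, absent before
    std::vector<std::string> fixed;       // present before, absent now
    std::vector<std::string> unchanged;   // present in both
};

// Hypothetical ICA core over issue fingerprints: sort both lists once, then
// each set operation is a single linear merge over the two sorted ranges.
IssueChange compare_scans(std::vector<std::string> before, std::vector<std::string> after) {
    std::sort(before.begin(), before.end());
    std::sort(after.begin(), after.end());
    IssueChange r;
    std::set_difference(after.begin(), after.end(), before.begin(), before.end(),
                        std::back_inserter(r.introduced));
    std::set_difference(before.begin(), before.end(), after.begin(), after.end(),
                        std::back_inserter(r.fixed));
    std::set_intersection(before.begin(), before.end(), after.begin(), after.end(),
                          std::back_inserter(r.unchanged));
    return r;
}
```

Contiguous sorted vectors keep memory compact and access sequential, which is the kind of time and space saving the slide attributes to the C++ rewrite.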
15. PERFORMANCE IMPROVEMENTS (cont)
• Remove Rule Information from the RDB
  - Lazy-Load Rule Information
• Reduced Memory Footprint During Query
  - Time to Retrieve Rule Info (Read-Only Data): >10x Less
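A minimal sketch of lazy loading for read-only rule information, assuming rule details live outside the RDB (for example in a file or resource bundle) and are fetched on first use, then cached. `RuleInfo`, `RuleCatalog`, and the loader callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Read-only rule metadata, e.g. title and description text.
struct RuleInfo {
    std::string title;
    std::string description;
};

// Hypothetical lazy catalog: the loader runs only the first time a rule is
// requested; later requests are served from the in-memory cache.
class RuleCatalog {
public:
    using Loader = std::function<RuleInfo(const std::string&)>;

    explicit RuleCatalog(Loader loader) : loader_(std::move(loader)) {}

    const RuleInfo& get(const std::string& rule_id) {
        auto it = cache_.find(rule_id);
        if (it == cache_.end()) {
            it = cache_.emplace(rule_id, loader_(rule_id)).first;
        }
        return it->second;  // references into unordered_map stay valid on rehash
    }

    std::size_t cached() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::string, RuleInfo> cache_;
};
```

Only rules that a query actually touches are ever loaded, which is how removing rule data from the RDB can shrink the memory footprint during queries.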
16. DESIGN WITH FIRST PRINCIPLES
• Steps Involved
  - Collect Requirements and Facts
  - Partition Issues Until They Are Atomic
  - Reduce to the Minimum: Discard the Irrelevant and Unimportant
• What Mattered in the Refactor Effort
  - There Were Memory-Bound and I/O-Bound Issues
17. QUANTITATIVE OBJECTIVES & MEASURE
• Performance Data Played a Key Role in Architecture Decisions
  - Architecture Choices Are Subjective
  - Measured Data Are Objective
• Measurements on Data, Resources, and Logic
  - Data Size: Memory Footprint and Persistent Storage
  - Data Processing Logic: Access Pattern Matters
18. QUANTITATIVE OBJECTIVES & MEASURE
• Strategies Applied in the DB Refactor Project
  - Program Stability Issues Occurred with Large Data Sets
  - Systematically Measure Data Size and Memory Footprint
  - Track Response Time for APIs That Take > 2 Seconds
• Policy Changed During and After the DB Refactor Project
  - Performance Measured and Summarized Nightly
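A hedged sketch of tracking APIs that take longer than 2 seconds: a tracker records slow calls, and an RAII scope object times a handler automatically. The names `SlowApiTracker` and `Scope` are hypothetical; a nightly job could summarize `slow_calls()`.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Records API calls whose response time exceeds a threshold (2 s by default).
class SlowApiTracker {
public:
    using ms = std::chrono::milliseconds;

    explicit SlowApiTracker(ms threshold = ms(2000)) : threshold_(threshold) {}

    void record(const std::string& api, ms elapsed) {
        if (elapsed > threshold_) slow_.emplace_back(api, elapsed);
    }

    const std::vector<std::pair<std::string, ms>>& slow_calls() const { return slow_; }

    // RAII helper: measures the lifetime of a scope and reports it on exit.
    class Scope {
    public:
        Scope(SlowApiTracker& t, std::string api)
            : t_(t), api_(std::move(api)), start_(std::chrono::steady_clock::now()) {}
        ~Scope() {
            auto elapsed = std::chrono::duration_cast<ms>(
                std::chrono::steady_clock::now() - start_);
            t_.record(api_, elapsed);
        }

    private:
        SlowApiTracker& t_;
        std::string api_;
        std::chrono::steady_clock::time_point start_;
    };

private:
    ms threshold_;
    std::vector<std::pair<std::string, ms>> slow_;
};
```

Usage: place `SlowApiTracker::Scope timer(tracker, "GET /dashboard");` at the top of a handler; the elapsed time is recorded when the handler returns. `steady_clock` is used because it is not affected by wall-clock adjustments.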
19. PROGRAM TO DETECT ISSUES INTELLIGENTLY
• Strategies Applied in the DB Refactor Project
  - Log Data Characteristics at the Building-Block Level
  - Instrument Performance Monitors: White-Box Approach
  - Nightly Regression Tests Accelerate Project Integration
  - Use Options to Enable / Disable Functionality for Triage Purposes
  - A Unified Log Format Facilitates Issue Reproduction
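A small sketch of two of these strategies, under assumed conventions: every building block logs the characteristics of the data it processed in one shared line format, and named switches let a functionality be turned off during triage. The `component | event | key=value` layout, `log_line`, and `FeatureSwitches` are hypothetical, not Xcalscan's actual format.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical unified log line: "component | event | key=value key=value".
// Because every block uses the same shape, logs from a failing run can be
// diffed against a good run to find where data characteristics diverge.
std::string log_line(const std::string& component, const std::string& event,
                     std::initializer_list<std::pair<std::string, std::string>> fields) {
    std::ostringstream out;
    out << component << " | " << event << " |";
    for (const auto& [key, value] : fields) out << ' ' << key << '=' << value;
    return out.str();
}

// Hypothetical triage switches: a functionality can be disabled by name
// (for example from a command-line option) to isolate a misbehaving block.
class FeatureSwitches {
public:
    void disable(const std::string& name) { disabled_.insert(name); }
    bool enabled(const std::string& name) const { return disabled_.count(name) == 0; }

private:
    std::set<std::string> disabled_;
};
```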
20. CODE WITH THE RIGHT PROGRAMMING LANGUAGE IN THE RIGHT PLACE
• Strategy Applied in the DB Refactor Project
  - Easy DB Manipulation: Java Stayed, with Reduced Scope
  - Performance-Intensive Work: C++ for Precalculating Data and the ICA Function
  - Wrapper Functions: Added JavaScript for React Programming
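The deck does not say how the Java services call the C++ code. One common boundary, assumed here only as an illustration, is a function with a plain C ABI that other runtimes can load (for example through JNI or JNA). The name `xc_count_introduced` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical C ABI entry point: only plain C types cross the boundary, so
// the C++ implementation details stay hidden from the calling language.
// Returns how many fingerprints in `after` do not appear in `before`.
extern "C" std::size_t xc_count_introduced(const char* const* before, std::size_t n_before,
                                           const char* const* after, std::size_t n_after) {
    std::unordered_set<std::string> seen(before, before + n_before);
    std::size_t introduced = 0;
    for (std::size_t i = 0; i < n_after; ++i) {
        if (seen.count(after[i]) == 0) ++introduced;
    }
    return introduced;
}
```

A real boundary would also need to keep C++ exceptions from crossing into the caller, for example by catching them and returning an error code.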
21. REFACTOR OFTEN AND EARLY
• Recommended Reading: http://stepanovpapers.com/notes.pdf by Alex Stepanov
  - He Pioneered Generic Programming and Designed the C++ STL
  - Decomposing an Application into a Collection of General-Purpose Algorithms and Data Structures Makes It Robust
• Strategy Adopted: Phase In Changes Incrementally
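To illustrate the decomposition idea in Stepanov's style, here is a small general-purpose algorithm that knows nothing about issues, files, or databases, yet can be reused for any ordered element type. `sorted_unique` is a hypothetical helper built from standard `std::sort` and `std::unique`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Generic algorithm: works on any input range whose elements are copyable
// and ordered by operator<; returns the distinct values in ascending order.
template <typename InputIt>
auto sorted_unique(InputIt first, InputIt last) {
    using T = typename std::iterator_traits<InputIt>::value_type;
    std::vector<T> out(first, last);
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```

The same function can deduplicate numeric issue IDs or rule names; it is tested once and reused, rather than rewritten inside each feature.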
22. RENEW KNOWLEDGE PERIODICALLY
• Bottom Line: You Own Your Own Career!
• Get Ready for Your Next Job, in the Same Company or Not
  - Drive Your Learning to Enhance Your Knowledge Framework
  - Anticipate When and Where You Will Need New Techniques to Improve Your Project Architecture
• Learn the Design Rationale Behind Open-Source Packages
  - Dedicate 8 Hours per Week
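23. Rigor – An Essential Trait of an Architect
• Work Scientifically: Experiment Boldly, Verify Carefully
• Analyze and Think with First Principles
• Use Performance Data to Validate, Decide, and Control
• Use Built-in Program Instrumentation to Monitor Architecture Degradation
• Match the Programming Language to the Algorithm
• Refactor Promptly and Continuously
• Stay Hungry, Stay Foolish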
25. AGENDA
• Case – Database Refactor (DR) Project
• Design with First Principles
• Define Quantitative Objectives and Measure Nightly
• Program to Detect Issues Intelligently
• Code with the Right Programming Language in the Right Place
• Refactor Often and Early
• Renew Knowledge Periodically
• Summary
26. IMPROVEMENTS AFTER DB REFACTOR
• DB Ingestion Time: Now 4x Faster
• DB Query Time: Now 20x Faster
• Response Time for UI Dashboard (50 Projects): 2-3 sec, Was 3-25 min
• Enables Large Project Scans
  - File Size 10x Smaller; DB Ingestion Time 10x Less; Smaller Memory Footprint, e.g. MySQL 5.7: 565MB vs. 63MB; 1807s vs. 43s; 22GB vs. < 1GB
• Time and Memory Required to Generate IssueChange
  - Time 10x Less; Memory 25x Less, e.g. SQLite: 20+ min vs. 2 min; 10GB vs. 0.4GB
• Time to Retrieve Rule Info (Read-Only Data): >10x Less
28. SUMMARY
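• Work Scientifically: Experiment Boldly, Verify Carefully
• Think with First Principles: Connecting the Dots
• Manage Yourself with OKRs
• Organize Your Work as a Pipeline
• Survey the Whole Picture from the High Ground
• Always Be Ready to Hand Over Your Work
• Keep Raising Your Market Value at All Times
• Every Three Years, Review Where You Stand on Maslow's Hierarchy and Advance Step by Step
• Failure Is the Mother of Success; a Loss in One Place Can Become a Gain in Another
• Stay Hungry, Stay Foolish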