From Engineer to Architect - Engineering Disciplines for High-Quality and Secure Code

1. From Engineer to Architect - Engineering Disciplines for High-Quality and Secure Code Shinming Liu Chief Architect, Xcalibyte
3. Who Am I
刘新铭 (Shin-Ming Liu), Chief Architect & Co-founder, Xcalibyte
• Compiler Scientist
• Former Director, Intel IOT Research Lab, China
• Former Director, HP Compiler Technology Lab
• 10+ patents granted in program analysis and compiler optimization
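4. Agenda
• Case: Database Refactor Project
• Analyze and Think with First Principle
• Use Performance Data to Support, Decide, and Control
• Use Built-in Program Instrumentation to Monitor Architecture Degradation
• Match the Programming Language to the Algorithm
• Refactor Promptly and Continuously
• Self-improvement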
5. CASE: DATABASE REFACTOR PROJECT
• Pain Points
  - Service Does Not Complete
  - Service Timeout
  - Random Service Restart
  - UI Response Time Takes More Than 25 min in Extreme Cases
6. Xcalscan Was a Monolithic Java Application
[Architecture diagram; labels as extracted]
• Clients: CICD (GitLab, Jenkins, Travis CI, ...); IDEs (Eclipse, VS Code); SCM (Git); Comm Tools (Email); Command Line Processes; User / Browser
• API Gateway
• Server: Project / Config Management; Report Dashboard; Issue Browse; Issue Management; User Management; License Management; Data Cache Management
• RDB; FILE; Interface; Scan Service; Scan Engine
7. PERFORMANCE DATA COLLECTED
• Measurements Made
  - Input Data Set Size
  - DB Table Size Collected for the Input
  - Memory Footprint and Elapsed Time
      - Data Ingestion into DB for the Input
      - IssueChangeAnalysis Function for One Data Set
  - Response Time for Dashboard Display
8. PERFORMANCE DATA COLLECTED
• Measurements Made for MySQL 5.7
  - Input Data Set Size: 2.2 million LOC, 565 MB Issue File
  - DB Table Size Collected for the Input: 2 GB
  - Memory Footprint and Elapsed Time
      - Data Ingestion into DB for the Input: 22 GB
      - IssueChangeAnalysis Function for One Data Set: 12 GB
  - Response Time for Dashboard Display: 180 sec
9. ISSUES IDENTIFIED
• Direct Correlation Among Input Data Set Size, Memory Footprint, and Elapsed Time
• DB Table Size Increases Cumulatively
  - This Results in Slow Response Time for DB Queries
• Inefficient Query Function Used
  - API Involves Full Table Scan, Join, Select
• Interpreted Language Used in IssueChangeAnalysis Algorithm
10. DESIGN CHANGES MADE
• Redesign Data Format for Data Produced by Scan Engines
  - Use String Table: Eliminate Duplicate Strings
  - Use Proprietary Data Format: Internal Use Only
• Database Table Changes
  - Source Code Issue Introduction Time (Git commit ID)
  - Ingest Only Newly Introduced Issues, Minimize DB Size Increase
  - Remove Read-only Data from RDB
• Precompute Time-Consuming Frequent Queries
• Use C++ for Complex Algorithms
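The string-table idea above can be illustrated with a minimal sketch. It is not Xcalscan's proprietary format (which the slides do not describe); the `StringTable` class, the `encode_issues` helper, and the `(path, line, rule)` issue shape are all hypothetical names chosen for illustration. The point is only that each distinct string is stored once and issues refer to it by a small integer ID.

```python
class StringTable:
    """Stores each distinct string once and hands out integer IDs (hypothetical sketch)."""

    def __init__(self):
        self._ids = {}       # string -> ID
        self._strings = []   # ID -> string

    def intern(self, text):
        """Return the ID for `text`, adding it to the table on first sight."""
        idx = self._ids.get(text)
        if idx is None:
            idx = len(self._strings)
            self._ids[text] = idx
            self._strings.append(text)
        return idx

    def lookup(self, idx):
        """Return the string stored under `idx`."""
        return self._strings[idx]

    def __len__(self):
        return len(self._strings)


def encode_issues(issues, table):
    """Replace the file path and rule name of each (path, line, rule) issue with table IDs."""
    return [(table.intern(path), line, table.intern(rule)) for path, line, rule in issues]
```

With many issues reported against the same file and rule, the repeated path and rule strings collapse to integers, which is where the size reduction would come from.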
11. Xcalscan Architecture After Refactor
[Architecture diagram; labels as extracted]
• Clients: CICD (GitLab, Jenkins, Travis CI, ...); IDEs (Eclipse, VS Code); SCM (Git); Comm Tools (Email); Command Line Processes; User / GUI
• API Gateway
• Server: Preprocess Service; User / License Mgmt Service; Project / Config Mgmt Service; Issue Mgmt Service; Report / Dashboard; Issue Browse; Scan Service; Scan Result Process; FILE Service; Scan Engine
• RDB
12. PERFORMANCE IMPROVEMENTS • Precompute Summary Data, Reduce UI Dashboard Load Time
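A minimal sketch of precomputing summary data, assuming the dashboard only needs per-project issue counts by severity. The `SummaryCache` class and the severity labels are hypothetical and not taken from the talk. The counts are updated once at ingestion time, so a dashboard request reads a small precomputed record instead of scanning the issue table.

```python
from collections import Counter


class SummaryCache:
    """Keeps per-project issue counts up to date at ingestion time (hypothetical sketch)."""

    def __init__(self):
        self._by_project = {}  # project name -> Counter of severities

    def ingest(self, project, severities):
        """Fold the severities of newly ingested issues into the project's summary."""
        counts = self._by_project.setdefault(project, Counter())
        counts.update(severities)

    def dashboard(self, project):
        """Return the precomputed summary; no per-issue work happens here."""
        return dict(self._by_project.get(project, {}))
```

The trade-off is a little extra work on every ingestion in exchange for a cheap read on every dashboard load.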
13. PERFORMANCE IMPROVEMENTS (cont) • Identify and Skip Redundant Data, Enables Large Project Scan Such as MySQL5.7
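One plausible reading of skipping redundant data is to ingest only issues that were not present in an earlier scan. The sketch below assumes an issue can be identified by its rule, file, and function; that identity, the `fingerprint` and `new_issues` helpers, and the dictionary layout are illustrative assumptions, not the method used in Xcalscan.

```python
import hashlib


def fingerprint(issue):
    """Build a stable key for an issue from fields assumed to identify it."""
    key = "|".join([issue["rule"], issue["file"], issue["function"]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_issues(previous_fingerprints, scan):
    """Return the issues in `scan` that were not seen before, plus the updated fingerprint set."""
    seen = set(previous_fingerprints)
    fresh = []
    for issue in scan:
        fp = fingerprint(issue)
        if fp not in seen:
            seen.add(fp)
            fresh.append(issue)
    return fresh, seen
```

Only the `fresh` list would be written to the database, so repeated scans of a mostly unchanged code base add few rows.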
14. PERFORMANCE IMPROVEMENTS (cont) • ICA (IssueChangeAnalysis): Rewrite Complex Algorithm in C++, Saving Both Time and Space
15. PERFORMANCE IMPROVEMENTS (cont)
• Remove Rule Information from RDB
  - Lazy Load of Rule Information
  - Reduced Memory Footprint During Query
• Time to Retrieve Rule Info (Read-only Data): >10x Less
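Lazy loading of read-only rule information can be sketched as follows. The `RuleCatalog` class and its `loader` callable are hypothetical; in practice the loader might read a file shipped with the scan engine, but the slides do not say where the rule data lives once it leaves the RDB.

```python
class RuleCatalog:
    """Loads read-only rule information on first use and keeps it in memory (hypothetical sketch)."""

    def __init__(self, loader):
        self._loader = loader   # callable returning {rule_id: rule_info}
        self._rules = None      # nothing is loaded until someone asks
        self.load_count = 0     # exposed only to show that loading happens once

    def get(self, rule_id):
        """Return the info for `rule_id`, loading the catalog on the first call."""
        if self._rules is None:
            self._rules = self._loader()
            self.load_count += 1
        return self._rules[rule_id]
```

Requests that never touch rule details pay nothing, and those that do pay the loading cost once rather than issuing a database query each time.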
16. DESIGN w/ FIRST PRINCIPLE
• Steps Involved
  - Collect Requirements and Facts
  - Partition Issues Until Atomic
  - Reduce to Minimum: Discard the Irrelevant & Unimportant
• What Mattered in the Refactor Effort
  - There Were Memory- & I/O-Bound Issues
17. QUANTITATIVE OBJECTIVES & MEASURE
• Performance Data Played a Key Role in Architecture Decisions
  - Architecture Choices Are Subjective
  - Measured Data Are Objective
• Measurements on Data, Resource and Logic
  - Data Size: Footprint, Persistent
  - Data Processing Logic: Access Pattern Matters
18. QUANTITATIVE OBJECTIVES & MEASURE
• Strategies Applied in DB Refactor Project
  - Program Stability Issues Happened in Cases with Large Data Sets
  - Systematically Measure Data Size and Memory Footprint
  - Track Response Time for APIs That Take > 2 Seconds
• Policy Changed During and After DB Refactor Project
  - Performance Measured and Summarized Nightly
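Tracking APIs that exceed the 2-second threshold could look like the decorator below. This is a sketch only: `track_slow`, `slow_calls`, and the injectable `clock` and `sink` parameters are hypothetical names, and a real service would more likely write to its logging or metrics system than to a list.

```python
import functools
import time

SLOW_THRESHOLD_SECONDS = 2.0   # threshold quoted on the slide
slow_calls = []                # default sink: (function name, elapsed seconds)


def track_slow(threshold=SLOW_THRESHOLD_SECONDS, clock=time.perf_counter, sink=slow_calls):
    """Record every call of the decorated function that takes longer than `threshold` seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = clock() - start
                if elapsed > threshold:
                    sink.append((func.__name__, elapsed))
        return wrapper
    return decorator
```

A nightly job could then summarize the recorded entries per API, which fits the policy of measuring and summarizing performance nightly.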
19. PROGRAM TO DETECT ISSUES INTELLIGENTLY
• Strategies Applied in DB Refactor Project
  - Log Data Characteristics at Building Block Level
  - Instrument Performance Monitor: White Box Approach
  - Nightly Regression Test Accelerates Project Integration
  - Use Options to Enable / Disable Functionality for Triage Purposes
  - Unified Log Format Facilitates Issue Reproduction
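Three of the strategies above (logging data characteristics per building block, a unified log format, and options that switch functionality off for triage) can be combined in one small sketch. The `log_record` and `run_stage` helpers, the JSON line format, and the flag dictionary are assumptions made for illustration; the slides do not specify the actual format.

```python
import json


def log_record(component, event, **fields):
    """Render one log line in a single, uniform JSON shape with sorted keys."""
    record = {"component": component, "event": event}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


def run_stage(name, func, items, flags, log):
    """Run one building block unless its flag is off, logging input and output sizes."""
    if not flags.get(name, True):
        log.append(log_record(name, "skipped", items_in=len(items)))
        return items
    result = func(items)
    log.append(log_record(name, "done", items_in=len(items), items_out=len(result)))
    return result
```

Because every stage reports the same fields, a failing run can be compared line by line with a good one, and a suspect stage can be disabled through `flags` to narrow down the cause.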
20. CODE WITH RIGHT PROGRAMMING LANGUAGE IN RIGHT PLACE
• Strategy Applied in DB Refactor Project
  - Easy DB Manipulation: Java Stayed, with Reduced Scope
  - Performance-Intensive Work: C++ for Pre-calculated Data and the ICA Function
  - Wrapper Functions: Added JavaScript for React Programming
21. REFACTOR OFTEN AND EARLY
• Recommended Reading
  - http://stepanovpapers.com/notes.pdf by Alex Stepanov
  - He Invented Generic Programming and the C++ STL
  - Decomposing an Application into a Collection of General-purpose Algorithms and Data Structures Makes It Robust
• Strategy Adopted: Incremental Phase-in Changes
22. RENEW KNOWLEDGE PERIODICALLY
• Bottom Line: You Own Your Own Career!
  - Get Ready for Your Next Job, in the Same Company or Not
• Drive the Learning to Enhance Your Knowledge Framework
  - When and Where Will You Need New Techniques to Improve Your Project Architecture?
  - Learn the Design Rationale Behind Open-Source Packages
• Dedicate 8 Hours per Week
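23. Rigor: An Essential Trait of an Architect
• Work with the Scientific Method: Experiment, Then Verify Carefully
• Analyze and Think with First Principle
• Use Performance Data to Support, Decide, and Control
• Use Built-in Program Instrumentation to Monitor Architecture Degradation
• Match the Programming Language to the Algorithm
• Refactor Promptly and Continuously
• Stay Hungry, Stay Foolish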
25. AGENDA
• Case: Database Refactor (DR) Project
• Design with First Principle
• Define Quantitative Objectives and Measure Nightly
• Program to Detect Issues Intelligently
• Code with Right Programming Language in Right Place
• Refactor Often and Early
• Renew Knowledge Periodically
• Summary
26. IMPROVEMENTS AFTER DB REFACTOR
• DB Ingestion Time: Now 4x Faster
• DB Query Time: Now 20x Faster
• Response Time for UI Dashboard (50 Projects): 2-3 sec, Was 3-25 min
• Enable Large Project Scans
  - File Size 10x Smaller; DB Ingestion Time 10x Less; Smaller Memory Footprint, e.g. MySQL 5.7: 565 MB vs. 63 MB; 1807 s vs. 43 s; 22 GB vs. < 1 GB
• Time and Memory Required to Generate IssueChange
  - Time 10x Less; Memory 25x Less, e.g. SQLite: 20+ min vs. 2 min; 10 GB vs. 0.4 GB
• Time to Retrieve Rule Info (Read-only Data): >10x Less
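The example figures on this slide can be turned into ratios with a few lines of arithmetic. The numbers are copied from the slide; the `ratio` helper and the variable names are only for illustration.

```python
def ratio(before, after):
    """Return how many times smaller `after` is than `before`."""
    return before / after


# Figures quoted for the MySQL 5.7 scan.
file_size = ratio(565, 63)     # MB before vs. after
ingestion = ratio(1807, 43)    # seconds before vs. after

# Figures quoted for IssueChange generation on SQLite.
ica_time = ratio(20, 2)        # minutes; the slide says "20+", so this is a lower bound
ica_memory = ratio(10, 0.4)    # GB before vs. after
```

The file-size ratio comes out at roughly 9x and the ingestion-time ratio at roughly 42x, which suggests the "10x" headline figures are rounded rather than exact; the IssueChange ratios match the quoted 10x and 25x.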
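28. SUMMARY
• Work with the Scientific Method: Experiment, Then Verify Carefully
• Think with First Principle: Connecting the Dots
• Manage Yourself with OKRs
• Organize Your Work as a Pipeline
• Survey the Whole Picture from the High Ground
• Always Be Ready to Hand Over Your Work
• Keep Raising Your Own Market Value
• Every Three Years, Review Where You Stand in Maslow's Hierarchy and Advance Step by Step
• Failure Is the Mother of Success; What Is Lost in One Place May Be Gained in Another
• Stay Hungry, Stay Foolish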
