Intelligent Big Data Processing and Data Visualization in Practice

1. Big Data Intelligent Processing & Data Visualization. Speaker: 吴仕橹. 全球敏捷运维峰会 广州站 (Global Agile Ops Summit, Guangzhou). PUBLIC
2. Business Insights & Analytics – How it Works
1) Source systems are ingested into staging (a shared preparation area), typically through Sqoop (database copy), CDC (streaming-style change updates), or Juniper (an in-house platform).
2) System tables are copied into the Discovery environment, where this production data is processed and models/insights are developed post Data Factory.
3) The Data Factory takes raw data through a number of steps:
   i. Profiling: examining the data to identify its contents and tag it with the correct metadata.
   ii. Cleansing & curating: restructuring the data into its simplest and most efficient form, highlighting errors to feed back to source-system owners.
   iii. Enriching: creating new derived fields based on the raw data (e.g. flags) and appending reference data for models to utilise.
   iv. Record linking: using advanced techniques to join disparate data from masses of separate sources into a single logical model.
   v. Indexing: organising the final data asset into an index, making it quickly searchable.
4) Stabilised assets and models are pushed through our UAT environment for testing and data validation by the consuming users.
5) Final models and assets are then landed in our production environment, their insight ready for consumption through agreed patterns (typically APIs or file transfers).
6) The Data Guardian controls all consumption compliance.
7) Data Exchange hosts APIs/apps to serve data to consumers.
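The Data Factory stages above can be sketched as a small pipeline. This is a minimal illustration, not the actual platform: the stage names come from the slide, but the record fields, cleansing rules, and derived flag are illustrative assumptions.

```python
# Minimal sketch of the Data Factory stages: profiling, cleansing,
# enriching, indexing. Field names and rules are illustrative only.

def profile(record):
    # Profiling: tag each field with an inferred type as metadata.
    record["_meta"] = {k: type(v).__name__ for k, v in record.items() if k != "_meta"}
    return record

def cleanse(record):
    # Cleansing & curating: normalise values, flag errors to feed back
    # to source-system owners.
    record["name"] = record.get("name", "").strip().title()
    record["_errors"] = [] if record["name"] else ["missing name"]
    return record

def enrich(record):
    # Enriching: derive new flag fields from the raw data.
    record["is_large_trade"] = record.get("amount", 0) > 1_000_000
    return record

def index(records, key):
    # Indexing: organise the final asset for fast lookup by a key.
    return {r[key]: r for r in records}

raw = [
    {"id": "c1", "name": " alice smith ", "amount": 2_500_000},
    {"id": "c2", "name": "bob jones", "amount": 100},
]

asset = index([enrich(cleanse(profile(r))) for r in raw], key="id")
print(asset["c1"]["name"])            # Alice Smith
print(asset["c1"]["is_large_trade"])  # True
```

In the real platform each stage would be a separate, audited job; chaining plain functions just makes the ordering of the five steps concrete.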
3. Data & Analytics Execution
- Ingest: automated feed of data, copying the source systems into the GBM Data & Analytics Lake.
- Transform: data is pre-processed, transformed and optimised by Data Engineers.
- Profile: data is profiled to tag components for metadata analysis; algorithms predict data types and tag them automatically.
- Link: the tagged data is linked and enriched using machine learning, generating unique identifiers for clients.
- Analyse: the enriched data is validated against business rules to ensure that it is fit for purpose.
- Consume: the finalised data is passed into a range of MI, analytics and data science applications to generate business value.
Use cases in execution (Case 1 through Case 8): pre-processed source data, metadata modelling, record-linked network graph, data validation results, raw XML trade data, and a time-series application.
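The Link step above generates unique client identifiers across sources. The slide says the production system uses machine learning for this; as a hedged stand-in, the sketch below links records by a normalised name key and derives a stable identifier from it. All data and field names are made up for illustration.

```python
import hashlib

def norm_key(record):
    # Normalise the client name so trivially different spellings match.
    # (The real platform uses ML-based linking; exact matching on a
    # normalised key is a simplifying assumption here.)
    return " ".join(record["client_name"].lower().split())

def link(records):
    clients = {}
    for rec in records:
        key = norm_key(rec)
        # Stable unique identifier derived from the normalised key, so
        # the same client always gets the same ID across runs.
        uid = hashlib.sha1(key.encode()).hexdigest()[:12]
        clients.setdefault(uid, []).append(rec)
    return clients

source_a = [{"client_name": "ACME  Corp", "trade": 1}]
source_b = [{"client_name": "acme corp", "trade": 2}]

linked = link(source_a + source_b)
print(len(linked))  # 1: both records resolve to one client
```

A hash of the normalised key is one simple way to make the identifier deterministic; a real system would more likely master IDs in a registry so that later corrections to the linking do not change existing identifiers.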
4. Data Guardian - 1
Source Data:
- Data ingested from hundreds of source systems
- Data cleansed via the GBM Data Factory
- Data presented in use-case assets
Information Asset Registry:
- Golden source for physical-to-logical mappings, mastered in the Data Factory
- Repository for the logical attribute hierarchy, containing terms where necessary
Data Sharing Compliance Rules:
- Policies obtained from regional legal and compliance teams
- Policies converted into a set of sharing rules
- Rules converted into a Standard Rules Template ready for consumption
Data Guardian:
- Policy administration tool linked to the metadata store, allowing policy rules to be entered in logical terms
- Each "data access request type" is assessed by the Policy Engine to produce a Policy Decision Point summarising the resultant compliant dataset
- Automatic adaptation of queries and in-process filters to produce a compliant data view
(Diagram flow: Source Data → Attribute Tagging → Data Asset → Data Guardian, with Audit.)
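The Data Guardian idea above, sharing rules expressed in logical terms and applied per "data access request type", can be sketched in a few lines. The rule format, request types, and attribute names below are all assumptions for illustration; the real Policy Engine evaluates regionally sourced legal policies.

```python
# Standard Rules Template (assumed shape): which logical attributes
# each data access request type is permitted to see.
SHARING_RULES = {
    "analytics": {"region", "amount"},                       # no client identifiers
    "relationship_mgmt": {"region", "amount", "client_name"},
}

def compliant_view(rows, request_type):
    # Policy Decision Point: filter every row down to the attributes
    # the sharing rules allow for this request type.
    allowed = SHARING_RULES[request_type]
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

rows = [{"client_name": "ACME Corp", "region": "APAC", "amount": 500}]
print(compliant_view(rows, "analytics"))
# [{'region': 'APAC', 'amount': 500}]
```

Filtering attributes after the query runs is the simplest form of the "in-process filters" the slide mentions; adapting the query itself before execution (so restricted columns are never read) is the stronger variant.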
5. Data Guardian - 2
6. Data Exchange
7. Rapid-V Design
8. Rapid-V Demo
9. Rapid-V Sample
10. Join Us. THANK YOU!
