Intelligent Big Data Processing and Data Visualization in Practice
1. Big Data Intelligent Processing & Data Visualization
Speaker: 吴仕橹
Global Agile Operations Summit, Guangzhou
PUBLIC
2. Business Insights & Analytics – How it Works
1) Source systems are ingested into staging (a shared preparation area), typically through Sqoop (database copy), CDC (streaming-style change updates) or Juniper (an in-house platform)
2) System tables are copied into the Discovery environment, where this production data is processed and models/insights are developed downstream of the Data Factory
3) The Data Factory takes raw data through a number of steps:
i. Profiling: looking at the data to identify its contents and tag it with the correct metadata
ii. Cleansing & curating: restructuring the data into the simplest and most efficient form, highlighting errors to revert back to source-system owners
iii. Enriching: creating new derived fields based on the raw data (e.g. flags) and appending reference data for models to utilise
iv. Record linking: using advanced techniques to join up disparate data and masses of separate sources into a single logical model
v. Indexing: organising the final data asset into an index, making it quickly searchable
4) Stabilised assets and models are pushed through our UAT environment for testing and data validation by the consuming users
5) Final models and assets are then landed in our production environment;
their insight ready for consumption through agreed patterns (typically APIs
or file transfers)
6) The Data Guardian controls compliance for all consumption
7) The Data Exchange hosts APIs/apps that serve data to consumers
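The Data Factory steps above (profile → cleanse → enrich → link → index) can be sketched as a staged pipeline. This is a minimal in-memory illustration; all function names, field names and tag values are hypothetical, not the actual platform code, which runs on Hadoop-scale tooling.

```python
# Hypothetical sketch of the Data Factory flow: profile -> cleanse ->
# enrich -> link -> index, applied to toy dict records.

def profile(record):
    """Tag each field with a predicted metadata type."""
    tags = {}
    for key, value in record.items():
        tags[key] = "numeric" if str(value).replace(".", "", 1).isdigit() else "text"
    return {**record, "_tags": tags}

def cleanse(record):
    """Restructure into the simplest form; highlight errors for source owners."""
    cleaned = {k: v.strip() if isinstance(v, str) else v
               for k, v in record.items() if k != "_tags"}
    cleaned["_tags"] = record["_tags"]
    cleaned["_errors"] = [k for k, v in cleaned.items()
                          if v == "" and not k.startswith("_")]
    return cleaned

def enrich(record):
    """Create derived flag fields based on the raw data."""
    record["_has_errors"] = bool(record.get("_errors"))
    return record

def link(records):
    """Join disparate records into a single logical model via a shared key."""
    linked = {}
    for r in records:
        linked.setdefault(r.get("client_id"), []).append(r)
    return linked

def index(linked):
    """Organise the final asset so it is quickly searchable by key."""
    return {cid: recs for cid, recs in sorted(linked.items())}

raw = [{"client_id": "C1", "amount": "100.5", "name": " Acme "},
       {"client_id": "C1", "amount": "7", "name": "Acme Ltd"}]
asset = index(link([enrich(cleanse(profile(r))) for r in raw]))
```

The real steps (e.g. record linking) use far more advanced techniques than a shared-key group-by; the sketch only shows how the stages compose.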
3. Data & Analytics Execution
Ingest: Automated feed of data, copying the source systems into the GBM Data & Analytics Lake
Transform: Data is pre-processed, transformed and optimised by Data Engineers
Profile: Data is profiled to tag components for metadata analysis; algorithms are used to predict data type and automatically tag
Link: The tagged data is linked and enriched using machine learning, generating unique identifiers for clients
Analyse: The enriched data is validated against business rules to ensure that it is fit for purpose
Consume: The finalised data is passed into a range of MI, analytics and data science applications to generate business value
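The Profile stage above predicts each column's data type from its values and tags it automatically. A minimal sketch of that idea, where the regex heuristics, tag names and sample data are all illustrative assumptions rather than the production algorithm:

```python
# Sketch of the Profile stage: predict a column's data type from sampled
# values and tag it for metadata analysis. Heuristics are illustrative.
import re
from collections import Counter

PATTERNS = [
    ("date",    re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("integer", re.compile(r"^-?\d+$")),
    ("decimal", re.compile(r"^-?\d+\.\d+$")),
]

def predict_type(values):
    """Return the majority type tag across the sampled values."""
    votes = Counter()
    for v in values:
        for tag, pattern in PATTERNS:
            if pattern.match(v):
                votes[tag] += 1
                break
        else:
            votes["text"] += 1
    return votes.most_common(1)[0][0]

def tag_columns(rows):
    """Profile each column of a row-oriented sample and tag its type."""
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    return {key: predict_type(vals) for key, vals in columns.items()}

sample = [{"trade_date": "2019-06-28", "notional": "1000000.00", "desk": "FX"},
          {"trade_date": "2019-07-01", "notional": "250000.50", "desk": "Rates"}]
print(tag_columns(sample))
# → {'trade_date': 'date', 'notional': 'decimal', 'desk': 'text'}
```

Majority voting over a sample tolerates occasional dirty values, which a single-row check would not.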
Use cases in execution (Cases 1-8), built on pipeline artefacts: raw XML trade data, pre-processed source data, metadata modelling, record-linked network graphs, and data validation results.
Time-series application
4. Data Guardian - 1
Source Data
· Data ingested from hundreds of source systems
· Data cleansed via GBM Data Factory
· Data presented in use case assets

Information Asset Registry
· Golden source for physical-to-logical mappings, mastered in the Data Factory
· Repository for the logical attribute hierarchy, containing terms where necessary

(Diagram labels: Attribute Tagging, Data Asset, Data Sharing, Audit)

Compliance Rules
· Policies obtained from regional legal and compliance teams
· Policy converted into a set of sharing rules
· Rules converted into a Standard Rules Template ready for consumption

Data Guardian
· Policy administration tool linked up with the metadata store, allowing policy rules to be entered in logical terms
· Each "data access request type" is assessed by the Policy Engine in order to produce a Policy Decision Point summarising the resultant compliant dataset
· Automatic adaptation of queries and in-process filters in order to produce a compliant data view
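The Data Guardian flow above (sharing rules assessed per data-access request type, then applied as in-process filters) can be sketched as follows. The rule structure, request-type names and field names are hypothetical; the real Policy Engine works from the Standard Rules Template, not hard-coded tuples.

```python
# Hypothetical sketch: sharing rules expressed in logical terms are applied
# to a data-access request to yield a compliant view of the data.

RULES = [
    # (request_type, blocked_attributes, row_filter)
    ("external_analytics", {"client_name", "tax_id"},
     lambda row: row["region"] != "restricted"),
    ("internal_mi", set(), lambda row: True),
]

def policy_decision(request_type):
    """Assess a data-access request type; return its policy decision point."""
    for rtype, blocked, row_filter in RULES:
        if rtype == request_type:
            return blocked, row_filter
    raise PermissionError(f"no sharing rule for {request_type!r}")

def compliant_view(rows, request_type):
    """Adapt the result in-process: drop blocked attributes, filter rows."""
    blocked, row_filter = policy_decision(request_type)
    return [{k: v for k, v in row.items() if k not in blocked}
            for row in rows if row_filter(row)]

data = [{"client_name": "Acme", "tax_id": "T1", "region": "emea", "exposure": 5},
        {"client_name": "Beta", "tax_id": "T2", "region": "restricted", "exposure": 9}]
view = compliant_view(data, "external_analytics")
# → [{'region': 'emea', 'exposure': 5}]
```

Filtering both columns (blocked attributes) and rows (region rules) mirrors the slide's point that compliance is enforced by adapting the query result itself, not by trusting the consumer.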
5. Data Guardian - 2
6. Data Exchange
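The overview slide notes that the Data Exchange hosts APIs to serve data to consumers. A minimal stdlib sketch of one such read-only endpoint; the route, payload and asset name are illustrative assumptions, not the actual Data Exchange API.

```python
# Hypothetical sketch of a Data Exchange style consumption endpoint,
# using only the Python standard library.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative published asset; a real exchange would back this with the lake.
ASSETS = {"/assets/client-summary": {"client_id": "C1", "exposure": 5}}

class ExchangeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = ASSETS.get(self.path)
        body = json.dumps(payload if payload else {"error": "not found"}).encode()
        self.send_response(200 if payload else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ExchangeHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/assets/client-summary"
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read())
server.shutdown()
```

In practice such an endpoint would sit behind the Data Guardian, so every response is already a compliant view of the underlying asset.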
7. Rapid-V Design
8. Rapid-V Demo
9. Rapid-V Sample
10. Join Us
THANK YOU!