ElasticLog with ES in CloudEdge
1. ElasticLog with ES in CloudEdge
Bruce Zhao
2. CloudEdge
• A hybrid Unified Threat Management (UTM) appliance.
• Firewall, IPS, URL filter, web security, email security, application control, bandwidth control, user VPN, etc.
• Combination of on-premise appliance and cloud service.
Copyright 2018 Trend Micro Inc.
3. Project ElasticLog
[Figure: continuous structured data. Labels: Internet, Bandwidth, Security, Policy.]
4. Project ElasticLog
Data Analytics
5. Project ElasticLog
{…}
…
A scalable big data system built on AWS
6. Project ElasticLog
• Ingest 1,300,000 records per minute; this will increase to 5 × 1,300,000 by the end of this year.
• Provide second-level queries (results within seconds) over 6 months of data; most queries will hit the first month of data.
• Data MUST be accurate: no less, no more.
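The volume figures on this slide translate into the following back-of-the-envelope rates. This is only a sketch: the function and constant names are illustrative, and the 30-day month is an assumption rather than something stated in the deck.

```python
RECORDS_PER_MINUTE = 1_300_000  # current rate, from the slide
GROWTH_FACTOR = 5               # expected growth by year end, from the slide


def ingest_rates(records_per_minute: int) -> dict:
    """Convert a per-minute ingest rate into other time units."""
    return {
        "per_second": records_per_minute / 60,
        "per_day": records_per_minute * 60 * 24,
        # Assumption: a "month" of hot data is approximated as 30 days.
        "per_30_days": records_per_minute * 60 * 24 * 30,
    }


if __name__ == "__main__":
    for label, rate in (
        ("current", RECORDS_PER_MINUTE),
        ("projected", RECORDS_PER_MINUTE * GROWTH_FACTOR),
    ):
        print(label, ingest_rates(rate))
```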
7. Client
[Architecture diagram spanning the Client side and the Cloud. Components: CE Boxes, Log Receivers, Kinesis Stream, EMR (two jobs), S3, Athena, ECS, CECC Portal, User, Report. Data flows: All Data, Hot Data (minute level), Hot Data (hour level), Stats Data. Interfaces: REST API, ES API, Athena SQL (query).]
8. ES in ElasticLog
• Hot data storage and query
  • 1 month of data
  • 1,300,000 records ingested per minute
  • Second-level query
• Built on AWS
  • One cluster
  • 3 dedicated master nodes + 10 data nodes
  • Master node: M4.large, Data node: C4.2Xlarge
  • Cost: $7622 per month
• Indices of one data type
  • Roll index by week
  • 28 indices at most
  • 200 primary shards, 1 replica
[Figure: time buckets labeled T1-minute and T1-hour through T6-hour.]
9. ES on AWS
• Why AWS ES?
  • All cloud services are on AWS
  • AWS ES is a managed service
• AWS ES
  • Easy to use
  • VPC + Security Group
  • Blue/Green deployment
  • Zone awareness
  • Monitor + Auto-Snapshot
10. Best Practices in ES
• Index Design
• Shard Allocation
• Fast Injection in Spark
• Data Deduplication
11. Index Design
One Index
• Some index settings cannot be updated after creation, e.g., the number of primary shards
• Unable to scale out flexibly and rapidly
• Better for a fixed/small data set
Time-based Index
• How to determine the interval?
  • Data amount
  • Change frequency
  • Try a weekly split by default
• How to implement?
  • Index template
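A weekly split needs a rule that maps a record timestamp to an index name. The sketch below assumes the numeric suffix of the index names shown later in the deck (for example ce-index-v1-access-1524096000) is the UTC epoch second at which the weekly bucket begins; that value is an exact multiple of 604,800 seconds, which is consistent with the assumption but not confirmed by the slides. The function name is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800


def weekly_index_name(prefix: str, epoch_seconds: int) -> str:
    """Return the name of the weekly index that a timestamp falls into.

    Assumption: buckets are aligned to multiples of one week since the
    Unix epoch, so every bucket starts on a Thursday at 00:00 UTC.
    """
    week_start = epoch_seconds - (epoch_seconds % SECONDS_PER_WEEK)
    return f"{prefix}-{week_start}"


if __name__ == "__main__":
    # A timestamp three days into the week maps to the same index.
    print(weekly_index_name("ce-index-v1-access", 1524096000 + 3 * 86400))
```

With an index template matching the prefix, the first write to a new weekly name creates the index with the template's settings, so no explicit rollover step is needed in this scheme.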
12. Index Design (index template)
{
  "facet_internet_access_minute": {
    "template": "ce-index-access-v1-*",
    "order": 0,
    "settings": {
      "number_of_shards": 5
    },
    "aliases": {
      "{index}-query": {}
    },
    "mappings": {
      "es_doc": {
        "dynamic": "strict",
        "_all": {
          "enabled": false
        },
        "_source": {
          "enabled": false
        },
        "properties": {
          "CLF_Timestamp": {
            "type": "long"
          },
          "CLF_CustomerID": {
            "type": "keyword"
          },
          "CLF_ClientIP": {
            "type": "ip",
            "ignore_malformed": true
          }
        }
      }
    }
  }
}
• Do NOT use multiple document types (doc_type) in one index
  • Fields that have the same name in different types must have the same mapping definition
  • Multiple types per index are not supported from 6.0
• Disable _source ("enabled": false)
  • Suitable if you only care about metric results, not the raw document content
  • Saves disk space and reduces IO
• Disable _all ("enabled": false)
  • Suitable if you know exactly which fields you want to query
13. Index Design (index template, continued)
Same index template as on slide 12.
• Set dynamic=strict
  • Suitable if your data is structured
  • Avoids dirty-data injection
• Set not_analyzed for String fields
  • Suitable if you only care about exact (full) matches
  • Use "keyword" in 5.x
  • Improves injection performance and saves disk space
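With "dynamic": "strict", a document that carries a field not declared in the mapping is rejected. A client can mirror that check before sending a bulk request, so that dirty records are set aside instead of failing at index time. The sketch below is an illustration only, not part of the original pipeline; the helper names are hypothetical and the field set is copied from the template on slide 12.

```python
# Fields declared under "properties" in the template on slide 12.
ALLOWED_FIELDS = frozenset({"CLF_Timestamp", "CLF_CustomerID", "CLF_ClientIP"})


def unknown_fields(doc: dict, allowed=ALLOWED_FIELDS) -> list:
    """Return the field names a strict mapping would not accept."""
    return sorted(set(doc) - set(allowed))


def split_clean_dirty(docs: list) -> tuple:
    """Partition documents into those that pass the check and those that do not."""
    clean, dirty = [], []
    for doc in docs:
        (dirty if unknown_fields(doc) else clean).append(doc)
    return clean, dirty


if __name__ == "__main__":
    docs = [
        {"CLF_Timestamp": 1524096000, "CLF_CustomerID": "c1"},
        {"CLF_Timestamp": 1524096001, "unexpected": "x"},
    ]
    clean, dirty = split_clean_dirty(docs)
    print(len(clean), "clean,", len(dirty), "dirty")
```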
14. Index Design (switching indices with an alias)
Same index template as on slide 12; note the "aliases" section.
How to switch from one index to the other without downtime?
• Index: ce-index-v1-access-1524096000
• Index: ce-index-v1-access-1524096000-h
• Alias: ce-index-v1-access-1524096000-query
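One way to perform such a switch is a single request to the Elasticsearch `_aliases` endpoint that removes the alias from one index and adds it to the other; both actions in one request are applied atomically, so queries against the alias never see a gap. The sketch below only builds the request body. The helper name is hypothetical, and the direction of the switch (from the plain index to the "-h" index) is an assumption, since the slide does not say which index is the old one.

```python
import json


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that moves an alias between indices."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    body = alias_swap_actions(
        alias="ce-index-v1-access-1524096000-query",
        old_index="ce-index-v1-access-1524096000",      # assumed old index
        new_index="ce-index-v1-access-1524096000-h",    # assumed new index
    )
    print(json.dumps(body, indent=2))
```

Clients that always query through the alias need no change when the underlying index is replaced.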
15. Shard Allocation
[Diagram: one index whose shards 1, 2, and 3 each appear on two of three nodes.]
• Node 1: shards 1, 2
• Node 2: shards 2, 3
• Node 3: shards 3, 1
16. Shard Allocation
[Flow diagram of six numbered steps: Estimate Data Amount, Design Index, Estimate Index Size and Disk Space, then (4) Calculate Shard Number, (5) Estimate Instance Number and Type, (6) Performance Test, Adjust Instances.]
• Each shard should be smaller than 30 GB
• Shard number = k × number of data nodes (k is a small integer)
• If you have a small index and enough instances in the cluster, try the default shard number
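The two sizing rules on this slide can be combined into a small calculation: take the fewest shards that keep each shard within the size limit, then round up to a multiple of the data node count. This is a minimal sketch under those two rules only; the function name is hypothetical, the 30 GB limit is treated as an upper bound, and the small-index case (default shard number) is not handled.

```python
import math


def shard_count(index_size_gb: float, data_nodes: int, max_shard_gb: float = 30.0) -> int:
    """Primary shard count = k * data_nodes with every shard within max_shard_gb."""
    if index_size_gb <= 0 or data_nodes <= 0 or max_shard_gb <= 0:
        raise ValueError("sizes and node count must be positive")
    # Fewest shards that keep each shard at or below the size limit.
    min_shards = math.ceil(index_size_gb / max_shard_gb)
    # Smallest integer k such that k * data_nodes covers min_shards.
    k = max(1, math.ceil(min_shards / data_nodes))
    return k * data_nodes


if __name__ == "__main__":
    # Example input: a 1,000 GB index on 10 data nodes (illustrative numbers).
    print(shard_count(1000, 10))
```

Rounding up to a multiple of the node count lets the shards spread evenly, so no data node carries more primaries than another.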
17. Shard Allocation
[Diagram: shards 1 to 6 and their copies allocated across 3 C4.2Xlarge nodes (Node 1 to Node 3), compared with the same shards allocated across 6 C4.Xlarge nodes (Node 1 to Node 6).]
18. Fast Injection in Spark
EMR
ES-Hadoop
19. Fast Injection in Spark
[Diagram: a Spark application (Driver plus Executors on slave workers) whose Tasks write to the ThreadPools of ES Node 1, Node 2, and Node 3.]
• Number of Spark tasks = number of vCores
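Because the number of running Spark tasks equals the number of vCores, the vCore count also bounds how many writers hit the ES cluster at once. The sketch below estimates that load per data node, assuming each running task sends one bulk request at a time and that requests spread evenly over the data nodes; both assumptions, the function names, and the example numbers are illustrative and not taken from the deck.

```python
def concurrent_writers(executors: int, vcores_per_executor: int) -> int:
    """Number of Spark tasks (and thus writers) that can run at the same time."""
    return executors * vcores_per_executor


def writers_per_data_node(executors: int, vcores_per_executor: int, data_nodes: int) -> float:
    """Average concurrent writers per ES data node, assuming an even spread."""
    if data_nodes <= 0:
        raise ValueError("data_nodes must be positive")
    return concurrent_writers(executors, vcores_per_executor) / data_nodes


if __name__ == "__main__":
    # Illustrative numbers: 5 executors with 8 vCores each, 10 ES data nodes.
    print(writers_per_data_node(5, 8, 10))
```

Comparing this figure with the capacity of each node's bulk thread pool gives a first hint of whether adding vCores will speed up injection or only cause rejected requests.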
20. Data Deduplication
[Diagram, steps 1 to 7: CE Box, Log Receiver, AWS Kinesis, AWS S3, and a Spark application (Driver plus Executors on slave workers, each running Tasks).]
• A task will be retried up to 3 times on any failure, so a retried task can write documents that were already indexed.
21. Data Deduplication
• Use a custom unique ID
• Do a distinct query
• Use aggregation to find duplicates and delete those documents
22. Data Deduplication
• Step 1: Add a "hash" field to all documents
• Step 2: Check for duplicates
curl -XGET 'http://stg-elasticlog.ap-northeast-1.es.amazonaws.com/ce-index-v1-access-1524096000/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "duplicate": {
      "terms": {
        "field": "hash",
        "min_doc_count": 2,
        "size": 5000
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 2
          }
        }
      }
    }
  }
}
'
• Step 3: Bulk delete
  • Does not affect injection
  • Can be asynchronous
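Step 3 can be sketched as a function that turns the aggregation response from Step 2 into the newline-delimited body of a `_bulk` request, keeping the first hit of every bucket and deleting the rest. This is an illustration, not the original implementation: the function name is hypothetical, and the aggregation names "duplicate" and "documents" are taken from the query above. Because `top_hits` returns only 2 documents per bucket, a hash with more than two copies needs more than one pass.

```python
import json


def bulk_delete_body(search_response: dict, keep: int = 1) -> str:
    """Build an NDJSON _bulk body that deletes all but `keep` hits per bucket."""
    lines = []
    buckets = search_response["aggregations"]["duplicate"]["buckets"]
    for bucket in buckets:
        hits = bucket["documents"]["hits"]["hits"]
        for hit in hits[keep:]:
            action = {
                "delete": {
                    "_index": hit["_index"],
                    "_type": hit["_type"],  # present in 5.x/6.x responses
                    "_id": hit["_id"],
                }
            }
            lines.append(json.dumps(action))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = {
        "aggregations": {"duplicate": {"buckets": [{
            "key": "abc", "doc_count": 2,
            "documents": {"hits": {"total": 2, "hits": [
                {"_index": "ce-index-v1-access-1524096000", "_type": "es_doc", "_id": "id-1"},
                {"_index": "ce-index-v1-access-1524096000", "_type": "es_doc", "_id": "id-2"},
            ]}},
        }]}}
    }
    print(bulk_delete_body(sample), end="")
```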
23. Data Deduplication
Drawbacks of the "hash" field approach:
• Storage size increases heavily
• Aggregation becomes slower, or even fails, when there are more than 0.3 billion documents
Cause: the "hash" field is unique per document!
• Unfriendly to compression
• High cardinality
24. Data Deduplication
• Auto-generated IDs are 20 characters long, URL-safe, Base64-encoded GUID strings
• For a custom ID, try to pick an ID that is friendly to Lucene
(http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html)
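A custom ID removes duplicates at write time: if the ID is derived from the record itself, a retried task overwrites the document it already indexed instead of adding a copy. The sketch below is one possible construction, not the one used in the project. It assumes IDs with a time-ordered, zero-padded prefix are easier on Lucene than fully random ones, which is how the linked article is usually read; the function name, the 13-digit padding, and the 20-character digest length are all illustrative choices. The field names come from the template on slide 12.

```python
import hashlib
import json


def make_doc_id(record: dict) -> str:
    """Derive a deterministic document ID from a record's content.

    Layout: zero-padded CLF_Timestamp, a dash, then a truncated SHA-1 of the
    canonical JSON form of the record (keys sorted, no extra whitespace).
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:20]
    return f"{int(record['CLF_Timestamp']):013d}-{digest}"


if __name__ == "__main__":
    record = {
        "CLF_Timestamp": 1524096000,
        "CLF_CustomerID": "customer-1",  # illustrative value
        "CLF_ClientIP": "10.0.0.1",      # illustrative value
    }
    print(make_doc_id(record))
```

One caveat of this design: two genuinely distinct events with identical field values collapse into a single document, so the record needs a field that tells such events apart if both must be counted.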
25. Q&A
26. Elastic
https://elasticsearch.cn/