1. Background
The feature platform implements its data ETL mainly with PyFlink on Kubernetes. It pulls data from big-data ODS (Operational Data Store) sources such as HBase, Hive, and relational databases and stores it in the feature platform, where data scientists, data engineers, and machine learning engineers use it for model training, data testing, and other data applications. This solves problems such as scattered data storage, duplicated features, complicated extraction, and difficulty of use.
The big-data feature platform team built its own Kubernetes container management platform, which runs Flink jobs, Zeppelin notebook jobs, and so on. Monitoring the Kubernetes cluster and alerting on cluster and job anomalies is a basic requirement of the project, and Prometheus, the second project to graduate from the cloud-native community (CNCF), is the natural choice for Kubernetes monitoring and alerting.
This article is organized in several parts: it first introduces Prometheus, and then describes how the project uses Prometheus to monitor and alert on the Kubernetes cluster, Flink jobs, ElasticSearch, and other components.
Prometheus is an open-source monitoring tool written in Go at SoundCloud and inspired by Google's Borgmon. It joined the Cloud Native Computing Foundation (CNCF) in 2016, becoming the second hosted project after Kubernetes. As a new generation of monitoring framework, Prometheus has the following characteristics:
A multi-dimensional data model: time series identified by a metric name and key/value label pairs
Time series are collected over HTTP using a pull model
PromQL: a flexible query language that can use the multi-dimensional data to express complex queries
No dependency on distributed storage; a single server node works on its own, handling on the order of a million time-series samples per second and scraping thousands of targets
Targets are discovered through service discovery or static configuration (a minimal static scrape configuration is sketched after this list)
A rich ecosystem (many client languages, all kinds of exporters) and an active open-source community (about 36k GitHub stars)
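To make the pull model and static target configuration concrete, here is a minimal illustrative prometheus.yml sketch; the job name and target address are placeholders and are not taken from our deployment:

# Minimal illustrative scrape configuration: Prometheus pulls /metrics
# over HTTP from the listed targets every 15 seconds.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'example-static-job'      # hypothetical job name
    static_configs:
      - targets: ['10.0.0.1:9100']      # hypothetical target address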
Appendix: time series database trend (DB-Engines ranking chart)
Prometheus can be roughly divided into two parts: a monitoring and alerting system, and a built-in time series database (TSDB). The TSDB trend above also shows that using Prometheus as a time series database is an increasingly common scenario.
Its components are as follows:
Prometheus Server: collects and stores time series data and is the core module of Prometheus. The Retrieval module periodically pulls data from monitored targets, the Storage module (the time series database) persists it, and PromQL provides the query language, which is parsed into a syntax tree and evaluated against the Storage module to retrieve monitoring data. Besides the built-in web UI, the data can also be queried through Grafana, the HTTP API, and so on.
Push Gateway: mainly used for short-lived jobs. Because such jobs may finish before Prometheus gets a chance to pull their data, they push their metrics to the Push Gateway, which Prometheus then scrapes (a sketch of the corresponding scrape job follows this list). This approach is mainly for service-level metrics; for machine-level metrics, use node exporter.
Exporters: expose the metrics of existing third-party services to Prometheus.
Alertmanager: after receiving alerts from the Prometheus server, it deduplicates and groups them and routes them to the configured receivers. Common receivers include email, SMS/phone, WeCom (Enterprise WeChat), webhooks, and so on.
At the top of the architecture, Prometheus provides service discovery, which automatically discovers monitored objects so that targets can be obtained dynamically.
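As a sketch of where the Push Gateway fits: batch jobs push their metrics to the gateway, and Prometheus scrapes the gateway like any other target. The job and address below are placeholders; honor_labels keeps the job/instance labels pushed by the batch jobs instead of overwriting them with the gateway's own labels:

# Illustrative scrape job for a Push Gateway (address is a placeholder).
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true                   # keep labels pushed by the jobs
    static_configs:
      - targets: ['pushgateway:9091']    # hypothetical gateway address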
Looking at the scraped data in its rawest form (illustrated by the sample after the list below), the timestamp is the time of the scrape.
Each metric name represents a class of metrics, and each can carry different labels; every combination of metric name and labels identifies one time series. A sample of that series consists of three parts:
Metric: the metric name plus the label set describing the sample;
Timestamp: a millisecond-precision timestamp;
Value: a float64 value representing the sample's value.
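For illustration only (the series and values below are invented), a scrape response in the Prometheus text format looks roughly like this:

# <metric name>{<label name>=<label value>, ...}  <value>  <timestamp in ms>
http_requests_total{method="POST", handler="/api/train"}  1027  1623298800000
node_memory_MemAvailable_bytes{instance="10.0.0.1:9100"}  8.24e+09  1623298800000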
Counter: a cumulative count, such as a total duration or a total number of occurrences. It only ever increases, for example the total number of HTTP requests.
Gauge: reflects the current state of the system. Samples of this type can go up or down, for example current memory usage, which fluctuates over time.
Besides Counter and Gauge, Prometheus also defines the Histogram and Summary metric types, which are mainly used to analyze the distribution of samples.
Histogram: represents sampled observations over a period of time (typically request durations or response sizes) and counts them into configurable buckets. Samples can later be filtered by bucket boundary, the total number of observations can be counted, and the data is usually rendered as a histogram.
Summary: similar to Histogram, it represents sampled observations over a period of time, but it stores the quantiles directly (computed on the client side) instead of deriving them from buckets.
Because a Summary's quantiles are computed on the client, a Summary performs better when queried through PromQL, while computing quantiles from a Histogram consumes more server-side resources (see the histogram_quantile sketch below); conversely, a Histogram is cheaper for the client.
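For example, the server-side quantile computation for a Histogram is typically expressed with histogram_quantile. A sketch written as a recording rule; the metric http_request_duration_seconds_bucket is a generic example, not one of our metrics:

# Example recording rule: 95th-percentile latency computed from Histogram buckets.
groups:
- name: latency-example                  # hypothetical group
  rules:
  - record: job:http_request_duration_seconds:p95
    expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job))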
Beyond the overview above there are further details about the individual components; they can be found in other materials or in the reference documents of this article and are not covered here.
Metric-based monitoring is not suited to logs, events, or tracing. Prometheus only covers performance and availability monitoring and has no log monitoring capability, so it cannot solve every monitoring problem. Log monitoring still needs to be combined with log collection (e.g. Fluentd) and log storage (e.g. ElasticSearch). In practice the feature platform uses EFK (ElasticSearch, Fluentd, Kibana) to collect Kubernetes container logs; this will be covered in a separate article.
Prometheus defaults to a pull model, so it depends on the scrape interval and is not strictly real-time. Once the number of targets grows, plan your network carefully and avoid relaying or forwarding scrapes where possible.
Data retention: usually only recent monitoring data needs to be queried, and Prometheus's local storage was designed to keep only short-term data (on the order of a month), not large volumes of history. If you need historical data for reports and similar use cases, use Prometheus remote storage such as OpenTSDB or M3DB (see the remote_write sketch below).
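As a sketch, long-term storage is usually wired in through the remote_write section of prometheus.yml; the URL below is a placeholder for whatever remote-storage adapter (e.g. for OpenTSDB or M3DB) is actually deployed:

# Illustrative remote-write configuration (endpoint is a placeholder).
remote_write:
  - url: "http://remote-tsdb-adapter:9201/write"   # hypothetical adapter endpoint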
For more detail on these issues, see the reference documents.
The feature platform components, Flink, Zeppelin, the ElasticSearch log-collection stack, and others, all run on Kubernetes, a container management platform the big-data feature platform team built itself. The domestic machines are deployed on Huawei Cloud, with images pulled from a Huawei Cloud private registry. Monitoring and alerting are built on Prometheus, and the overall Kubernetes monitoring and alerting system looks as follows:
Prometheus scrapes host metrics (Node Exporter), container metrics (cAdvisor), Kubernetes cluster state (kube-state-metrics), and an additionally deployed ElasticSearch Exporter. Prometheus is configured with alerts on the relevant metrics and sends them to Alertmanager, which is configured with a Feishu webhook so that alerts are delivered by a Feishu bot to the operations group chat. The Prometheus alerting rules include: job failures, ElasticSearch cluster nodes going down, CPU and memory usage alerts, abnormal compute nodes, and so on.
We deploy Prometheus in the conventional way, i.e. with Kubernetes YAML manifests, installing Prometheus, kube-state-metrics, node-exporter, Alertmanager, and Grafana in order.
Prometheus, Alertmanager, Grafana, kube-state-metrics (which exposes the state of the Kubernetes cluster), and elastic-exporter (which monitors our self-managed ElasticSearch cluster) are deployed as Kubernetes Deployments. Node exporter is deployed as a Kubernetes DaemonSet with a toleration so that it is also scheduled onto master nodes and collects metrics from each node's physical or virtual machine. All the data converges in Prometheus for processing and storage and is then visualized with Grafana.
First create the namespace Prometheus lives in, then create the RBAC rules Prometheus uses, create a ConfigMap holding its configuration file, create a Service for stable in-cluster access, and create a Deployment that runs the pod with the Prometheus container.
Create a namespace named prometheus; all related objects below are placed in this namespace. The namespace manifest (file name ns-promethes.yaml):
# Create the namespace, file name: ns-promethes.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus
Run: kubectl apply -f ns-promethes.yaml
Create the RBAC rules, which consist of three kinds of objects: ServiceAccount, ClusterRole, and ClusterRoleBinding. They grant the prometheus ServiceAccount the permissions it needs to access the Kubernetes API server.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: prometheus
Create the Prometheus application configuration file as a ConfigMap.
The configuration file specifies the scrape interval, the Alertmanager address, the path of the alerting rule file, and all the targets to scrape. As you can see, most of the Kubernetes metrics are collected through Kubernetes service discovery; an example of the annotations that make a Service or Pod discoverable follows the manifest.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s       # scrape interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["alertmanager-svc:9093"]   # Alertmanager service address
    rule_files:
    - "/etc/prometheus/rules/rule.yml"         # alerting rule file path
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'node-exporter'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: 'node-exporter'
        action: keep
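With the 'kubernetes-service-endpoints' and 'kubernetes-pods' jobs above, a workload is scraped automatically once it carries the prometheus.io/* annotations that the relabel rules key on. A minimal sketch; the service name, port, and selector are placeholders:

# Illustrative Service: the annotations below are what the relabel rules above key on.
apiVersion: v1
kind: Service
metadata:
  name: my-app                       # hypothetical service
  annotations:
    prometheus.io/scrape: 'true'     # keep this target
    prometheus.io/port: '8080'       # scrape this port
    prometheus.io/path: '/metrics'   # scrape this path (default is /metrics)
spec:
  ports:
  - port: 8080
  selector:
    app: my-app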
Create the Prometheus alerting rules as a ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-rule-config
  namespace: prometheus
data:
  rule.yml: |
    groups:
    - name: kubernetes
      rules:
      - alert: PodDown
        expr: kube_pod_status_phase{phase="Unknown"} == 1 or kube_pod_status_phase{phase="Failed"} == 1
        for: 1m
        labels:
          severity: error
        annotations:
          summary: Pod Down
          description: "pod: {{ $labels.pod }} namespace: {{ $labels.namespace }}"
      - alert: PodRestart
        expr: changes(kube_pod_container_status_restarts_total{pod !~ "analyzer.*"}[10m]) > 0
        for: 1m
        labels:
          severity: error
        annotations:
          summary: Pod Restart
          description: "pod: {{ $labels.pod }} namespace: {{ $labels.namespace }} restart"
      - alert: NodeUnschedulable
        expr: kube_node_spec_unschedulable == 1
        for: 5m
        labels:
          severity: error
        annotations:
          summary: Node Unschedulable
          description: "node: {{ $labels.node }} Unschedulable"
      - alert: NodeStatusError
        expr: kube_node_status_condition{condition="Ready", status!="true"} == 1
        for: 5m
        labels:
          severity: error
        annotations:
          summary: Node Status Error
          description: "node: {{ $labels.node }} Status Error"
      - alert: DaemonsetUnavailable
        expr: kube_daemonset_status_number_unavailable > 0
        for: 5m
        labels:
          severity: error
        annotations:
          summary: "Daemonset Unavailable"
          description: "Daemonset {{ $labels.daemonset }} with namespace {{ $labels.namespace }} Unavailable"
      - alert: JobFailed
        expr: kube_job_status_failed == 1
        for: 5m
        labels:
          severity: error
        annotations:
          summary: "Job Failed"
          description: "Job {{ $labels.job_name }} in namespace {{ $labels.namespace }} failed"
    - name: elasticsearch
      rules:
      - record: elasticsearch_filesystem_data_used_percent
        expr: 100 * (elasticsearch_filesystem_data_size_bytes - elasticsearch_filesystem_data_free_bytes) / elasticsearch_filesystem_data_size_bytes
      - record: elasticsearch_filesystem_data_free_percent
        expr: 100 - elasticsearch_filesystem_data_used_percent
      - alert: ElasticsearchTooFewNodesRunning
        expr: elasticsearch_cluster_health_number_of_nodes < 3
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "There are only {{$value}} < 3 ElasticSearch nodes running"
          summary: ElasticSearch running on less than 3 nodes
      - alert: ElasticsearchHeapTooHigh
        expr: elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"} > 0.9
        for: 15m
        labels:
          severity: critical
        annotations:
          description: The heap usage is over 90% for 15m
          summary: ElasticSearch node {{$labels.node}} heap usage is high
      - alert: HostDown
        expr: up{component="node-exporter"} != 1
        for: 1m
        labels:
          severity: "warning"
          instance: "{{ $labels.instance }}"
        annotations:
          summary: "Host {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute, please handle it promptly"
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 1m
        labels:
          severity: "warning"
          instance: "{{ $labels.instance }}"
        annotations:
          summary: "Host {{ $labels.instance }} CPU usage is above the threshold"
          description: "{{ $labels.instance }} CPU usage has exceeded 85% (current value: {{ $value }}), please handle it promptly"
The alerting rules here are a simple set covering Kubernetes pods going down (PodDown), pod restarts, unschedulable nodes, unavailable DaemonSets, ElasticSearch health, and so on. Further rules can be added as needed; a memory-usage rule is sketched below as an example.
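For instance, the memory alerts mentioned earlier could be added with a rule along the following lines; this is a sketch using standard node_exporter metrics, and the 85% threshold is only an example:

# Illustrative additional rule: available memory below 15% for 5 minutes.
- alert: HighMemoryUsage
  expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
  for: 5m
  labels:
    severity: "warning"
  annotations:
    summary: "Host {{ $labels.instance }} memory usage is above 85%"
    description: "{{ $labels.instance }} memory usage is {{ $value }}%"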
Create the Prometheus instance with a Deployment
The main points to note: the data volume can be mounted from a hostPath so the data survives pod restarts (see the sketch after the manifest), and the ConfigMaps holding the alerting rules and the Prometheus configuration are mounted as volumes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-dep
  namespace: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-dep
  template:
    metadata:
      labels:
        app: prometheus-dep
    spec:
      containers:
      - image: prom/prometheus
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=3d"
        # - "--web.external-url=http://192.168.106.41:30090/"   # uncomment and set an externally reachable address if alerts should link back to the Prometheus UI
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        - mountPath: "/etc/prometheus/rules"
          name: rule-config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      serviceAccountName: prometheus
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: rule-config-volume
        configMap:
          name: prometheus-server-rule-config
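As noted above, the emptyDir "data" volume loses the TSDB when the pod is rescheduled. A sketch of the hostPath alternative; the host directory is a placeholder, and the pod would need to stay on (or be pinned to) the node holding that directory:

# Illustrative replacement for the "data" volume above.
volumes:
- name: data
  hostPath:
    path: /data/prometheus         # hypothetical directory on the node
    type: DirectoryOrCreate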
Deploy the Prometheus Service of type NodePort, exposing external port 30090
Its job is to map the in-cluster Prometheus IP and port onto the cluster nodes so that it can be reached externally via any node IP plus the node port.
kind: Service
apiVersion: v1
metadata:
  name: prometheus-svc
  namespace: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30090
  selector:
    app: prometheus-dep
Deploy Alertmanager with a Kubernetes Deployment
This has three parts: the Alertmanager configuration as a ConfigMap, a Service for external access, and the Deployment that runs Alertmanager.
### alertmanager-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: prometheus
data:
  config.yml: |
    global:
      resolve_timeout: 5m
    route:
      receiver: feishuhok
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      group_by: ['alertname', 'k8scluster', 'node', 'container', 'exported_job', 'daemonset']
      routes:
      - receiver: feishuhok
        group_wait: 10s
        match:
          severity: error
    receivers:
    - name: feishuhok
      webhook_configs:
      - url: 'https://www.feishu.cn/flow/api/trigger-webhook/8e0f266df11-----'
        send_resolved: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager-dep
  namespace: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager-dep
  template:
    metadata:
      labels:
        app: alertmanager-dep
    spec:
      containers:
      - image: prom/alertmanager
        name: alertmanager
        args:
        - "--config.file=/etc/alertmanager/config.yml"
        - "--storage.path=/alertmanager"
        - "--data.retention=72h"
        volumeMounts:
        - mountPath: "/alertmanager"
          name: data
        - mountPath: "/etc/alertmanager"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: alertmanager-config
---
kind: Service
apiVersion: v1
metadata:
  name: alertmanager-svc
  namespace: prometheus
spec:
  type: NodePort
  ports:
  - name: http
    port: 9093
    nodePort: 31090
  selector:
    app: alertmanager-dep
Deploying Grafana
Grafana is used to visualize the monitoring data stored in Prometheus.
This has two parts: a Service for external access to Grafana, and the Deployment that runs Grafana (a data source provisioning sketch follows the manifest).
### grafana deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: prometheus
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
      component: core
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      securityContext:
        runAsUser: 472
        fsGroup: 472
      containers:
      - image: grafana/grafana
        name: grafana-core
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var/lib/grafana
      serviceAccountName: prometheus
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: prometheus
  labels:
    app: grafana
    component: core
spec:
  type: NodePort
  ports:
  - port: 3000
    nodePort: 31000
  selector:
    app: grafana
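Once Grafana is reachable on node port 31000, Prometheus can be added as a data source through the UI, or provisioned from a file such as the sketch below, pointing at the prometheus-svc Service created earlier. The provisioning file name and how it is mounted into /etc/grafana/provisioning/datasources are not part of the manifests above:

# Illustrative Grafana data source provisioning file (e.g. prometheus-ds.yaml).
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  url: http://prometheus-svc.prometheus.svc:9090   # in-cluster Prometheus Service
  isDefault: true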
Collecting Kubernetes cluster state
This consists of the RBAC rules, a Deployment, and a Service, and is used to collect the state of the cluster.
## kube-state-metrics: collect Kubernetes cluster state
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  namespace: prometheus
  name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: prometheus
---
apiVersion: apps/v1
# Kubernetes versions after 1.9.0 should use apps/v1
# Kubernetes versions before 1.8.0 should use apps/v1beta1 or extensions/v1beta1
# addon-resizer: https://github.com/kubernetes/autoscaler/tree/master/addon-resizer
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: prometheus
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v2.0.0-alpha.1
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: prometheus
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
Deploy Node Exporter as a Kubernetes DaemonSet to collect node metrics
The tolerations here are set so that the exporter is also deployed on master nodes and collects their data.
## node exporter
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter
        name: node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
      hostPID: true
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'
spec:
  clusterIP: None
  ports:
  - name: http
    port: 9100
    protocol: TCP
  type: ClusterIP
  selector:
    k8s-app: node-exporter
Deploy elasticsearch exporter to scrape the ElasticSearch cluster
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic-exporter
  namespace: prometheus
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: elastic-exporter
  template:
    metadata:
      labels:
        app: elastic-exporter
    spec:
      containers:
      - command:
        - /bin/elasticsearch_exporter
        - --es.uri=http://es-svc:9200
        - --es.all
        image: justwatch/elasticsearch_exporter:1.1.0
        securityContext:
          capabilities:
            drop:
            - SETPCAP
            - MKNOD
            - AUDIT_WRITE
            - CHOWN
            - NET_RAW
            - DAC_OVERRIDE
            - FOWNER
            - FSETID
            - KILL
            - SETGID
            - SETUID
            - NET_BIND_SERVICE
            - SYS_CHROOT
            - SETFCAP
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9114
          initialDelaySeconds: 30
          timeoutSeconds: 10
        name: elastic-exporter
        ports:
        - containerPort: 9114
          name: http
        readinessProbe:
          httpGet:
            path: /healthz
            port: 9114
          initialDelaySeconds: 10
          timeoutSeconds: 10
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 25m
            memory: 64Mi
      restartPolicy: Always
      securityContext:
        runAsNonRoot: true
        runAsGroup: 10000
        runAsUser: 10000
        fsGroup: 10000
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: elastic-exporter
  name: elastic-exporter
  namespace: prometheus
spec:
  ports:
  - name: http
    port: 9114
    nodePort: 31200
    protocol: TCP
  type: NodePort
  selector:
    app: elastic-exporter
That is the whole deployment flow. For anyone familiar with Kubernetes objects there is nothing complicated here: configure ConfigMaps, write RBAC rules, and create Deployment, DaemonSet, and Service objects. Finally, all of these manifests can be combined into a single file and applied with one command: kubectl apply -f promehteus-all.yaml.
Alerting rules (screenshot)
Scrape target status (screenshot)
Feishu group alert notifications (screenshot)
Grafana monitoring dashboards (screenshot)
This article introduced Prometheus, walked through the hands-on setup, and, in the context of our project, showed how Prometheus ties together the whole monitoring chain from metric collection, to querying the monitoring data, through alert notifications in the Feishu operations group. Our use of Prometheus is still fairly basic and the Kubernetes cluster is small, so cluster load, storage, and query volume are not yet a challenge. Plenty of follow-up work remains, for example configuring Prometheus HA and richer monitoring rules for more complex scenarios with limited resources and complicated networking.
When designing the feature platform we also had to decide whether to run Flink jobs on YARN or on Kubernetes, and we chose Kubernetes without hesitation: the community's embrace of Kubernetes is a clear trend, and several team members have years of Kubernetes experience, so both building and operating Kubernetes clusters is familiar ground. Going through the whole process of building the environment, making the cluster highly available, setting up log collection, and completing the monitoring and alerting of the cluster and applications through to the final launch deepened our understanding of the Kubernetes ecosystem and improved our development efficiency.
About the author: Zhang Hao (house.zhang), senior big data engineer
References:
https://github.com/prometheus/prometheus
DB-Engines Ranking
The evolution of the Prometheus storage layer (Prometheus 存储层的演进)
https://github.com/yunlzheng/prometheus-book
http://dockone.io/article/5716
https://www.bookstack.cn/read/prometheus-book/AUTHOR.md
https://www.infoq.cn/article/275NDkYNZRpcTIL2R8Ms
https://blog.csdn.net/u013256816/article/details/106152544
https://www.infoq.cn/article/Uj12kNwoRCwG0kke8Zfv