原文:
https://cloud.tencent.com/developer/article/1402436
1 需求分析:
遇到一个需求,要使用prometheus监控多个k8s集群。
调研发现prometheus配合node_exporter、kube-state-metrics可以很方便地采集单个集群的监控指标。因此最初的构想是在每套k8s集群里部署prometheus,由它采集该集群的监控指标,再运用prometheus的联邦模式将多个prometheus中的监控数据聚合采集到一个中心prometheus里来,参考模型为Hierarchical federation 链接:(https://prometheus.io/docs/prometheus/latest/federation/#hierarchical-federation)。
但甲方觉得上述方案中每个k8s集群都要部署prometheus,增加了每套k8s集群的资源开销,希望全局只部署一套prometheus,由它统一采集多个k8s集群的监控指标。尽管个人不太认可这种方案,中心prometheus今后很有可能成为性能瓶颈,但甲方要求的总得尽力满足,下面开始研究如何用一个prometheus采集多个k8s集群的监控指标。
2 步骤
1 prometheus采集当前k8s监控数据
2 prometheus采集其它k8s监控数据
从上述分析来看,假设其它k8s部署了node_exporter和kube-state-metrics,用prometheus采集其它k8s集群的监控数据也是可行的,只需要解决两个问题:
设置好kubernetes_sd_configs,让其可通过其它k8s集群的apiserver发现抓取的endpionts。
设置好relabel_configs,构造出访问其它k8s集群中的service, pod, node等endpoint URL。
3 实施参考
# 假设就部署在default命名空间
helm install --name prometheus --namespace default stable/prometheus
部署完毕之后,由于默认并没有创造任何ingress资源对象,创建的service的类型也仅仅是ClusterIP,所以从集群外部是没法访问到它的,不过可以简单地将端口映射出来,如下:
kubectl -n default port-forward service/prometheus-server 30080:80
这里使用了kubectl port-forward命令,详细使用方法参考这里。
然后用浏览器访问http://127.0.0.1:30080/graph,就可访问到prometheus的WebConsole了。
访问http://127.0.0.1:30080/config可以看到当前prometheus的配置,其中抓取当前k8s集群监控指标的配置如下:
scrape_configs:
- job_name: prometheus
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
static_configs:
- targets:
- localhost:9090
- job_name: kubernetes-apiservers
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
kubernetes_sd_configs:
- role: endpoints
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
separator: ;
regex: default;kubernetes;https
replacement: $1
action: keep
- job_name: kubernetes-nodes
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
kubernetes_sd_configs:
- role: node
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- separator: ;
regex: __meta_kubernetes_node_label_(.+)
replacement: $1
action: labelmap
- separator: ;
regex: (.*)
target_label: __address__
replacement: kubernetes.default.svc:443
action: replace
- source_labels: [__meta_kubernetes_node_name]
separator: ;
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
action: replace
- job_name: kubernetes-nodes-cadvisor
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: https
kubernetes_sd_configs:
- role: node
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
relabel_configs:
- separator: ;
regex: __meta_kubernetes_node_label_(.+)
replacement: $1
action: labelmap
- separator: ;
regex: (.*)
target_label: __address__
replacement: kubernetes.default.svc:443
action: replace
- source_labels: [__meta_kubernetes_node_name]
separator: ;
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
action: replace
- job_name: kubernetes-service-endpoints
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
separator: ;
regex: "true"
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
separator: ;
regex: (https?)
target_label: __scheme__
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
separator: ;
regex: (.+)
target_label: __metrics_path__
replacement: $1
action: replace
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
separator: ;
regex: ([^:]+)(?::\d+)?;(\d+)
target_label: __address__
replacement: $1:$2
action: replace
- separator: ;
regex: __meta_kubernetes_service_label_(.+)
replacement: $1
action: labelmap
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: kubernetes_namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: kubernetes_name
replacement: $1
action: replace
- job_name: prometheus-pushgateway
honor_labels: true
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
separator: ;
regex: pushgateway
replacement: $1
action: keep
- job_name: kubernetes-services
params:
module:
- http_2xx
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /probe
scheme: http
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
separator: ;
regex: "true"
replacement: $1
action: keep
- source_labels: [__address__]
separator: ;
regex: (.*)
target_label: __param_target
replacement: $1
action: replace
- separator: ;
regex: (.*)
target_label: __address__
replacement: blackbox
action: replace
- source_labels: [__param_target]
separator: ;
regex: (.*)
target_label: instance
replacement: $1
action: replace
- separator: ;
regex: __meta_kubernetes_service_label_(.+)
replacement: $1
action: labelmap
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: kubernetes_namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: kubernetes_name
replacement: $1
action: replace
- job_name: kubernetes-pods
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
separator: ;
regex: "true"
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
separator: ;
regex: (.+)
target_label: __metrics_path__
replacement: $1
action: replace
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
separator: ;
regex: ([^:]+)(?::\d+)?;(\d+)
target_label: __address__
replacement: $1:$2
action: replace
- separator: ;
regex: __meta_kubernetes_pod_label_(.+)
replacement: $1
action: labelmap
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: kubernetes_namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_name]
separator: ;
regex: (.*)
target_label: kubernetes_pod_name
replacement: $1
action: replace
首先是一小段prometheus的抓取配置,官方解释在这里,这个比较简单,就不具体解释了
- job_name: kubernetes-service-endpoints
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
然后说明是用k8s的发现机制去发现抓取地址的:
kubernetes_sd_configs:
- role: endpoints
prometheus里k8s的发现机制配置见这里,注意这里没填api_server属性,因此用了默认值kubernetes.default.svc。
然后是一段relabel_configs配置,其作用主要是用于匹配最终要抓取的endpoint,构造抓取的地址,甚至给最终的时序指标加上一些label。这里以apiserver发现的node_exporter endpoint信息为例,在没有relabel之前,其endpoint信息如下:
。。。略
具体参考原文:https://cloud.tencent.com/developer/article/1402436
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)