1. Kubernetes environment setup:
https://blog.csdn.net/weixin_43606975/article/details/119947061?spm=1001.2014.3001.5502
2. Deploy kube-prometheus v0.10.0
Download from:
https://github.com/prometheus-operator/kube-prometheus/tags
3. Upload the archive to the server and extract it
tar -xf v0.10.0.tar.gz
4. Set replicas to 1, because the machine does not have enough resources for more
cd /home/k8s/kube-prometheus-0.10.0/manifests
grep -r "replicas: 2" *
grep -r "replicas: 3" *
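The two grep commands only list the files that need changing. As a minimal sketch, assuming GNU sed and the manifests directory used above, the edit itself could be scripted as follows. `set_single_replica` is a hypothetical helper name, not something shipped with kube-prometheus.

```shell
#!/bin/sh
# Hypothetical helper: rewrite "replicas: 2" / "replicas: 3" to "replicas: 1"
# in every file under the given directory.
# Assumes GNU sed (in-place editing with -i and no backup suffix).
set_single_replica() {
  dir="$1"
  # List only the files that contain a matching line, then edit them in place.
  grep -rlE 'replicas: [23][[:space:]]*$' "$dir" | while IFS= read -r file; do
    sed -i -E 's/replicas: [23][[:space:]]*$/replicas: 1/' "$file"
  done
}
```

For example, `set_single_replica /home/k8s/kube-prometheus-0.10.0/manifests`, after which the two grep commands should print nothing.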
5. Add a NodePort to the Services
vim alertmanager-service.yaml
vim prometheus-service.yaml
vim grafana-service.yaml
6. Deploy
kubectl create -f /home/k8s/kube-prometheus-0.10.0/manifests/setup/
kubectl apply -f /home/k8s/kube-prometheus-0.10.0/manifests/
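Step 5 edits the three Service files by hand before deploying. A different route, sketched here and not the method used in this post, is to patch the Services once they exist. The Service names (`alertmanager-main`, `prometheus-k8s`, `grafana`) and ports (9093, 9090, 3000) are assumed to be the kube-prometheus defaults, so check them against the YAML files; the nodePort values are arbitrary examples, and `expose_nodeport` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: switch a Service in the monitoring namespace to NodePort.
# $1 = Service name, $2 = existing Service port, $3 = nodePort to assign
# (the nodePort value is an assumption; it must be free and inside the
# cluster's NodePort range, 30000-32767 by default).
expose_nodeport() {
  patch=$(printf '{"spec":{"type":"NodePort","ports":[{"port":%s,"nodePort":%s}]}}' "$2" "$3")
  # kubectl's default strategic merge patch merges the ports list by "port",
  # so the existing name and targetPort of that entry are kept.
  kubectl -n monitoring patch service "$1" -p "$patch"
}
```

Usage might look like `expose_nodeport grafana 3000 30030`, `expose_nodeport prometheus-k8s 9090 30090` and `expose_nodeport alertmanager-main 9093 30093`. A later `kubectl apply` of the unmodified manifests would undo the patch.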
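The pods kept failing because an image could not be pulled, so a locally available image was retagged under the expected name:
docker tag bitnami/kube-state-metrics:latest k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
7. Open the NodePort ports in a browser. The Grafana login is admin/admin.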
8. Basic PromQL syntax
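The post gives no queries for this step. As a rough illustration, the expressions below cover the basic building blocks (label matchers, range vectors with `rate`, aggregation, comparison) and can be sent to the Prometheus HTTP API with curl. The `promql` helper and the variable names are hypothetical, and the metric and job names are assumed to be the ones a default kube-prometheus install exposes.

```shell
#!/bin/sh
# Hypothetical helper: run an instant PromQL query through the Prometheus HTTP API.
# $1 = Prometheus base URL, $2 = PromQL expression
promql() {
  curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
}

# Example expressions (metric and job names are assumptions about the install):
Q_SELECTOR='up{job="node-exporter"}'                      # instant vector with a label matcher
Q_RATE='rate(node_cpu_seconds_total{mode="idle"}[5m])'    # per-second rate over a 5m range vector
Q_AGG='sum by (namespace) (kube_pod_info)'                # aggregation grouped by a label
Q_COMPARE='node_memory_MemFree_bytes < 10 * 1024 * 1024'  # comparison filter, as used in alert rules
```

With the Prometheus NodePort from step 5, a call would look like `promql http://<node-ip>:<prometheus-nodeport> "$Q_AGG"`; the same expressions can be pasted into the Prometheus web UI.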
9. Install DingTalk alerting and custom monitoring
cat dingtalk-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dingtalk-config
  namespace: monitoring
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/template.tmpl
    targets:
      webhook:
        url: https://oapi.dingtalk.com/robot/send?access_token=b5b550b72447d935572d5c717cd1ec4bed7f17cc82efaa
        secret: SECcbc9fe62f53d9a533d5e506f30722e0a1a39b36bd0b8e24
        mention:
          all: true
      webhook2:
        url: https://oapi.dingtalk.com/robot/send?access_token=4df2745e8df1de6d0429e35caf15e03
        secret: SECe079af795abd316a7e1f431ee8ebcf082cc0b0611a859da
  template.tmpl: |-
    {{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
    {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/{{ end }}
    {{ define "__text_alert_list" }}{{ range . }}
    **Labels**
    {{ range .Labels.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Annotations**
    {{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
    {{ end }}{{ end }}
    {{ define "default.__text_alert_list" }}{{ range . }}
    ---
    **Severity:** {{ .Labels.severity | upper }}
    **Team:** {{ .Labels.team | upper }}
    **Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
    **Details:**
    {{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Labels:**
    {{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }} - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}{{ end }}
    {{ end }}
    {{ end }}
    {{ define "default.__text_alertresovle_list" }}{{ range . }}
    ---
    **Severity:** {{ .Labels.severity | upper }}
    **Team:** {{ .Labels.team | upper }}
    **Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
    **Ended at:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
    **Details:**
    {{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}
    **Labels:**
    {{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }} - {{ .Name }}: {{ .Value | markdown | html }}
    {{ end }}{{ end }}
    {{ end }}
    {{ end }}
    {{/* Default */}}
    {{ define "default.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "default.content" }}
    {{ if gt (len .Alerts.Firing) 0 -}}
    {{ template "default.__text_alert_list" .Alerts.Firing }}
    {{- end }}
    {{ if gt (len .Alerts.Resolved) 0 -}}
    {{ template "default.__text_alertresovle_list" .Alerts.Resolved }}
    {{- end }}
    {{- end }}
    {{/* Legacy */}}
    {{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
    {{ define "legacy.content" }}
    {{ template "__text_alert_list" .Alerts.Firing }}
    {{- end }}
    {{/* Following names for compatibility */}}
    {{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
    {{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
cat dingtalk-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitoring
  labels:
    app: dingtalk
  annotations:
    prometheus.io/scrape: 'false'
spec:
  selector:
    app: dingtalk
  ports:
    - name: dingtalk
      port: 8060
      protocol: TCP
      targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      name: dingtalk
      labels:
        app: dingtalk
    spec:
      containers:
        - name: dingtalk
          image: timonwong/prometheus-webhook-dingtalk:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8060
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus-webhook-dingtalk
      volumes:
        - name: config
          configMap:
            name: dingtalk-config
10. Deploy
kubectl apply -f dingtalk-config.yaml -f dingtalk-deployment.yaml
kubectl get pod -n monitoring
11. Configure alertmanager-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.23.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    "global":
      "resolve_timeout": "5m"
    "receivers":
    - "name": "Webhook"
      "webhook_configs":
      - "url": "http://dingtalk.monitoring.svc.cluster.local:8060/dingtalk/webhook/send"
    "route":
      "group_by":
      - "namespace"
      "group_wait": "30s"
      "receiver": "Webhook"
      "repeat_interval": "2m"
      "routes":
      - "matchers":
        - "alertname = Webhook"
        "receiver": "Webhook"
type: Opaque
12. Apply
kubectl apply -f alertmanager-secret.yaml
You can look inside the pod to check that the new configuration has been loaded.
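A quick check that does not require a shell in the pod is to read the Secret back and decode it. This is only a sketch: `show_alertmanager_config` is a hypothetical helper, and it shows what is stored in the cluster, not what the running Alertmanager process has loaded (the Status page of the Alertmanager web UI shows the latter).

```shell
#!/bin/sh
# Hypothetical helper: print the alertmanager.yaml currently stored in the
# alertmanager-main Secret. Secret data is base64-encoded, hence the decode.
# Assumes GNU coreutils base64 (-d to decode).
show_alertmanager_config() {
  kubectl -n monitoring get secret alertmanager-main \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
}
```

The output should contain the `Webhook` receiver and the `dingtalk.monitoring.svc.cluster.local:8060` URL from the Secret above.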
13. Custom alerting rules
cd /home/k8s/kube-prometheus-0.10.0/manifests
vim nodeExporter-prometheusRule.yaml
...
    - alert: demon-pod
      annotations:
        description: demon-pod count < 2
      expr: sum(node_namespace_pod:kube_pod_info:{namespace="demon"}) < 2
      for: 2m
      labels:
        team: pods
        severity: critical
    - alert: Node free memory below 10 MB
      expr: node_memory_MemFree_bytes < 10 * 1024 * 1024
      for: 2m
      labels:
        severity: critical
        team: pods
      annotations:
        description: Free memory on the node is below 10 MB
...
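The demon-pod rule means that the demon namespace is expected to run 2 pods; if fewer than 2 are running, the alert fires.
Alternatively, add the rules as a new file.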
14. Apply the update
kubectl apply -f nodeExporter-prometheusRule.yaml
Look inside the pod to check that the rules have been loaded.
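15. Check in Prometheus whether the alert fires
You can stop one of the pods in the demon namespace to trigger the alert, then check whether DingTalk receives the message.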
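One way to stop a pod in a controlled manner is to scale its workload down, sketched below. The post does not name the workload in the demon namespace, so the Deployment name is a placeholder argument and `trigger_demon_alert` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: scale a Deployment in the demon namespace down to one
# replica so the pod count drops below 2 and the demon-pod rule can fire.
# $1 = Deployment name (an assumption; the source does not name the workload).
trigger_demon_alert() {
  kubectl -n demon scale deployment "$1" --replicas=1
  # Show the remaining pods to confirm that only one is left.
  kubectl -n demon get pods
}
```

Because the rule uses `for: 2m` and the route uses `group_wait: 30s`, the DingTalk message should arrive a few minutes after the scale-down, not immediately. Scaling back with `--replicas=2` should produce the resolved notification.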