
Monitoring the k8s controller-manager and scheduler: creating ServiceMonitor and rules

Contents

1. Modify the kube-controller-manager and kube-scheduler YAML files

2. Create the Service, Endpoints, and ServiceMonitor

3. Verify in Prometheus

4. Create the PrometheusRule resources

5. Verify in Prometheus


Straight to the practical details.

1. Modify the kube-controller-manager and kube-scheduler YAML files

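Note: modify the nodes one at a time. Wait until the services on the node you just changed have started normally before moving on to the next node.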

kube-controller-manager manifest path: /etc/kubernetes/manifests/kube-controller-manager.yaml
kube-scheduler manifest path: /etc/kubernetes/manifests/kube-scheduler.yaml

vim /etc/kubernetes/manifests/kube-controller-manager.yaml
vim /etc/kubernetes/manifests/kube-scheduler.yaml
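
The exact change is not spelled out in this step. A common adjustment on kubeadm-built clusters, and an assumption here, is switching `--bind-address` from `127.0.0.1` to `0.0.0.0` so that Prometheus can reach ports 10257 and 10259 from outside the node. A minimal sketch of that edit, using a hypothetical helper named `set_bind_address`:

```shell
# Assumption: the manifest contains "--bind-address=127.0.0.1".
# set_bind_address FILE rewrites that flag to 0.0.0.0 in place.
set_bind_address() {
    manifest="$1"
    tmp="$(mktemp)" || return 1
    # The temp file lives outside the manifests directory, so no backup copy
    # is left there for the kubelet to pick up as another static pod.
    sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$manifest" > "$tmp" \
        && cat "$tmp" > "$manifest"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage on each control-plane node, one node at a time:
#   set_bind_address /etc/kubernetes/manifests/kube-controller-manager.yaml
#   set_bind_address /etc/kubernetes/manifests/kube-scheduler.yaml
```

The kubelet restarts a static pod when its manifest changes, which is why the note above asks you to wait for one node to recover before touching the next.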

2. Create the Service, Endpoints, and ServiceMonitor

kube-controller-monitor.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manage-monitor
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
    targetPort: 10257
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manage-monitor
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.50.238.191
  - ip: 10.50.107.48
  - ip: 10.50.140.151
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
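
The Endpoints object lists the control-plane node IPs by hand, so it has to be regenerated whenever a node is added or replaced. A minimal sketch of a generator follows; `render_endpoints` is a hypothetical helper, not part of kubectl, and it assumes the same port name (`https-metrics`) and namespace as the manifest above.

```shell
# render_endpoints NAME APP PORT IP... prints an Endpoints manifest on stdout.
render_endpoints() {
    name="$1"; app="$2"; port="$3"
    shift 3
    cat <<EOF
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: ${app}
  name: ${name}
  namespace: kube-system
subsets:
- addresses:
EOF
    # One address entry per remaining argument.
    for ip in "$@"; do
        printf '  - ip: %s\n' "$ip"
    done
    cat <<EOF
  ports:
  - name: https-metrics
    port: ${port}
    protocol: TCP
EOF
}

# Usage:
#   render_endpoints kube-controller-manage-monitor kube-controller-manager 10257 \
#       10.50.238.191 10.50.107.48 10.50.140.151 | kubectl apply -f -
```

The Endpoints name must stay identical to the Service name, since that is how Kubernetes associates a selector-less Service with manually managed endpoints.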

kube-scheduler-monitor.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler-monitor
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
    targetPort: 10259
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler-monitor
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.50.238.191
  - ip: 10.50.107.48
  - ip: 10.50.140.151
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler

root@10-50-238-191:/home/sunwenbo/prometheus-serviceMonitor/serviceMonitor/kubernetes-cluster# kubectl  apply -f ./
service/kube-controller-manage-monitor created
endpoints/kube-controller-manage-monitor created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
service/kube-scheduler-monitor created
endpoints/kube-scheduler-monitor created
servicemonitor.monitoring.coreos.com/kube-scheduler created
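
Before looking at Prometheus, it can help to probe a metrics endpoint by hand. The sketch below only builds the URL from the ports used in the manifests above (10257 for the controller manager, 10259 for the scheduler); `metrics_url` is a hypothetical helper, and `TOKEN` in the usage comment is an assumed variable holding a bearer token that is allowed to read `/metrics`.

```shell
# metrics_url COMPONENT IP prints the HTTPS metrics URL for a control-plane component.
metrics_url() {
    case "$1" in
        kube-controller-manager) port=10257 ;;
        kube-scheduler)          port=10259 ;;
        *) echo "unknown component: $1" >&2; return 1 ;;
    esac
    printf 'https://%s:%s/metrics\n' "$2" "$port"
}

# Usage (-k skips certificate verification, like insecureSkipVerify above):
#   curl -sk -H "Authorization: Bearer ${TOKEN}" \
#       "$(metrics_url kube-scheduler 10.50.238.191)" | head
```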

3. Verify in Prometheus
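
One way to run this check from the command line is to ask the Prometheus HTTP API for the `up` series of the new job and count how many targets report 1. This is a rough sketch: `count_up` is a hypothetical helper that pattern-matches the compact JSON Prometheus returns rather than parsing it, and `PROM_URL` is an assumed variable pointing at your Prometheus server.

```shell
# count_up reads a Prometheus /api/v1/query response for an `up{...}` query
# on stdin and prints how many returned series currently have the value 1.
count_up() {
    # Each sample is rendered as "value":[<timestamp>,"<value>"].
    grep -o ',"1"]' | wc -l | tr -d ' '
}

# Usage:
#   curl -sG "${PROM_URL}/api/v1/query" \
#       --data-urlencode 'query=up{job="kube-controller-manager"}' | count_up
```

With the three endpoint addresses configured earlier, a healthy setup would be expected to report 3 for each job.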

4. Create the PrometheusRule resources

kube-controller-rules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-namespace: cattle-monitoring-system
    prometheus-operator-validated: "true"
  generation: 3
  labels:
    app: rancher-monitoring
    app.kubernetes.io/instance: rancher-monitoring
    app.kubernetes.io/part-of: rancher-monitoring
  name: kube-controller-manager
  namespace: cattle-monitoring-system
spec:
  groups:
  - name: kube-controller-manager.rule
    rules:
    - alert: K8SControllerManagerDown
      expr: absent(up{job="kube-controller-manager"} == 1)
      for: 1m
      labels:
        severity: critical
        cluster: manage-prod
      annotations:
        description: There is no running K8S controller manager. Deployments and replication controllers are not making progress.
        summary: No kubernetes controller manager is reachable
    - alert: K8SControllerManagerDown
      expr: up{job="kube-controller-manager"} == 0
      for: 1m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        description: kubernetes controller manager {{ $labels.instance }} is down. {{ $labels.instance }} isn't reachable
        summary: kubernetes controller manager is down
    - alert: K8SControllerManagerUserCPU
      expr: sum(rate(container_cpu_user_seconds_total{pod=~"kube-controller-manager.*",container_name!="POD"}[5m])) by (pod) > 5
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        # Only the pod label survives the sum by (pod) aggregation.
        description: kubernetes controller manager {{ $labels.pod }} user cpu time > 5s.
        summary: kubernetes controller manager load is high, above 5s
    - alert: K8SControllerManagerUseMemory
      # container_memory_usage_bytes is a gauge, so it is compared directly (no rate()).
      expr: sum(container_memory_usage_bytes{pod=~"kube-controller-manager.*",container_name!="POD"} / 1024 / 1024) by (pod) > 20
      for: 5m
      labels:
        severity: info
        cluster: manage-prod
      annotations:
        description: kubernetes controller manager {{ $labels.pod }} is using more than 20MB of memory
        summary: kubernetes controller manager memory usage exceeds 20MB
    - alert: K8SControllerManagerQueueTimedelay
      # The job value matches the k8s-app label that the ServiceMonitor uses as jobLabel.
      expr: histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job="kube-controller-manager"}[5m])) by (le)) > 10
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        description: kubernetes controller manager {{ $labels.instance }} QueueTimedelay is more than 10s
        summary: kubernetes controller manager queue wait time exceeds 10 seconds, please check the ControllerManager

kube-scheduler-rules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-namespace: cattle-monitoring-system
    prometheus-operator-validated: "true"
  generation: 3
  labels:
    app: rancher-monitoring
    app.kubernetes.io/instance: rancher-monitoring
    app.kubernetes.io/part-of: rancher-monitoring
  name: kube-scheduler
  namespace: cattle-monitoring-system
spec:
  groups:
  - name: kube-scheduler.rule
    rules:
    - alert: K8SSchedulerDown
      expr: absent(up{job="kube-scheduler"} == 1)
      for: 1m
      labels:
        severity: critical
        cluster: manage-prod
      annotations:
        description: "There is no running K8S scheduler. New pods are not being assigned to nodes."
        summary: "all k8s schedulers are down"
    - alert: K8SSchedulerDown
      expr: up{job="kube-scheduler"} == 0
      for: 1m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        description: "K8S scheduler {{ $labels.instance }} is not running. New pods are not being assigned to nodes."
        summary: "k8s scheduler {{ $labels.instance }} is down"
    - alert: K8SSchedulerUserCPU
      expr: sum(rate(container_cpu_user_seconds_total{pod=~"kube-scheduler.*",container_name!="POD"}[5m])) by (pod) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        # Only the pod label survives the sum by (pod) aggregation.
        description: "kubernetes scheduler {{ $labels.pod }} user cpu time > 1s."
        summary: "kubernetes scheduler load is high, above 1s, current value is {{$value}}"
    - alert: K8SSchedulerUseMemory
      # container_memory_usage_bytes is a gauge, so it is compared directly (no rate()).
      expr: sum(container_memory_usage_bytes{pod=~"kube-scheduler.*",container_name!="POD"} / 1024 / 1024) by (pod) > 20
      for: 5m
      labels:
        severity: info
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler {{ $labels.pod }} is using more than 20MB of memory"
        summary: "kubernetes scheduler memory usage exceeds 20MB, current value is {{$value}}MB"
    # The job value below matches the k8s-app label that the ServiceMonitor uses as jobLabel.
    - alert: K8SSchedulerPodPending
      expr: sum(scheduler_pending_pods{job="kube-scheduler"}) by (queue) > 5
      for: 5m
      labels:
        severity: info
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler {{ $labels.instance }} has more than 5 pending pods"
        summary: "kubernetes scheduler unschedulable pods > 5, current value is {{$value}}"
    - alert: K8SSchedulerPodPending
      expr: sum(scheduler_pending_pods{job="kube-scheduler"}) by (queue) > 10
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: kubernetes scheduler {{ $labels.instance }} has more than 10 pending pods
        summary: "kubernetes scheduler unschedulable pods > 10, current value is {{$value}}"
    - alert: K8SSchedulerPodPending
      expr: sum(rate(scheduler_binding_duration_seconds_count{job="kube-scheduler"}[5m])) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: kubernetes scheduler {{ $labels.instance }}
        summary: "kubernetes scheduler cannot bind pods, scheduling has a problem, current value is {{$value}}"
    - alert: K8SSchedulerVolumeSpeed
      expr: sum(rate(scheduler_volume_scheduling_duration_seconds_count{job="kube-scheduler"}[5m])) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: kubernetes scheduler {{ $labels.instance }}
        summary: "kubernetes scheduler pod volume scheduling is slow, current value is {{$value}}"
    - alert: K8SSchedulerClientRequestSlow
      expr: histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job="kube-scheduler"}[5m])) by (verb, url, le)) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: kubernetes scheduler {{ $labels.instance }}
        summary: "kubernetes scheduler client requests are slow, current value is {{$value}}"

root@10-50-238-191:/home/sunwenbo/prometheus-serviceMonitor/rules# kubectl  apply -f kube-controller-rules.yaml 
prometheusrule.monitoring.coreos.com/kube-apiserver-rules configured
root@10-50-238-191:/home/sunwenbo/prometheus-serviceMonitor/rules# kubectl  apply -f kube-scheduler-rules.yaml 
prometheusrule.monitoring.coreos.com/kube-apiserver-rules configured
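
Both rule files reuse some alert names at different severities (for example K8SControllerManagerDown appears as critical and as warning). A quick listing of names with their counts makes such repeats easy to spot before applying. `list_alerts` below is a hypothetical helper that simply scans for `- alert:` lines; it does not validate the rules.

```shell
# list_alerts FILE prints each alert name in a rule file, prefixed by how often it occurs.
list_alerts() {
    sed -n 's/^[[:space:]]*-[[:space:]]*alert:[[:space:]]*//p' "$1" | sort | uniq -c
}

# Usage:
#   list_alerts kube-controller-rules.yaml
#   list_alerts kube-scheduler-rules.yaml
```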

5. Verify in Prometheus
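
To confirm from the command line that Prometheus has loaded the new groups, the `/api/v1/rules` endpoint can be searched for a group or alert name. As a hedged sketch: `has_rule` is a hypothetical helper that does a fixed-string match on the compact JSON response, and `PROM_URL` is an assumed variable pointing at your Prometheus server.

```shell
# has_rule NAME reads a Prometheus /api/v1/rules response on stdin and
# succeeds if a group or rule with exactly that name is present.
has_rule() {
    grep -qF "\"name\":\"$1\""
}

# Usage:
#   curl -s "${PROM_URL}/api/v1/rules" | has_rule kube-scheduler.rule && echo loaded
```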
