
K8s Cloud-Native Monitoring with Prometheus + Grafana

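Contents

1. Overview

1.1 System Architecture

1.1.1 Architecture Diagram

1.2 Environment

2. Deploying Prometheus

2.1 Create the Namespace

2.2 Create the ConfigMap

2.3 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Ingress, PersistentVolumeClaim and Deployment

3. Deploying Node Exporter

3.1 Create the DaemonSet

4. Deploying kube-state-metrics

4.1 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment and Service

5. Deploying the Grafana Visualization Platform

5.1 Create the PersistentVolumeClaim, Deployment and Service

6. Deployment Commands

7. Accessing the Services

8. Grafana Dashboards

8.1 Configure the Grafana Data Source

8.2 Import Dashboards

8.3 Dashboard Views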


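1. Overview

Prometheus is an open-source monitoring and alerting system that is particularly well suited to cloud-native environments. This article walks through deploying a complete Prometheus monitoring stack in a Kubernetes cluster, made up of Prometheus Server, Node Exporter, kube-state-metrics and Grafana.

1.1 System Architecture

The monitoring stack consists of the following components:

  • Prometheus Server: the core server; it discovers targets, scrapes their metrics and stores them

  • Node Exporter: exports host-level metrics (CPU, memory, disk, network) from every node

  • kube-state-metrics: exports metrics describing the state of Kubernetes objects

  • Grafana: data visualization and dashboards

1.1.1 Architecture Diagram

1.2 Environment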

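| IP            | Hostname | Notes                        |
|---------------|----------|------------------------------|
| 192.168.48.11 | master1  | master node, k8s 1.32.7      |
| 192.168.48.12 | master2  | master node, k8s 1.32.7      |
| 192.168.48.13 | master3  | master node, k8s 1.32.7      |
| 192.168.48.14 | node01   | worker node, k8s 1.32.7      |
| 192.168.48.15 | node02   | worker node, k8s 1.32.7      |
| 192.168.48.16 | node03   | worker node, k8s 1.32.7      |
| 192.168.48.19 | database | Harbor registry, NFS server  |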

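This setup uses a highly available (HA) k8s cluster. Most images are pulled from a domestic mirror (swr.cn-north-4.myhuaweicloud.com), so the deployment works even without the Harbor registry; the node-exporter and Grafana images are pulled directly from Docker Hub. If an image pull times out, leave a comment and I will add a mirror. An NFS server is required, because both PVCs request the nfs-client StorageClass; if you use another storage backend such as Ceph or hostPath, adjust the YAML files accordingly.

For setting up NFS shared storage for k8s, see my earlier post:

Setting up NFS shared storage for k8s

For building a k8s cluster, see my earlier post:

Deploying a k8s 1.32.7 cluster on openEuler 24.03 (one master, two workers)

For building an HA k8s cluster, see my earlier post:

Deploying a k8s 1.32.7 HA cluster on openEuler 24.03 (three masters, three workers)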

2. Deploying Prometheus

2.1 Create the Namespace

vim prometheus-namespace.yaml

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitor
  labels:
    name: monitor
    purpose: monitoring
```

2.2 Create the ConfigMap

vim prometheus-configmap.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      # Scrape Prometheus itself
      - job_name: 'prometheus'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [monitor]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: prometheus-svc
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: web
            action: keep

      # Scrape CoreDNS
      - job_name: 'coredns'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: metrics
            action: keep

      # Scrape kube-apiserver
      - job_name: 'kube-apiserver'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [default, kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kubernetes
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: https
            action: keep

      # Scrape node-exporter (node address with the kubelet port 10250 swapped for 9100)
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace

      # Scrape cAdvisor through the kubelet
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          insecure_skip_verify: true
          ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
        relabel_configs:
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
```

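The `keep` and `replace` rules in this ConfigMap decide which discovered endpoints survive and which address is actually scraped. As a reading aid, the Python sketch below re-implements only those two relabel actions; it is not Prometheus' own implementation. It assumes Prometheus' defaults (`;` as separator, `(.*)` as regex, `$1` as replacement, and fully anchored matching, approximated here with `re.fullmatch`), and the two rule constants simply mirror the `prometheus` and `node-exporter` jobs above.

```python
import re

# Rules mirroring the 'prometheus' and 'node-exporter' jobs in the ConfigMap.
PROMETHEUS_JOB_RULES = [
    {"source_labels": ["__meta_kubernetes_service_name"], "regex": "prometheus-svc", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_port_name"], "regex": "web", "action": "keep"},
]
NODE_EXPORTER_RULES = [
    {"source_labels": ["__address__"], "regex": "(.*):10250",
     "replacement": "${1}:9100", "target_label": "__address__", "action": "replace"},
]


def _python_template(replacement):
    """Turn Prometheus-style $1 / ${1} group references into Python's \\g<1>."""
    return re.sub(r"\$\{(\d+)\}|\$(\d+)",
                  lambda m: "\\g<" + (m.group(1) or m.group(2)) + ">", replacement)


def relabel(labels, rules):
    """Apply keep/replace rules in order; return the new labels, or None if dropped."""
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ";", Prometheus' default separator.
        value = ";".join(labels.get(n, "") for n in rule.get("source_labels", []))
        # Prometheus anchors relabel regexes at both ends, hence fullmatch().
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None  # the target is dropped
        elif action == "replace":
            if match is not None:  # no match leaves the target label untouched
                template = _python_template(rule.get("replacement", "$1"))
                labels[rule["target_label"]] = match.expand(template)
        else:
            raise ValueError(f"action not covered by this sketch: {action}")
    return labels
```

For example, `relabel({"__address__": "192.168.48.14:10250"}, NODE_EXPORTER_RULES)` sets `__address__` to `192.168.48.14:9100`: the node role reports the kubelet port 10250, and the rule swaps in port 9100, which works because node-exporter listens on the host network.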
2.3 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Service, Ingress, PersistentVolumeClaim and Deployment

vim prometheus.yaml

```yaml
# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
# ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "networking.k8s.io"   # Ingress is served here; extensions/v1beta1 Ingress was removed in k8s 1.22
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"  # informational only: the 'prometheus' job selects this Service by name, not by annotation
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      nodePort: 32224
      port: 9090
      targetPort: http
---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
    - host: www.myprometheus.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-svc
                port:
                  number: 9090
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc  # PVC name
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce  # access mode (options: ReadWriteOnce / ReadOnlyMany / ReadWriteMany)
  resources:
    requests:
      storage: 2Gi  # requested capacity
  storageClassName: nfs-client  # StorageClass (adjust to your cluster)
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
        - name: "change-permission-of-directory"
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/busybox:latest
          command: ["/bin/sh"]
          args: ["-c", "chown -R 65534:65534 /prometheus"]  # the Prometheus image runs as UID 65534 (nobody)
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
      containers:
        - image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/prom/prometheus:latest
          name: prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"  # path to the Prometheus config file
            - "--storage.tsdb.path=/prometheus"  # TSDB storage path
            - "--web.enable-lifecycle"  # enables hot reload: curl -X POST http://localhost:9090/-/reload
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
          ports:
            - containerPort: 9090
              name: http
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 100m
              memory: 512Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config-volume
          configMap:
            name: prometheus-config
```

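Because `--web.enable-lifecycle` is set, an edited ConfigMap can be applied without restarting the Pod. Two caveats: the kubelet refreshes the mounted `prometheus.yml` only after a delay (often a minute or more), and `/-/reload` accepts POST or PUT, not a plain GET. A standard-library sketch of the reload call follows; the base URL in the usage note is this article's VIP plus NodePort, so substitute your own.

```python
import urllib.request


def build_reload_request(base_url):
    """Build (but do not send) a POST to <base_url>/-/reload.

    Only works when Prometheus runs with --web.enable-lifecycle,
    as the Deployment above does.
    """
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", data=b"", method="POST")


def reload_prometheus(base_url, timeout=10):
    """Send the reload request and return the HTTP status code.

    urlopen() raises urllib.error.HTTPError for error responses,
    e.g. when the new configuration fails to load.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Usage: once the new file is visible inside the container, `reload_prometheus("http://192.168.48.10:32224")` should return 200; `curl -X POST http://192.168.48.10:32224/-/reload` does the same from a shell. If the new configuration is invalid, Prometheus keeps running with the previous one.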
3. Deploying Node Exporter

3.1 Create the DaemonSet

vim node-exportet-daemonset.yaml

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: node-exporter
          image: docker.io/prom/node-exporter:latest
          args:
            - --web.listen-address=$(HOSTIP):9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          ports:
            - containerPort: 9100
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      tolerations:
        - operator: "Exists"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
```

Create the Service

vim node-exportet-svc.yaml

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
  clusterIP: None  # headless Service: DNS resolves directly to the Pod IPs
```
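
node-exporter listens on `$(HOSTIP):9100` with `hostNetwork: true`, so every node serves metrics in the Prometheus text format at `http://<node-ip>:9100/metrics`. (The `node-exporter` job in the ConfigMap discovers nodes through `role: node`, so scraping does not go through the headless Service above.) To peek at that output from a script, here is a deliberately small parser sketch: it handles `name{labels} value [timestamp]` lines and skips `#` comments, but not escaped quotes or `}` inside label values, so treat it as a reading aid rather than a complete exposition-format parser.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)(?:\s+-?\d+)?$')
# label="value" pairs; escaped quotes inside values are not handled.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse simple exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# HELP" / "# TYPE" metadata
        m = _SAMPLE.match(line)
        if m is None:
            continue  # syntax this sketch does not handle
        name, raw_labels, value = m.groups()
        # float() also accepts the NaN, +Inf and -Inf spellings Prometheus uses.
        samples.append((name, dict(_LABEL.findall(raw_labels or "")), float(value)))
    return samples
```

For example, feed it `urllib.request.urlopen("http://192.168.48.14:9100/metrics").read().decode()` and filter the result on `name == "node_load1"`.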

4. Deploying kube-state-metrics

4.1 Create the ServiceAccount, ClusterRole, ClusterRoleBinding, Deployment and Service

vim kube-state-metrics.yaml

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]  # DaemonSets, Deployments and ReplicaSets are served by "apps", not "extensions"
    resources: ["daemonsets", "deployments", "replicasets", "statefulsets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["cronjobs", "jobs"]
    verbs: ["list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
  - kind: ServiceAccount
    name: kube-state-metrics
    namespace: monitor
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
        - name: kube-state-metrics
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'  # only honored by a scrape job that filters on this annotation
  name: kube-state-metrics
  namespace: monitor
  labels:
    app: kube-state-metrics
spec:
  ports:
    - name: kube-state-metrics
      port: 8080
      protocol: TCP
  selector:
    app: kube-state-metrics
```
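
A caveat about the `prometheus.io/scrape: 'true'` annotation: Prometheus does not act on it by itself. Kubernetes service discovery only exposes it as the meta label `__meta_kubernetes_service_annotation_<name>`, with every character outside `[a-zA-Z0-9_]` turned into `_`, and some scrape job has to `keep` on that label. The ConfigMap in section 2.2 has no such job, so as written kube-state-metrics is not scraped. The sketch below shows the name mapping and builds a hypothetical endpoints job in that style (the function names and the job shape are illustrative, not from the original); translate the dict into YAML under `scrape_configs` if you want to use it.

```python
import re


def annotation_meta_label(annotation_key, role="service"):
    """Meta label under which Kubernetes SD exposes an annotation.

    Prometheus replaces every character outside [a-zA-Z0-9_] with "_".
    """
    return f"__meta_kubernetes_{role}_annotation_" + re.sub(r"[^a-zA-Z0-9_]", "_", annotation_key)


def annotated_endpoints_job(job_name, namespace):
    """Hypothetical job keeping endpoints whose Service has prometheus.io/scrape: 'true'."""
    return {
        "job_name": job_name,
        "kubernetes_sd_configs": [{"role": "endpoints", "namespaces": {"names": [namespace]}}],
        "relabel_configs": [{
            "source_labels": [annotation_meta_label("prometheus.io/scrape")],
            "regex": "true",
            "action": "keep",
        }],
    }
```

Note that `prometheus_io_scrape` (the key used on `prometheus-svc`) sanitizes to the same meta label as `prometheus.io/scrape`, so in the `monitor` namespace such a job would also pick up Prometheus itself; add a second `keep` on `__meta_kubernetes_service_name` if you want only kube-state-metrics.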

5. Deploying the Grafana Visualization Platform

5.1 Create the PersistentVolumeClaim, Deployment and Service

vim grafana.yaml

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc  # PVC name
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce  # access mode (options: ReadWriteOnce / ReadOnlyMany / ReadWriteMany)
  resources:
    requests:
      storage: 2Gi  # requested capacity
  storageClassName: nfs-client  # StorageClass (adjust to your cluster)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana/
              name: grafana-data
          env:
            - name: INFLUXDB_HOST  # not a Grafana (GF_*) setting; unused in this setup
              value: monitoring-influxdb
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE  # anonymous visitors get the Admin role; acceptable only in a lab
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-pvc
      affinity:  # scheduling preference (optional)
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/monitoring
                    operator: Exists
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort
```
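
Grafana exposes `GET /api/health`, which returns a small JSON document whose `database` field is `"ok"` when the instance is healthy. The stdlib sketch below checks it through the NodePort; the base URL in the usage note is this article's VIP, so adjust it for your cluster.

```python
import json
import urllib.request


def grafana_healthy(payload):
    """True if a /api/health response body reports a working database."""
    return json.loads(payload).get("database") == "ok"


def check_grafana(base_url, timeout=5):
    """Fetch <base_url>/api/health and evaluate it (performs a network call)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/health", timeout=timeout) as resp:
        return grafana_healthy(resp.read())
```

Usage: `check_grafana("http://192.168.48.10:31091")` should return True once the Grafana Pod is serving.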

6. Deployment Commands

Apply the components in the following order:

```bash
# 1. Create the namespace
kubectl apply -f prometheus-namespace.yaml

# 2. Deploy the Prometheus configuration
kubectl apply -f prometheus-configmap.yaml

# 3. Deploy the Prometheus server
kubectl apply -f prometheus.yaml

# 4. Deploy kube-state-metrics
kubectl apply -f kube-state-metrics.yaml

# 5. Deploy Node Exporter
kubectl apply -f node-exportet-daemonset.yaml
kubectl apply -f node-exportet-svc.yaml

# 6. Deploy Grafana
kubectl apply -f grafana.yaml
```

Check the Pod status:

```
[root@master1 prometheus]# kubectl get pod -n monitor
NAME                                 READY   STATUS    RESTARTS   AGE
grafana-server-64c9777c7b-drgdd      1/1     Running   0          110m
kube-state-metrics-6db447664-6r2wp   1/1     Running   0          110m
node-exporter-ccwk8                  1/1     Running   0          110m
node-exporter-fbq22                  1/1     Running   0          110m
node-exporter-hbtm6                  1/1     Running   0          110m
node-exporter-ndbhh                  1/1     Running   0          110m
node-exporter-sbb4p                  1/1     Running   0          110m
node-exporter-xd467                  1/1     Running   0          110m
prometheus-7cd9944dc4-lbjwx          1/1     Running   0          110m
```
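
To check the same thing from a script instead of by eye, the JSON form of `kubectl get pods -o json` can be inspected. A minimal sketch, assuming `kubectl` and a working kubeconfig on the machine that runs it; it flags any Pod that is not `Running` or has a container that is not ready.

```python
import json
import subprocess


def not_ready_pods(pod_list):
    """Names of Pods whose phase is not Running or that have an unready container."""
    bad = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        statuses = status.get("containerStatuses", [])
        all_ready = bool(statuses) and all(c.get("ready") for c in statuses)
        if status.get("phase") != "Running" or not all_ready:
            bad.append(pod["metadata"]["name"])
    return bad


def fetch_pods(namespace="monitor"):
    """Run kubectl and return the decoded PodList (needs kubectl and a kubeconfig)."""
    out = subprocess.run(["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```

`not_ready_pods(fetch_pods())` should return an empty list for a state like the output above. From the shell, `kubectl wait --for=condition=Ready pod --all -n monitor --timeout=300s` blocks until every Pod in the namespace is Ready.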

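7. Accessing the Services

After deployment, the services are reachable at:

  • Prometheus: http://<node-ip>:32224, or http://www.myprometheus.com (the domain must resolve to your ingress-nginx controller, since the Ingress uses ingressClassName nginx)

  • Grafana: http://<node-ip>:31091

Note: 192.168.48.10 is the VIP of my HA k8s cluster. On a non-HA cluster, use the IP of any node; a NodePort is opened on every node, not only on the node running the Pod.

Access Prometheus: http://192.168.48.10:32224

Access Grafana: http://192.168.48.10:31091/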
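
Besides the targets page in the Prometheus web UI, the HTTP API endpoint `/api/v1/targets` reports every scrape target with its `scrapePool` (the `job_name`), `health` (`up`, `down` or `unknown`) and `lastError`. With the ConfigMap from section 2.2 you would expect the pools `prometheus`, `coredns`, `kube-apiserver`, `node-exporter` and `cadvisor`. A stdlib sketch that summarizes the response; the URL in the usage note assumes this article's VIP and NodePort.

```python
import json
import urllib.request
from collections import Counter


def _active_targets(payload):
    body = json.loads(payload)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus API error: {body.get('error')}")
    return body["data"]["activeTargets"]


def health_by_job(payload):
    """Count active targets per (scrapePool, health); scrapePool equals job_name."""
    return Counter((t["scrapePool"], t["health"]) for t in _active_targets(payload))


def unhealthy_targets(payload):
    """(scrapeUrl, lastError) for every active target that is not "up"."""
    return [(t["scrapeUrl"], t.get("lastError", ""))
            for t in _active_targets(payload) if t["health"] != "up"]


def fetch_targets(base_url, timeout=10):
    """GET <base_url>/api/v1/targets and return the raw JSON body (network call)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/targets", timeout=timeout) as resp:
        return resp.read()
```

Usage: `health_by_job(fetch_targets("http://192.168.48.10:32224"))` gives counts keyed like `("node-exporter", "up")`, and `unhealthy_targets(...)` lists the URL and error of anything that is not up.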

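8. Grafana Dashboards

8.1 Configure the Grafana Data Source

Add a Prometheus data source, then click "Save & test" at the bottom of the page. The message "Successfully queried the Prometheus API." means the data source works.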
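
The data source can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. This sketch makes two assumptions: the Prometheus URL `http://prometheus-svc.monitor.svc.cluster.local:9090` relies on the default `cluster.local` cluster domain and the `prometheus-svc` Service from section 2.3, and no credentials are sent because the Grafana Deployment gives anonymous users the Admin role. If you disable anonymous access, add an `Authorization` header.

```python
import json
import urllib.request


def datasource_payload(url, name="Prometheus"):
    """Request body for Grafana's POST /api/datasources (Prometheus type)."""
    return {"name": name, "type": "prometheus", "url": url, "access": "proxy", "isDefault": True}


def build_datasource_request(grafana_url, prometheus_url):
    """Build (but do not send) the POST; send it with urllib.request.urlopen()."""
    body = json.dumps(datasource_payload(prometheus_url)).encode("utf-8")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With `access` set to `proxy`, the Grafana server (running inside the cluster) makes the queries, which is why an in-cluster Service DNS name can be used here instead of the NodePort.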

8.2 Import Dashboards

Dashboard IDs:

  • Node monitoring: 16098

  • k8s cluster monitoring: 14249

8.3 Dashboard Views

http://www.lryc.cn/news/611999.html
