17. K8s Observability: Distributed Tracing
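1. Getting to Know SkyWalking
1.1 Why a distributed tracing platform is needed
- Quickly locate the point of failure
- Quickly locate performance bottlenecks
- Understand service dependencies
- Visualize traffic across the whole system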
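1.2 Core components and working principle of distributed tracing
1.2.1 Core concepts
- Trace: the complete processing of one request is called a Trace. It covers the whole path from the client issuing the request to the backend finishing its work. A Trace is made up of multiple Spans.
- Span: a Span represents one unit of work inside a Trace, such as a function call or an HTTP request. Each Span records the operation name, start time, end time, and metadata about the operation. Spans have parent-child relationships, and the Spans taken together describe one Trace.
- Trace ID and Span ID: every Trace has a unique Trace ID and every Span has a unique Span ID. A Span also holds a reference to its parent Span.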
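The relationship between Trace, Span, and the parent reference can be sketched with a small data model. This is only an illustration: the `Span` class, its field names, and the sample operations are assumptions made for this example, not SkyWalking's internal types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical span record: one unit of work inside a trace."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    operation: str
    start_ms: int
    end_ms: int
    tags: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render(spans, parent_id=None, depth=0):
    """Return the spans as an indented call tree, following parent references."""
    lines = []
    children = [s for s in spans if s.parent_span_id == parent_id]
    for s in sorted(children, key=lambda s: s.start_ms):
        lines.append("  " * depth + f"{s.operation} ({s.duration_ms} ms)")
        lines.extend(render(spans, s.span_id, depth + 1))
    return lines


# Three spans sharing one trace ID form a single trace.
spans = [
    Span("t-1", "s-1", None, "GET /orders", 0, 120),
    Span("t-1", "s-2", "s-1", "OrderService.list", 5, 110),
    Span("t-1", "s-3", "s-2", "MySQL SELECT", 20, 90),
]
tree = render(spans)
print("\n".join(tree))
```

Because every span carries its parent's ID, the backend can rebuild the call tree even when the spans arrive out of order from different services.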
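1.2.2 How distributed tracing works
1. The client sends a request.
2. Service A starts handling the request and creates the initial Trace and Span.
3. Service A forwards the request to service B and passes along the Trace ID and Span ID.
4. Service B uses the received information to create a new Span and marks the parent Span.
5. Once all services have finished, the Span data each of them produced is sent to the tracing platform and aggregated.
6. Users inspect the details of the whole Trace in the UI.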
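The steps above can be sketched as two in-process "services" that pass tracing context through request headers. The header names and the `collected` list that stands in for the tracing backend are hypothetical; SkyWalking agents carry this context in their own `sw8` header, whose format is not reproduced here.

```python
import uuid

# Hypothetical header names used only in this sketch.
TRACE_HEADER = "x-demo-trace-id"
PARENT_HEADER = "x-demo-parent-span-id"

collected = []  # stands in for the tracing backend that aggregates spans


def handle(service, incoming_headers, downstream=None):
    """Create a span for this service and propagate context downstream."""
    # Reuse the caller's trace ID, or start a new trace at the entry service.
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    parent_id = incoming_headers.get(PARENT_HEADER)
    span_id = uuid.uuid4().hex[:16]
    collected.append({
        "service": service,
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_span_id": parent_id,
    })
    if downstream is not None:
        # Forward the trace ID and announce this span as the parent.
        downstream({TRACE_HEADER: trace_id, PARENT_HEADER: span_id})


def service_b(headers):
    handle("service-b", headers)


def service_a(headers):
    handle("service-a", headers, downstream=service_b)


service_a({})  # a client request that arrives without tracing headers
for span in collected:
    print(span)
```

Both spans end up with the same trace ID, and service B's span points at service A's span as its parent, which is what lets the platform draw the call chain.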
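1.3 What is SkyWalking?
SkyWalking is an application performance monitoring (Application Performance Monitor, APM) and observability analysis platform (Observability Analysis Platform) for distributed systems. It provides distributed tracing, metrics monitoring, diagnostic information, service mesh telemetry analysis, alerting, and a visual UI, helping developers and operations teams understand and manage their applications and services.
Core features:
- Distributed tracing: SkyWalking generates trace data for requests so users can see the whole call chain and locate performance bottlenecks or root causes
- Metrics analysis: measures service health through key performance indicators (KPI) such as response time, throughput, and success rate
- Alerting: supports custom alarm rules and sends notifications automatically when an anomaly is detected
- Rich UI: an intuitive web UI for viewing traces, metrics, and the service topology
- Low intrusiveness: code-level monitoring is implemented through bytecode injection, so services can be onboarded without changing business logic
- Multi-language support: besides Java, it supports .NET Core, Node.js, Python, Go, and other languages
- Platform integration: integrates with service meshes and Kubernetes
1.4 SkyWalking architecture
1.5 SkyWalking core terms
- Service: one application, or a group of applications, that provides the same function or business logic. It can be a microservice, a web service, a database, or another kind of backend service
- Instance: one concrete running instance of a service. In a distributed environment the same service may be deployed on several servers or containers, and the service running on each of them is an Instance
- Endpoint: a specific path or interface of a service that can be called from outside. An endpoint is the entry point through which the service exposes its functionality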
2. Installing the SkyWalking Cluster
2.1 Cluster plan
Hostname | IP address | OS | Resources | Description |
---|---|---|---|---|
k8s-master01 | 192.168.200.50 | Rocky9.4 | 4 cores, 8 GB | Master node |
k8s-node01 | 192.168.200.51 | Rocky9.4 | 4 cores, 8 GB | Node01 |
k8s-node02 | 192.168.200.52 | Rocky9.4 | 4 cores, 8 GB | Node02 |
2.2 Installing the SkyWalking cluster
# Add the SkyWalking Helm repository
[root@k8s-master01 ~]# export REPO=skywalking
[root@k8s-master01 ~]# helm repo add ${REPO} https://apache.jfrog.io/artifactory/skywalking-helm
# Download the SkyWalking chart
[root@k8s-master01 ~]# helm pull skywalking/skywalking
# Unpack the chart:
[root@k8s-master01 ~]# tar xf skywalking-4.3.0.tgz
[root@k8s-master01 ~]# cd skywalking
[root@k8s-master01 skywalking]# vim values.yaml
[root@k8s-master01 skywalking]# cat values.yaml
# Change the Elasticsearch settings:
elasticsearch:
  antiAffinity: soft
  clusterHealthCheckParams: wait_for_status=green&timeout=10s
  clusterName: es-cluster
  config:
    host: elasticsearch
    password: admin
    port:
      http: 9200
    user: admin
  enabled: true
  esMajorVersion: "7"
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/elasticsearch
  imagePullPolicy: IfNotPresent
  imageTag: 7.5.1
  persistence:
    annotations: {}
    enabled: true
  replicas: 3
  resources:
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  volumeClaimTemplate:
    storageClassName: nfs-csi
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 30Gi
initContainer:
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/busybox
  tag: "1.30"
# Change the OAP resource settings:
oap:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-oap-server
    tag: 10.2.0
  javaOpts: -Xmx2g -Xms2g
  replicas: 3
  resources:
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  storageType: elasticsearch
# Change the UI settings:
ui:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-ui
    tag: 10.2.0
  replicas: 3
  service:
    annotations: {}
    externalPort: 80
    internalPort: 8080
    type: NodePort
[root@k8s-master01 skywalking]# vim templates/oap-deployment.yaml
[root@k8s-master01 skywalking]# sed -n "91,100p" templates/oap-deployment.yaml
        livenessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20
        readinessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20
# Delete the conflicting resources
[root@k8s-master01 skywalking]# rm -rf charts/elasticsearch/templates/pod*
# Install:
[root@k8s-master01 skywalking]# helm install skywalking -n skywalking . --create-namespace
# Check the installation status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking
NAME READY STATUS RESTARTS AGE
es-cluster-master-0 1/1 Running 0 13m
es-cluster-master-1 1/1 Running 0 13m
es-cluster-master-2 1/1 Running 0 13m
skywalking-es-init-mkvw7 1/1 Running 0 13m
skywalking-oap-6d8f594b7c-7w785 1/1 Running 0 13m
skywalking-oap-6d8f594b7c-p4z64 1/1 Running 0 13m
skywalking-oap-6d8f594b7c-vnp8t 1/1 Running 0 13m
skywalking-ui-774674cc7-qcm79 1/1 Running 0 13m
skywalking-ui-774674cc7-qhgg8 1/1 Running 0 13m
skywalking-ui-774674cc7-qwkjm 1/1 Running 0 13m
# Check the Service
[root@k8s-master01 skywalking]# kubectl get svc skywalking-ui -n skywalking
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
skywalking-ui NodePort 10.108.110.98 <none> 80:31319/TCP 14m
Open skywalking-ui in a browser through the NodePort shown above (any node IP, port 31319).
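2.3 Connecting a Java service to SkyWalking
Java Agent reference documentation:
Java:
- JAVA_TOOL_OPTIONS: JVM startup options. The agent can be loaded through this variable, for example -javaagent:/skywalking/agent/skywalking-agent.jar
- SW_AGENT_NAME: service name. The suggested format is <group>::<logical name>, and namespace::service name is the recommended value
- SW_AGENT_INSTANCE_NAME: instance name, normally used to tell apart the instances of one service. The default is UUID@hostname, and using the Pod name is recommended
- SW_AGENT_COLLECTOR_BACKEND_SERVICES: address of the SkyWalking OAP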
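The recommended namespace::service name value is usually built from other environment variables with the Kubernetes `$(VAR)` reference syntax. The sketch below imitates that substitution in plain Python to show what the agent ends up seeing. The `expand` helper is hypothetical and simplified: it does not handle the `$$(VAR)` escape that Kubernetes also supports.

```python
import re


def expand(value, env):
    """Replace $(NAME) with env[NAME]; unknown names are left untouched."""
    pattern = r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)"
    return re.sub(pattern, lambda m: env.get(m.group(1), m.group(0)), value)


# Values that the Downward API would inject from the Pod's own metadata.
pod_env = {"NAMESPACE": "demo", "APP": "demo-handler"}

service_name = expand("$(NAMESPACE)::$(APP)", pod_env)
print(service_name)
```

A reference only resolves against variables defined earlier in the same container's `env` list, so NAMESPACE and APP have to be declared before SW_AGENT_NAME.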
[root@k8s-master01 skywalking]# mkdir demo/
[root@k8s-master01 skywalking]# cd demo/
[root@k8s-master01 demo]# vim demo-handler-deploy-sw.yaml
[root@k8s-master01 demo]# cat demo-handler-deploy-sw.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-handler
  name: demo-handler
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-handler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-handler
    spec:
      volumes: # add the volume and the init container
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001.1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS # add the environment variables
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-handler:v1-upgrade
        imagePullPolicy: IfNotPresent
        volumeMounts: # add the mount
        - name: skywalking-agent
          mountPath: /skywalking
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-handler
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create namespace demo
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -n demo
# Check the Pod status
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-handler-5b6f9dd9c7-88pr6 1/1 Running 0 77s 172.16.58.233 k8s-node02 <none> <none>
# Access test (try it several times)
[root@k8s-master01 demo]# curl 172.16.58.233:8080/api/generate
O4E,\1L!u-bzTE[7Fn#VCS+eK?fwcp|k
View the SkyWalking charts:
Topology map
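2.4 Connecting a Go service to SkyWalking
Go Agent reference documentation:
Go:
- SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE: address of the SkyWalking OAP
- SW_AGENT_NAME: service name. The suggested format is <group>::<logical name>, and namespace::service name is the recommended value
- SW_AGENT_INSTANCE_NAME: instance name, normally used to tell apart the instances of one service. The default is UUID@hostname, and using the Pod name is recommended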
# Download the test program:
[root@habor ~]# git clone https://gitee.com/dukuan/demo-order.git
# Write the Dockerfile
[root@habor ~]# cd demo-order-master
[root@habor demo-order-master]# vim Dockerfile
[root@habor demo-order-master]# cat Dockerfile
FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-go:0.5.0-go1.22 AS builder
COPY ./ /go/src/
WORKDIR /go/src/
RUN export GO111MODULE=on && \
    export GOPROXY=https://goproxy.cn,direct && \
    skywalking-go-agent -inject /go/src && \
    go build -o ./order -toolexec="skywalking-go-agent" -a /go/src
FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/alpine:3.20
COPY --from=builder /go/src/order .
CMD [ "./order" ]
# Build the image
[root@habor demo-order-master]# docker build -t crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1 .
# Push the image to the registry
[root@habor demo-order-master]# docker push crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1
[root@k8s-master01 demo]# vim mysql.yaml
[root@k8s-master01 demo]# cat mysql.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mysql
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mysql
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: mysql-data
      containers:
      - env:
        - name: MYSQL_ROOT_PASSWORD
          value: password
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/mysql:8.0.20
        imagePullPolicy: IfNotPresent
        name: mysql
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
[root@k8s-master01 demo]# vim mysql-svc.yaml
[root@k8s-master01 demo]# cat mysql-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  ports:
  - nodePort: 32541
    port: 3306
    protocol: TCP
    targetPort: 3306
  selector:
    app: mysql
  sessionAffinity: None
  type: NodePort
[root@k8s-master01 demo]# vim mysql-pvc.yaml
[root@k8s-master01 demo]# cat mysql-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: demo
spec:
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
  storageClassName: nfs-csi
  accessModes:
  - ReadWriteOnce
# Create the base component services:
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f mysql-pvc.yaml -n demo
# Check the Pods
[root@k8s-master01 demo]# kubectl get po -n demo
NAME READY STATUS RESTARTS AGE
....
mysql-6d698b4676-8hsn8 1/1 Running 0 3m22s
# Configure the database:
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-8hsn8 -n demo -- bash
root@mysql-6d698b4676-8hsn8:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database orders;
Query OK, 1 row affected (0.01 sec)
mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.01 sec)
mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.02 sec)
# The probe is already injected into the Go code at compile time, so no extra startup configuration is needed. Only the related environment variables have to be kept:
[root@k8s-master01 demo]# vim demo-order-deploy.yaml
[root@k8s-master01 demo]# cat demo-order-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-order
  name: demo-order
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-order
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-order
    spec:
      containers:
      - env:
        - name: MYSQL_HOST
          value: mysql
        - name: MYSQL_PORT
          value: "3306"
        - name: MYSQL_USER
          value: order
        - name: MYSQL_PASSWORD
          value: password
        - name: MYSQL_DB
          value: orders
        # added variables
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        #- name: SW_AGENT_NAME
        #  value: demo::demo-order
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE
          value: skywalking-oap.skywalking:11800
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v2
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-order
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -n demo
# Check the Pod status
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-order-755cdc96-ltlzg 1/1 Running 0 65s 172.16.58.239 k8s-node02 <none> <none>
# Access test (try it several times)
[root@k8s-master01 demo]# curl 172.16.58.239:8080/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20}]
View the SkyWalking charts:
The database is detected automatically
2.5 Cleaning up the environment
[root@k8s-master01 demo]# kubectl delete deploy -n demo --all
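3. Distributed Tracing Project Exercise
With the steps above, SkyWalking is now receiving trace data from both Go and Java services. Next, a complete project is used to consolidate what has been covered.
Project architecture:
3.1 Service deployment
3.1.1 Deploying the database (reusing the configuration from the previous exercise)
# Deploy the database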
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f
[root@k8s-master01 demo]# kubectl get po -n demo
NAME READY STATUS RESTARTS AGE
mysql-6d698b4676-sk8hj 1/1 Running 0 17s
# Create the account
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-sk8hj -n demo -- bash
root@mysql-6d698b4676-sk8hj:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database orders;
Query OK, 1 row affected (0.04 sec)
mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.02 sec)
mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.01 sec)
3.1.2 Starting the order service
# Start the order service. It is a Go program, so monitoring data is pushed without any extra configuration changes:
# Reuse the configuration from the previous exercise and create a Service
[root@k8s-master01 demo]# vim demo-order-svc.yaml
[root@k8s-master01 demo]# cat demo-order-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: order
  name: order
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-order
  sessionAffinity: None
  type: ClusterIP
# Configure an external domain name
[root@k8s-master01 demo]# vim demo-order-ingress.yaml
[root@k8s-master01 demo]# cat demo-order-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-order
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: order
            port:
              number: 80
        path: /orders
        pathType: ImplementationSpecific
# Create the service
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -f demo-order-svc.yaml -f demo-order-ingress.yaml -n demo
# Check the service status:
[root@k8s-master01 demo]# kubectl get pod -n demo -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-order-755cdc96-8qlc9 1/1 Running 0 2m54s 172.16.58.245 k8s-node02 <none> <none>
mysql-6d698b4676-sk8hj 1/1 Running 0 111m 172.16.58.241 k8s-node02 <none> <none>
[root@k8s-master01 demo]# kubectl get svc,ingress -n demo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/mysql NodePort 10.111.54.12 <none> 3306:32541/TCP 111m
service/order ClusterIP 10.101.166.166 <none> 80/TCP 3m1s
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/demo-order nginx demo.test.com 192.168.200.52 80 3m1s
# Access test:
[root@k8s-master01 demo]# echo "192.168.200.52 demo.test.com" >> /etc/hosts
[root@k8s-master01 demo]# curl demo.test.com/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20},{"id":3,"name":"Order 1","price":10},{"id":4,"name":"Order 2","price":20}]
3.1.3 Deploying the handler service (reusing the configuration from the previous exercise)
# Deploy the handler service
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -f demo-handler-svc.yaml -n demo
3.1.4 Deploying the receive service
[root@k8s-master01 demo]# vim demo-receive-deploy.yaml
[root@k8s-master01 demo]# cat demo-receive-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-receive
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-receive
    spec:
      volumes:
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001.1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        volumeMounts:
        - name: skywalking-agent
          mountPath: /skywalking
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-receive:v1-upgrade
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-receive
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
[root@k8s-master01 demo]# vim demo-receive-svc.yaml
[root@k8s-master01 demo]# cat demo-receive-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-receive
  sessionAffinity: None
  type: ClusterIP
[root@k8s-master01 demo]# vim demo-receive-ingress.yaml
[root@k8s-master01 demo]# cat demo-receive-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  name: demo-receive
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-receive
            port:
              number: 8080
        path: /receiveapi(/|$)(.*)
        pathType: ImplementationSpecific
# Deploy the receive service:
[root@k8s-master01 demo]# kubectl create -f demo-receive-deploy.yaml -f demo-receive-svc.yaml -f demo-receive-ingress.yaml -n demo
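The receive Ingress combines the path /receiveapi(/|$)(.*) with rewrite-target /$2, so the /receiveapi prefix is stripped before the request reaches the backend. The following sketch reproduces that rewrite with Python's `re` module to show which paths match and what the backend receives. It only approximates how ingress-nginx applies the rule, and the sample paths are made up for illustration.

```python
import re

# Same pattern as the Ingress path; the second capture group is what
# rewrite-target /$2 keeps.
PATH_PATTERN = re.compile(r"/receiveapi(/|$)(.*)")


def rewrite(path):
    """Return the path the backend would see, or None if the rule does not match."""
    m = PATH_PATTERN.match(path)  # match() anchors at the start of the path
    if m is None:
        return None
    return "/" + m.group(2)


for sample in ["/receiveapi/api/orders", "/receiveapi", "/receiveapix", "/orders"]:
    print(sample, "->", rewrite(sample))
```

The `(/|$)` group is what stops a path such as /receiveapix from matching: after the prefix there has to be either a slash or the end of the path.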
3.1.5 Deploying the frontend service
[root@k8s-master01 demo]# vim demo-ui-deploy.yaml
[root@k8s-master01 demo]# cat demo-ui-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-ui
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-ui
    spec:
      containers:
      - image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-ui:sw
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        name: demo-ui
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
[root@k8s-master01 demo]# vim demo-ui-svc.yaml
[root@k8s-master01 demo]# cat demo-ui-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: demo-ui
  sessionAffinity: None
  type: ClusterIP
[root@k8s-master01 demo]# vim demo-ui-ingress.yaml
[root@k8s-master01 demo]# cat demo-ui-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ui
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-ui
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
# Deploy the frontend service:
[root@k8s-master01 demo]# kubectl create -f demo-ui-deploy.yaml -f demo-ui-svc.yaml -f demo-ui-ingress.yaml -n demo
# After deployment, the final set of services looks like this:
[root@k8s-master01 demo]# kubectl get po,svc,ingress -n demo
NAME READY STATUS RESTARTS AGE
pod/demo-handler-5b6f9dd9c7-g4k5s 1/1 Running 1 (25m ago) 26m
pod/demo-order-755cdc96-8qlc9 1/1 Running 0 47m
pod/demo-receive-5cf555cdfd-j5g76 1/1 Running 1 (14m ago) 16m
pod/demo-ui-66bb5f4d67-smbpb 1/1 Running 0 83s
pod/mysql-6d698b4676-sk8hj 1/1 Running 0 155m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/demo-receive ClusterIP 10.103.251.213 <none> 8080/TCP 16m
service/demo-ui ClusterIP 10.106.49.125 <none> 80/TCP 83s
service/handler ClusterIP 10.102.43.148 <none> 80/TCP 26m
service/mysql NodePort 10.111.54.12 <none> 3306:32541/TCP 155m
service/order ClusterIP 10.101.166.166 <none> 80/TCP 47m
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/demo-order nginx demo.test.com 192.168.200.52 80 47m
ingress.networking.k8s.io/demo-receive nginx demo.test.com 192.168.200.52 80 16m
ingress.networking.k8s.io/demo-ui nginx demo.test.com 192.168.200.52 80 83s
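Next, open the site in a browser:
3.2 Accessing and monitoring the services
Open the page and try generating a password and creating an order:
The architecture diagram of the whole project then becomes visible:
Creating an order adds a random delay, and that delay also shows up in the trace information in SkyWalking:
3.3 Simulating a failure
# Next, simulate a failure by scaling the handler service and MySQL down to zero replicas:
[root@k8s-master01 demo]# kubectl scale deploy demo-handler mysql --replicas=0 -n demo
Access the page again and the failed traces are collected: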
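4. SkyWalking Alerting
4.1 SkyWalking alarm notifications
SkyWalking can monitor the Metrics it collects and raise alarms so that anomalies are handled promptly. With sensible alarm rules and hooks, potential problems can be caught early and related issues located quickly.
The core of SkyWalking alerting is a set of rules with three main parts:
- Metrics: the performance metrics SkyWalking collects for services, instances, and endpoints
- Rules: the conditions that trigger an alarm. They are defined by default in config/alarm-settings.yml and support comparison and logical operators
- Hooks: the actions executed once an alarm fires, such as sending a notification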
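4.2 SkyWalking alarm rules
A SkyWalking alarm rule consists of the following elements:
- Rule name: globally unique, and must end with _rule
- expression: defined in MQE (Metrics Query Expression). The result must be a SINGLE_VALUE, and the root operation must be a comparison or a boolean operation whose result is 1 (true) or 0 (false). The alarm fires when the result is 1 (true)
- include-names: the entity names to include (Service, Instance, Endpoint, and so on), as a list
- exclude-names: the entity names to exclude
- include-names-regex: include entities by regular expression
- exclude-names-regex: exclude entities by regular expression
- tags: extra labels attached to the alarm, such as level=warning
- period: the size of the time window over which the alarm condition is checked, in minutes
- silence-period: after an alarm fires, it is not fired again during this period. If not set, it equals period
- hooks: the names of the hooks bound to the alarm, in the format {hookType}.{hookName} (for example slack.custom1). They must be defined in the hooks section of alarm-settings.yml. If no hook name is given, the global hooks are used
- message: the alarm message, used to describe the current alarm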
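The effect of silence-period is easiest to see on a timeline. The sketch below is a simplified model written for this article, not SkyWalking's implementation: it assumes one check per minute and suppresses repeat notifications for `silence_period` minutes after each one that is sent. The `fire_times` helper and the sample data are hypothetical.

```python
def fire_times(check_results, silence_period):
    """Return the minute indexes at which a notification would be sent.

    check_results holds one boolean per minute: True means the rule
    expression evaluated to 1 at that check.
    """
    fired = []
    silent_until = -1
    for minute, triggered in enumerate(check_results):
        if triggered and minute > silent_until:
            fired.append(minute)
            # Stay quiet for the next silence_period minutes.
            silent_until = minute + silence_period
    return fired


# The condition holds from minute 1 to minute 5 and again at minute 7.
results = [False, True, True, True, True, True, False, True]
print(fire_times(results, silence_period=3))
print(fire_times(results, silence_period=0))
```

With a silence period of 3 only two notifications go out for this timeline, while a silence period of 0 would send one for every minute the condition holds.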
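4.3 Configuring the DingTalk alarm robot
To use DingTalk alarms, first create a group chat and then add a robot to it:
Add a robot
Choose Custom
Enter the robot name and copy the secret
Finish adding the robot and copy the Webhook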
4.4 Connecting SkyWalking to DingTalk alarms
First, place the SkyWalking alarm configuration file inside the SkyWalking chart directory:
# Create the directory that holds the alarm configuration
[root@k8s-master01 demo]# mkdir -p ../files/conf.d/oap
[root@k8s-master01 demo]# cd ../files/conf.d/oap
# Copy the alarm template file out of the oap container
[root@k8s-master01 oap]# kubectl cp skywalking-oap-6d8f594b7c-xrnbr:/skywalking/config/alarm-settings.yml ./alarm-settings.yml -n skywalking
# Add the DingTalk alarm
[root@k8s-master01 oap]# vim alarm-settings.yml
[root@k8s-master01 oap]# tail -14 alarm-settings.yml
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
          }
        }
      webhooks:
        - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
          secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017
# Apply the configuration (go back to the skywalking root directory first)
[root@k8s-master01 oap]# cd ../../..
[root@k8s-master01 skywalking]# helm upgrade skywalking . -n skywalking
# Check the Pod update status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking | grep oap
skywalking-oap-5644bbbd46-hvvxx 1/1 Running 0 11m
# Check that the configuration file was updated:
[root@k8s-master01 skywalking]# kubectl exec skywalking-oap-5644bbbd46-hvvxx -n skywalking -- tail -14 config/alarm-settings.yml
Defaulted container "oap" out of: oap, wait-for-elasticsearch (init)
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
          }
        }
      webhooks:
        - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
          secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017
Send requests to the services to trigger an alarm:
After a short wait, the alarm message appears in DingTalk
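When a robot is created with the signing option, DingTalk expects each webhook call to carry a timestamp and a signature derived from the secret. SkyWalking takes care of this once `secret` is set in the hook. The sketch below only illustrates the calculation, which is handy when testing a webhook by hand. It follows DingTalk's published signing scheme as the author of this sketch understands it, and the token and secret in the usage line are placeholders.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def sign(secret, timestamp_ms):
    """HMAC-SHA256 over "<timestamp>\\n<secret>", base64-encoded, then URL-encoded."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook, secret, timestamp_ms=None):
    """Append the timestamp and sign query parameters to the webhook URL."""
    ts = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    return f"{webhook}&timestamp={ts}&sign={sign(secret, ts)}"


# Placeholder credentials, not a real robot.
url = signed_url(
    "https://oapi.dingtalk.com/robot/send?access_token=TOKEN",
    "SECexample",
    timestamp_ms=1700000000000,
)
print(url)
```

The timestamp is in milliseconds and takes part in the signature, so a signed URL is only meant to be used shortly after it is generated.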
4.5 Custom alarm rules
Besides the default alarms, custom ones can be added. For example, to watch whether the JVM threads of a Java service are blocked, use the instance_jvm_thread_blocked_state_thread_count metric.
# For example, alert when the number of blocked JVM threads is greater than 5:
[root@k8s-master01 oap]# vim alarm-settings.yml
[root@k8s-master01 oap]# cat alarm-settings.yml
....
rules:
  thread_block_rule:
    expression: sum(instance_jvm_thread_blocked_state_thread_count > 5) >= 2
    period: 5 # check the data of the past 5 minutes
    message: "Blocked thread count of {name} exceeded 5 at least twice in the past 5 minutes"
....
# After changing the configuration file, apply it:
[root@k8s-master01 skywalking]# helm upgrade skywalking -n skywalking .
[root@k8s-master01 skywalking]# kubectl rollout restart deploy skywalking-oap -n skywalking
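The expression sum(instance_jvm_thread_blocked_state_thread_count > 5) >= 2 can be read in two stages: the inner comparison turns every per-minute value in the window into 1 or 0, and sum counts how many minutes exceeded the threshold. The sketch below mirrors that reading in plain Python. It is not an MQE evaluator, and the function name and sample windows are made up for illustration.

```python
def thread_block_rule(window, threshold=5, min_hits=2):
    """Return True when at least min_hits values in the window exceed threshold.

    window holds one blocked-thread count per minute of the rule's period.
    """
    # Inner comparison: each minute becomes 1 (exceeded) or 0 (not exceeded).
    flags = [1 if value > threshold else 0 for value in window]
    # Outer sum and comparison: enough minutes exceeded the threshold?
    return sum(flags) >= min_hits


# period: 5 means the window holds the last five per-minute values.
print(thread_block_rule([1, 6, 2, 7, 3]))  # two minutes above 5
print(thread_block_rule([1, 6, 2, 3, 3]))  # only one minute above 5
```

Requiring two hits inside the window keeps a single one-minute spike from firing the alarm.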
Source of this blog post: https://edu.51cto.com/lecturer/11062970.html