当前位置: 首页 > news >正文

结合Tensuns管理prometheus的blackbox与告警设置

场景说明:


因为业务服务器已经完成了三级等保,禁止在业务服务器上部署任何应用,遂选择一台新的服务器部署prometheus,采用blackbox_exporter监控业务服务器的端口与域名状态。 

Tensuns项目介绍


https://github.com/starsliao/TenSunS

 

后羿 - TenSunS(原ConsulManager)是一个使用Flask+Vue开发,基于Consul的WEB运维平台,弥补了Consul官方UI对Services管理的不足;并且基于Consul的服务发现与键值存储:实现了Prometheus自动发现多云厂商各资源信息;基于Blackbox对站点监控的可视化维护;以及对自建与云上资源的优雅管理与展示。 

安装Tensuns

选择使用docker-compose安装

vim install_Tensuns.sh

#!/bin/bash
export PATH=$PATH:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
tsspath="/opt/tensuns"
uuid=`uuidgen`
adminpwd=`uuidgen|awk -F- '{print $1}'`
mkdir -p $tsspath/consul/config
cat <<EOF > $tsspath/consul/config/consul.hcl
log_level = "error"
data_dir = "/consul/data"
client_addr = "0.0.0.0"
ui_config{enabled = true
}
ports = {grpc = -1https = -1dns = -1grpc_tls = -1serf_wan = -1
}
peering {enabled = false
}
connect {enabled = false
}
server = true
bootstrap_expect=1
acl = {enabled = truedefault_policy = "deny"enable_token_persistence = truetokens {initial_management = "$uuid"agent = "$uuid"}
}
EOF
chmod 777 -R $tsspath/consul/config
cat <<EOF > $tsspath/docker-compose.yaml
version: '3.6'
services:consul:image: swr.cn-south-1.myhuaweicloud.com/starsl.cn/consul:latestcontainer_name: consulhostname: consulrestart: alwaysports:- "8500:8500"volumes:- $tsspath/consul/data:/consul/data- $tsspath/consul/config:/consul/config- /usr/share/zoneinfo/PRC:/etc/localtimecommand: "agent"networks:- TenSunSflask-consul:image: swr.cn-south-1.myhuaweicloud.com/starsl.cn/flask-consul:latestcontainer_name: flask-consulhostname: flask-consulrestart: alwaysvolumes:- /usr/share/zoneinfo/PRC:/etc/localtimeenvironment:consul_token: $uuidconsul_url: http://consul:8500/v1admin_passwd: $adminpwdlog_level: INFOdepends_on:- consulnetworks:- TenSunSnginx-consul:image: swr.cn-south-1.myhuaweicloud.com/starsl.cn/nginx-consul:latestcontainer_name: nginx-consulhostname: nginx-consulrestart: alwaysports:- "1026:1026"volumes:- /usr/share/zoneinfo/PRC:/etc/localtimedepends_on:- flask-consulnetworks:- TenSunSnetworks:TenSunS:name: TenSunSdriver: bridgeipam:driver: default
EOFecho -e "\n\033[31;1m正在启动后羿运维平台...\033[0m"
cd $tsspath && docker-compose up -d
echo -e "\n后羿运维平台默认的admin密码是:\033[31;1m$adminpwd\033[0m\n修改密码请编辑 $tsspath/docker-compose.yaml 查找并修改变量 admin_passwd 的值\n"
echo -e "请使用浏览器访问 http://{你的IP}:1026 并登录使用\n"
echo -e "\033[31;1mhttp://`ip route get 1.2.3.4 | awk '{print $NF}'|head -1`:1026\033[0m\n"

bash install_Tensuns.sh
#执行脚本前,先安装docker与docker-compose,安装后使用IP:1026登录


安装Alertmanager

安装包下载:Download | Prometheus
alertmanager-0.26.0.linux-amd64
设置启动脚本

vim /etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target[Service]
Type=simple
ExecStart=/opt/alertmanager-0.26.0.linux-amd64/alertmanager \--config.file=/opt/alertmanager-0.26.0.linux-amd64/alertmanager.yml \--storage.path=/opt/alertmanager-0.26.0.linux-amd64/data
#  --web.listen-address=:9081
#修改启动端口为9081
ExecReload=/bin/kill -HUP $MAINPID
Restart=always[Install]
WantedBy=multi-user.target

systemctl daemon-reload      
systemctl start alertmanager 
systemctl enable alertmanager
systemctl status alertmanager 

Alertmanager配置文件修改

vim alertmanager.yml

global:resolve_timeout: 5m   smtp_smarthost: 'smtp.qq.com:465'smtp_from: '******@qq.com'smtp_auth_username: '******@qq.com'
#SMTP授权码smtp_auth_password: 'kxtokczppbtabfbi'smtp_require_tls: false#邮件模板
templates:- '/opt/alertmanager-0.26.0.linux-amd64/alertsend.tmpl'
route:group_by: ['alertname']group_wait: 30sgroup_interval: 2mrepeat_interval: 10mreceiver: 'email'     receivers:
- name: 'email'email_configs:- to: '*****@qq.com' html: '{{ template "email.to.html" . }}'send_resolved: true inhibit_rules:  - source_match:severity: 'critical'target_match:severity: 'warning'
#触发severity为critical的告警时,抑制[ 'name', 'env','project'],都相等的warning告警equal: [ 'name', 'env','project']

配置邮件模板:

vim /opt/alertmanager-0.26.0.linux-amd64/alertsend.tmpl

{{ define "email.to.html" }}
{{ range .Alerts }}
告警程序: 域名IP端口检查告警 <br>
告警级别: {{ .Labels.severity }} 级 <br>
告警类型: {{ .Labels.alertname }} <br>
故障主机: {{ .Labels.instance }} <br>
故障项目: {{ .Labels.project }} <br>
故障环境: {{ .Labels.env }} <br>
告警详情: {{ .Annotations.description }} <br>
{{ end }}

Prometheus配置修改

vim /opt/prometheus-2.46.0.linux-amd64/prometheus.yml 
注意:token: '0eed6b85-6c5a-40a9-b02d-4de1eeea8319'的值与Tensuns上的一致。

# my global config
global:scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.# scrape_timeout is set to the global default (10s).# Alertmanager configuration
alerting:alertmanagers:- static_configs:- targets:- 127.0.0.1:9093# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:- "rules.yml"# - "second_rules.yml"# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.- job_name: "prometheus"# metrics_path defaults to '/metrics'# scheme defaults to 'http'.static_configs:- targets: ["localhost:9090"]- job_name: 'blackbox_exporter'scrape_interval: 15sscrape_timeout: 5smetrics_path: /probeconsul_sd_configs:
#      - server: 'consul:8500'- server: '127.0.0.1:8500'token: '0eed6b85-6c5a-40a9-b02d-4de1eeea8319'services: ['blackbox_exporter']relabel_configs:- source_labels: ["__meta_consul_service_metadata_instance"]target_label: __param_target- source_labels: [__meta_consul_service_metadata_module]target_label: __param_module- source_labels: [__meta_consul_service_metadata_module]target_label: module- source_labels: ["__meta_consul_service_metadata_company"]target_label: company- source_labels: ["__meta_consul_service_metadata_env"]target_label: env- source_labels: ["__meta_consul_service_metadata_name"]target_label: name- source_labels: ["__meta_consul_service_metadata_project"]target_label: project- source_labels: [__param_target]target_label: instance- target_label: __address__replacement: 127.0.0.1:9115

添加告警规则rules.yml

vim /opt/prometheus-2.46.0.linux-amd64/rules.yml

groups:- name: Domainrules:- alert: 站点可用性expr: probe_success{job="blackbox_exporter"} == 0for: 1mlabels:alertype: domainseverity: criticalannotations:description: "{{ $labels.env }}_{{ $labels.name }}({{ $labels.project }}):站点无法访问\n> {{ $labels.instance }}"- alert: 站点1h可用性低于80%expr: sum_over_time(probe_success{job="blackbox_exporter"}[1h])/count_over_time(probe_success{job="blackbox_exporter"}[1h]) * 100 < 80for: 3mlabels:alertype: domainseverity: warningannotations:description: "{{ $labels.env }}_{{ $labels.name }}({{ $labels.project }}):站点1h可用性:{{ $value | humanize }}%\n> {{ $labels.instance }}"- alert: 站点状态异常expr: (probe_success{job="blackbox_exporter"} == 0 and probe_http_status_code > 499) or probe_http_status_code == 0for: 1mlabels:alertype: domainseverity: warningannotations:description: "{{ $labels.env }}_{{ $labels.name }}({{ $labels.project }}):站点状态异常:{{ $value }}\n> {{ $labels.instance }}"- alert: 站点耗时过高expr: probe_duration_seconds > 0.5for: 2mlabels:alertype: domainseverity: warningannotations:description: "{{ $labels.env }}_{{ $labels.name }}({{ $labels.project }}):当前站点耗时:{{ $value | humanize }}s\n> {{ $labels.instance }}"- alert: SSL证书有效期expr: (probe_ssl_earliest_cert_expiry-time()) / 3600 / 24 < 15for: 2mlabels:alertype: domainseverity: warningannotations:description: "{{ $labels.env }}_{{ $labels.name }}({{ $labels.project }}):证书有效期剩余{{ $value | humanize }}天\n> {{ $labels.instance }}"

Tensuns结果验证:

登录Tensun添加需要被监控资源列表

登录Grafana配置数据源与dashboard

dashboard选择ID:9965

alertmanager查看被触发的告警信息

 登录邮箱查看告警邮件

http://www.lryc.cn/news/287041.html

相关文章:

  • printf实现
  • Elasticsearch 中的 term、terms 和 match 查询
  • 美易官方:开盘:美股高开科技股领涨 标普指数创盘中新高
  • STM32F407移植OpenHarmony笔记2
  • 数据仓库-相关概念
  • 线程的面试八股
  • Jmeter 配置元件
  • Java- @FunctionalInterface声明一个接口为函数式接口
  • Java使用Netty实现端口转发Http代理Sock5代理服务器
  • Linux环境docker安装Neo4j,以及Neo4j新手入门教学(超详细版本)
  • C++ inline 关键字有什么做用?
  • eNSP学习——理解ARP及Proxy ARP
  • Unity中UGUI在Mask剪裁粒子特效的实现
  • 精通 VS 调试技巧,学习与工作效率翻倍!
  • yarn 安装包时报“certificate has expired”
  • Qt5项目拆解第一集解决:中文乱码| 全局字体|注册表|QSS/CSS
  • 消息队列RabbitMQ.01.安装部署与基本使用
  • 1.24号c++
  • 【GitHub项目推荐--12 年历史的 PDF 工具开源了】【转载】
  • React16源码: React中的PortalComponent创建, 调和, 更新的源码实现
  • Hive-SQL语法大全
  • 编译原理2.3习题 语法制导分析[C++]
  • JUC-CAS
  • Effective C++——关于重载赋值运算
  • vscode debug
  • 数据库选型其实技术维度不太重要
  • 【C++】入门(二)
  • Nginx 代理服务路径带/和不带/的问题
  • C# CefSharp 输入内容,点击按钮,并且滑动。
  • 历经15年,比特币以强势姿态进军华尔街!270亿美元投资狂潮引发市场震荡!