One-Click Deployment of Prometheus + Grafana + Alertmanager (Using Docker Compose)

1. Prerequisites

Make sure the following components are installed:

# Install Docker
curl -fsSL https://get.docker.com | bash

# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" \
  -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version

2. Create the project directory structure

mkdir -p ~/prometheus-stack
cd ~/prometheus-stack
mkdir -p prometheus
mkdir -p alertmanager

3. Create the configuration files

nano docker-compose.yml

Paste in the following:

version: '3.3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus:/etc/prometheus/rules
    restart: unless-stopped

  grafana:
    image: grafana/grafana-oss
    container_name: grafana
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-storage:/var/lib/grafana
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
    restart: unless-stopped

volumes:
  grafana-storage:

nano prometheus.yml

with the following content:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "xxxxx:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/rules/*.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"

  - job_name: "agent_windows"
    static_configs:
      - targets: ["xxxx:9182"]
        labels:
          app: "windows"
          instance: "virtual machine"

  - job_name: "alertmanager"
    static_configs:
      # Inside the Prometheus container, localhost is the container itself,
      # so the Compose service name is used to reach Alertmanager.
      - targets: ["alertmanager:9093"]
        labels:
          app: "alertmanager"

  - job_name: "node_exporter"
    static_configs:
      - targets: ["xxxx:9100"]
        labels:
          app: "node_exporter"
          instance: "Aliyun test server"

  - job_name: "process_exporter"
    static_configs:
      - targets: ["xxxx:9256"]
        labels:
          app: "process"
          instance: "Aliyun test server"

  - job_name: 'http_probe'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https:XXXXXX
        labels:
          instance: "company homepage"
          app: webapp
      - targets:
          - http://xxxxxxxx:9997/docs#/
        labels:
          instance: "PDF merge"
          app: pdfservice
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      #- source_labels: [__param_target]
      #  target_label: url
      # Keep the instance label from static_configs; do not overwrite it.
      # You can also comment out the next rule to avoid overwriting the instance label.
      # - source_labels: [__param_target]
      #   target_label: instance
      - target_label: __address__
        replacement: xxxxxxx:9115

nano alertmanager.yml
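route:
  group_by: ['alertname']       # Group alerts by the alertname label (identical alerts are merged into one notification)
  group_wait: 30s               # Delay the first notification by 30 seconds so alerts do not fire too quickly
  group_interval: 5m            # Send notifications for the same group at least 5 minutes apart (prevents frequent notifications)
  repeat_interval: 1h           # While an alert keeps firing, repeat the notification every hour
  receiver: 'webhook_receiver'  # Name of the default receiver

receivers:
  - name: 'webhook_receiver'
    webhook_configs:
      - url: 'http://xxxx:5012/alertmanager_to_feishu'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']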
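
Once the stack is running (step 4), one way to exercise the route and the webhook receiver is to push a hand-made alert into Alertmanager's v2 API. The snippet below is only a sketch: the helper name build_alert_payload, the alert name TestAlert, and the instance value demo-host are illustrative assumptions, and it assumes Alertmanager is reachable on localhost:9093 as published in docker-compose.yml.

```shell
# Build a one-alert JSON payload for Alertmanager's v2 API.
# Arguments: alert name, severity, instance (all illustrative values).
build_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"manual test alert"}}]' "$1" "$2" "$3"
}

payload=$(build_alert_payload TestAlert warning demo-host)
echo "$payload"

# Assumption: Alertmanager is published on localhost:9093 (see docker-compose.yml).
curl -fsS --max-time 5 -X POST -H 'Content-Type: application/json' \
  -d "$payload" http://localhost:9093/api/v2/alerts \
  || echo "Alertmanager not reachable; start the stack first"
```

With group_wait set to 30s, the webhook should receive the notification roughly half a minute after the alert is accepted.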

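The remaining configuration files, such as alerting rule files, go in the prometheus/ folder, which docker-compose.yml mounts at /etc/prometheus/rules.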
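
As an illustration of what such a file might look like, the following sketch writes a minimal rule file into prometheus/. The file name example_rules.yml, the group name, the InstanceDown alert, and the 1m duration are assumptions chosen for the example, not part of the original setup. It assumes the command is run from ~/prometheus-stack.

```shell
# Create the rules directory if it does not exist yet.
mkdir -p prometheus

# Quoted 'EOF' keeps the shell from expanding $labels inside the template.
cat > prometheus/example_rules.yml <<'EOF'
groups:
  - name: example
    rules:
      # Fires when any scrape target has been unreachable for one minute.
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is unreachable"
EOF
```

Because rule_files matches /etc/prometheus/rules/*.yml, any .yml file placed here is picked up; if Prometheus is already running, it needs a restart (for example docker-compose restart prometheus) to load new rule files.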

4. Start the services

docker-compose up -d
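
A quick way to confirm that the three containers came up is to probe their health endpoints on the host ports published in docker-compose.yml. This is a sketch rather than a complete check: the helper name check_endpoint is hypothetical, and it assumes the commands run on the Docker host itself.

```shell
# Probe one HTTP endpoint and print OK or FAIL with a label.
# Arguments: display name, URL.
check_endpoint() {
  if curl -fsS --max-time 5 -o /dev/null "$2"; then
    echo "OK   $1"
  else
    echo "FAIL $1 ($2)"
    return 1
  fi
}

# Host ports as published in docker-compose.yml (Grafana is mapped to 3001).
failed=0
check_endpoint prometheus   http://localhost:9090/-/healthy  || failed=1
check_endpoint alertmanager http://localhost:9093/-/healthy  || failed=1
check_endpoint grafana      http://localhost:3001/api/health || failed=1
echo "failed=$failed"
```

If a line reports FAIL, docker-compose ps and docker-compose logs for that service usually show why the container did not start.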
