ELK Deployment and Usage Guide
1 Deploying the ELK Stack to monitor a Java service
Below is a complete ELK (Elasticsearch, Logstash, Kibana) deployment for collecting and analyzing the logs of your Java service (plugin-manager):
1.1 Full docker-compose.yml
```yaml
version: '3.8'

services:
  # Existing Java service
  plugin-manager:
    image: openjdk:8-jre-alpine
    container_name: plugin-manager
    command: ["java", "-jar", "/app/jars/plugin-manager-1.0.0.jar"]
    volumes:
      - ./jars:/app/jars
    ports:
      - "8080:8080"
    environment:
      TZ: Asia/Shanghai
      JAVA_OPTS: "-Xmx512m -Dfile.encoding=UTF-8"
    restart: unless-stopped
    networks:
      - elk-network
    logging: &plugin-logging
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        tag: "plugin-manager"

  # Elasticsearch
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.9.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false  # simplified; enable security in production
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - es-data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    networks:
      - elk-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9200"]
      interval: 30s
      timeout: 10s
      retries: 5

  # Logstash
  logstash:
    image: docker.elastic.co/logstash/logstash:8.9.0
    container_name: logstash
    volumes:
      - ./logstash-config:/usr/share/logstash/pipeline
    environment:
      LS_JAVA_OPTS: "-Xmx1g -Xms1g"
    depends_on:
      - elasticsearch
    networks:
      - elk-network
    ports:
      - "5000:5000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9600"]
      interval: 30s
      timeout: 10s
      retries: 5

  # Kibana
  kibana:
    image: docker.elastic.co/kibana/kibana:8.9.0
    container_name: kibana
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
    depends_on:
      - elasticsearch
    networks:
      - elk-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5601/api/status"]
      interval: 30s
      timeout: 10s
      retries: 5

  # Filebeat log shipper
  filebeat:
    image: docker.elastic.co/beats/filebeat:8.9.0
    container_name: filebeat
    user: root
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./filebeat-config/filebeat.yml:/usr/share/filebeat/filebeat.yml
    depends_on:
      - elasticsearch
      - logstash
    networks:
      - elk-network
    healthcheck:
      test: ["CMD", "filebeat", "test", "output"]
      interval: 30s
      timeout: 10s
      retries: 5

networks:
  elk-network:
    name: elk-network
    driver: bridge

volumes:
  es-data:
    driver: local
```
1.2 Configuration files
1.2.1 Logstash pipeline config (logstash-config/logstash.conf)
```conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse the Docker JSON log line
  json {
    source => "message"
    target => "docker"
  }

  # Parse Java logs that are themselves JSON
  if [docker][log] =~ "^\{.*\}$" {
    json {
      source => "[docker][log]"
      target => "log"
    }
  }

  # Extract the log level
  grok {
    match => { "[log][message]" => "%{LOGLEVEL:log_level} %{GREEDYDATA:log_message}" }
  }

  # Parse the timestamp
  date {
    match => [ "timestamp", "ISO8601" ]
    target => "@timestamp"
  }

  # Tag events with the service name
  mutate {
    add_field => { "service" => "%{[docker][container][name]}" }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }

  # For debugging: also print events to the console
  stdout {
    codec => rubydebug
  }
}
```
1.2.2 Filebeat config (filebeat-config/filebeat.yml)
```yaml
filebeat.inputs:
- type: container
  paths:
    - '/var/lib/docker/containers/*/*.log'
  processors:
    - add_docker_metadata: ~

# Only collect logs from specific containers
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.name: "plugin-manager"
          config:
            - type: container
              paths:
                - "/var/lib/docker/containers/${data.docker.container.id}/*.log"

output.logstash:
  hosts: ["logstash:5044"]

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
```
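One gap in this config for a Java service: multi-line stack traces are shipped as one event per line. A hedged sketch of a multiline parser added to the container input (option names follow Filebeat 8.x's `parsers` syntax; the timestamp pattern is an assumption about your log format and must be adjusted to match it):

```yaml
- type: container
  paths:
    - '/var/lib/docker/containers/*/*.log'
  parsers:
    - multiline:
        type: pattern
        # Lines that do NOT start with a date are joined to the previous event
        pattern: '^\d{4}-\d{2}-\d{2}'
        negate: true
        match: after
```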
1.3 Deployment steps
- Create the config directories

```shell
mkdir -p {logstash-config,filebeat-config}
```

- Create the config files

```shell
# Create the Logstash config
cat > logstash-config/logstash.conf <<EOF
[Logstash config from above]
EOF

# Create the Filebeat config
cat > filebeat-config/filebeat.yml <<EOF
[Filebeat config from above]
EOF
```

- Adjust file permissions

```shell
chmod 644 filebeat-config/filebeat.yml
chmod go-w filebeat-config/filebeat.yml
```

- Start all services

```shell
docker-compose up -d
```
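Elasticsearch can take a minute or more to come up after `docker-compose up -d`, so a small wait helper is handy before the verification steps. A sketch (the extraction uses `sed` so it works without `jq`; the loop shown in the comment is for a live stack):

```shell
# Extract the "status" field from a _cluster/health JSON response
es_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Against a live stack, poll until the cluster is at least yellow:
#   until curl -s http://localhost:9200/_cluster/health | es_status | grep -qE 'green|yellow'; do
#     sleep 2
#   done

# Offline check of the parsing against a sample response:
printf '%s\n' '{"cluster_name":"docker-cluster","status":"yellow","number_of_nodes":1}' | es_status
```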
1.4 Verifying the deployment
1.4.1 Check service status
```shell
docker-compose ps
```

Expected output:

```
Name             Command                          State           Ports
-------------------------------------------------------------------------------------------------
elasticsearch    /bin/tini -- /usr/local/bi ...   Up (healthy)    0.0.0.0:9200->9200/tcp, 9300/tcp
filebeat         /usr/local/bin/docker-entr ...   Up
kibana           /bin/tini -- /usr/local/bi ...   Up (healthy)    0.0.0.0:5601->5601/tcp
logstash         /usr/local/bin/docker-entr ...   Up (healthy)    0.0.0.0:5000->5000/tcp, 5044/tcp, 9600/tcp
plugin-manager   java -jar /app/jars/plugi ...    Up              0.0.0.0:8080->8080/tcp
```
1.4.2 Test log collection
- Generate some application logs:

```shell
curl http://localhost:8080/actuator/health
```

- Check the Elasticsearch indices:

```shell
curl -X GET "localhost:9200/_cat/indices?v"
```

Expected output similar to:

```
health status index           uuid               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logs-2023.07.15 abcdefgh1234567890   1   1          5            0     25.5kb         25.5kb
```
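For scripted checks, the document count can be pulled out of the `_cat` output with `awk` instead of reading it by eye; a sketch against the sample row above:

```shell
# Sample data row from `_cat/indices?v` (header line removed)
row='yellow open logs-2023.07.15 abcdefgh1234567890 1 1 5 0 25.5kb 25.5kb'

# Columns 3 and 7 are the index name and docs.count
printf '%s\n' "$row" | awk '{print $3, $7}'   # logs-2023.07.15 5
```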
1.5 Kibana dashboard setup
- Open Kibana: http://localhost:5601
- Create an index pattern
  - Navigate to Management > Stack Management > Kibana > Index Patterns
  - Create a new index pattern: `logs-*`
  - Select `@timestamp` as the time field
- View logs
  - Navigate to Analytics > Discover
  - Select the `logs-*` index pattern
  - Browse and search the logs
- Create a dashboard (optional)
  - Navigate to Analytics > Dashboard > Create dashboard
  - Add visualizations:
    - Log level distribution (pie chart)
    - Service error rate (metric)
    - Log volume over time (area chart)
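The index-pattern step can also be scripted against Kibana's HTTP API, which makes the setup repeatable across environments. A hedged sketch (the data-view endpoint shape here is for Kibana 8.x; the `kbn-xsrf` header is required by Kibana for write requests):

```
POST http://localhost:5601/api/data_views/data_view
Content-Type: application/json
kbn-xsrf: true

{
  "data_view": {
    "title": "logs-*",
    "timeFieldName": "@timestamp"
  }
}
```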
1.6 Advanced features
1.6.1 Error alerting
Set up error alerts in Kibana:
- Navigate to Management > Stack Management > Rules and Connectors
- Create a new rule:
  - Condition: the log contains "ERROR"
  - Action: send an email or Slack notification
1.6.2 JVM metrics monitoring
Add the Micrometer dependency to the Java application:
```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.11.2</version>
</dependency>
```
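Micrometer alone does not expose the scrape endpoint; the application also needs spring-boot-starter-actuator on the classpath and the endpoint exposed. A sketch of the relevant properties (assuming Spring Boot 2.x defaults):

```properties
# application.properties: expose the Prometheus scrape endpoint via the actuator
management.endpoints.web.exposure.include=health,info,prometheus
```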
Collect the metrics with Metricbeat's prometheus module. (Filebeat's `http_endpoint` input only receives pushed HTTP requests; it cannot poll a URL, so Metricbeat is the right tool for scraping `/actuator/prometheus`.)
```yaml
metricbeat.modules:
- module: prometheus
  period: 10s
  hosts: ["plugin-manager:8080"]
  metrics_path: /actuator/prometheus
  fields:
    type: "metrics"
```
1.6.3 APM integration (application performance monitoring)
Add Elastic APM:
```yaml
# Add an APM server to docker-compose.yml
apm-server:
  image: docker.elastic.co/apm/apm-server:8.9.0
  ports:
    - "8200:8200"
  environment:
    - output.elasticsearch.hosts=["elasticsearch:9200"]
  networks:
    - elk-network
```
Configure APM in the Java application:
```properties
# application.properties
elastic.apm.server_url=http://apm-server:8200
elastic.apm.service_name=plugin-manager
elastic.apm.application_packages=com.yourcompany
```
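These properties only take effect once the Elastic APM Java agent is attached to the JVM. A sketch of the compose changes (the agent jar location under `./agent` is an assumption; the jar itself must be downloaded separately):

```yaml
# Hypothetical: mount the APM agent jar and attach it via -javaagent
plugin-manager:
  command: ["java", "-javaagent:/app/agent/elastic-apm-agent.jar",
            "-jar", "/app/jars/plugin-manager-1.0.0.jar"]
  volumes:
    - ./jars:/app/jars
    - ./agent:/app/agent   # contains elastic-apm-agent.jar
```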
1.7 Maintenance and optimization
1.7.1 Log retention policy
Configure ILM (index lifecycle management) in Elasticsearch:
```shell
# Create the lifecycle policy
curl -X PUT "localhost:9200/_ilm/policy/logs-policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "delete": {
        "min_age": "60d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```
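The policy only applies to indices that reference it, so it must be bound to the `logs-*` pattern through an index template. A sketch in Kibana Dev Tools console syntax (the template name and rollover alias are assumptions):

```
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}
```

Note that the rollover action also requires writes to go through the alias rather than directly to dated index names.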
1.7.2 Performance tuning
- Elasticsearch:

```yaml
environment:
  - ES_JAVA_OPTS=-Xms4g -Xmx4g
  - indices.query.bool.max_clause_count=10000
```

- Logstash (these settings belong in logstash.yml, not in the pipeline .conf file):

```yaml
pipeline.workers: 4
pipeline.batch.size: 125
```

- Filebeat:

```yaml
queue.mem:
  events: 4096
  flush.min_events: 512
```
1.8 Troubleshooting
1.8.1 Common problems
- Logs not showing up in Kibana

```shell
# Check the Filebeat logs
docker logs filebeat

# Test the connection to Logstash
docker exec filebeat filebeat test output
```

- Elasticsearch running out of disk space

```shell
# Delete old indices
curl -X DELETE "localhost:9200/logs-2023.06.*"

# Adjust the disk watermarks
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}'
```

- Performance problems under high load

```shell
# Inspect hot threads
curl "localhost:9200/_nodes/hot_threads?threads=10"

# Increase the index refresh interval
curl -X PUT "localhost:9200/logs-*/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s"
  }
}'
```
1.9 Security hardening
- Enable Elasticsearch security

```yaml
environment:
  - xpack.security.enabled=true
  - ELASTIC_PASSWORD=YourSecurePassword
```

- Configure HTTPS

```shell
# Generate a CA certificate
docker run --rm -v $(pwd)/certs:/certs elasticsearch:8.9.0 \
  bin/elasticsearch-certutil ca --pass secret --out /certs/elastic-stack-ca.p12
```

```yaml
# Reference the certificates in docker-compose.yml
volumes:
  - ./certs:/usr/share/elasticsearch/config/certs
environment:
  - xpack.security.http.ssl.enabled=true
  - xpack.security.http.ssl.keystore.path=certs/elastic-stack-ca.p12
```

- Network isolation

```yaml
networks:
  elk-network:
    internal: true
```
This ELK Stack deployment provides a complete log-monitoring solution, from collection and processing through to visual analysis. Adjust the resource limits and retention policies to your actual needs; for production, enabling the security features and configuring proper access control is strongly recommended.
2 Using ELK
2.1 Viewing Docker container logs (basics)
2.1.1 Single container
```shell
# View the plugin-manager container's logs
docker logs plugin-manager

# Follow the logs in real time
docker logs -f --tail 100 plugin-manager

# Logs from the last 5 minutes
docker logs --since 5m plugin-manager
```
2.1.2 All containers
```shell
# View the logs of every container at once
docker-compose logs -f --tail=50

# Filter by service name
docker-compose logs -f plugin-manager
```
2.1.3 Advanced filtering
```shell
# Keep only lines containing ERROR
docker logs plugin-manager 2>&1 | grep ERROR

# Parse JSON logs with jq
docker logs plugin-manager --tail 100 | jq .
```
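As a quick sanity check of the grep filter, here it is applied to a captured sample instead of a live container (the log lines are made up):

```shell
# Made-up sample of what `docker logs plugin-manager` might print
sample="INFO  Started PluginManager in 3.2 seconds
ERROR Failed to load plugin: foo.jar
INFO  Health check OK"

# The same filter as `docker logs ... 2>&1 | grep ERROR`, applied offline
printf '%s\n' "$sample" | grep ERROR      # prints the ERROR line
printf '%s\n' "$sample" | grep -c ERROR   # prints 1
```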
2.2 Viewing logs with ELK (production-grade)
2.2.1 Accessing the Kibana console
- Open a browser at: http://<server-ip>:5601
- Go to the Discover page
- Create the index pattern `logs-*` (select `@timestamp` as the time field)
- Query with KQL: `service : "plugin-manager" and log_level : "ERROR"`
2.2.2 Common Kibana query examples
Purpose | KQL |
---|---|
Specific service | `service : "plugin-manager"` |
Error logs | `log_level : "ERROR"` |
Time range | `@timestamp >= now-15m` |
Message content | `message : "TimeoutException"` |
Combined query | `service : "plugin-manager" and log_level : "WARN"` |
2.2.3 Building a dashboard
- Go to Dashboard → Create dashboard
- Add visualizations:
  - Log level distribution (pie chart)
  - Request latency percentiles (histogram)
  - Exception trend (time series)
2.3 Advanced command-line diagnostics
2.3.1 ELK service status checks
```shell
# Elasticsearch cluster health
curl -XGET 'http://localhost:9200/_cluster/health?pretty'

# Logstash pipeline stats
curl -XGET 'http://localhost:9600/_node/stats/pipelines?pretty'

# Filebeat output connectivity
docker exec filebeat filebeat test output
```
2.3.2 Inspecting log files directly
```shell
# Raw log written by Filebeat itself
docker exec filebeat cat /var/log/filebeat/filebeat

# Logstash processing log
docker exec logstash tail -f /usr/share/logstash/logs/logstash-plain.log

# Search an Elasticsearch index
curl -XGET 'http://localhost:9200/logs-2023.07.15/_search?q=service:plugin-manager&pretty'
```
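The URI search above can also be written as a request body (POST to `logs-*/_search`), which is easier to extend with filters. A sketch using the fields produced by the Logstash pipeline earlier:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "service": "plugin-manager" } },
        { "match": { "log_level": "ERROR" } }
      ],
      "filter": [
        { "range": { "@timestamp": { "gte": "now-15m" } } }
      ]
    }
  }
}
```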
2.4 Container log persistence
2.4.1 Docker logging driver
```yaml
# docker-compose.yml
services:
  plugin-manager:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        tag: "plugin-manager"
```
2.4.2 Where ELK reads the logs from
```shell
# Find a container's log file location
docker inspect plugin-manager | grep LogPath
```

Typical path:

```
/var/lib/docker/containers/<container-id>/<container-id>-json.log
```
2.4.3 Log rotation
```shell
# Create a logrotate config
sudo tee /etc/logrotate.d/docker <<EOF
/var/lib/docker/containers/*/*.log {
  daily
  rotate 7
  size 100M
  compress
  delaycompress
  missingok
  copytruncate
}
EOF
```
2.5 Targeted troubleshooting commands
2.5.1 Container fails to start
```shell
# Last 50 lines of startup logs
docker logs --tail 50 plugin-manager

# Check the container's exit code
docker inspect plugin-manager | jq '.[0].State.ExitCode'
```
2.5.2 ELK pipeline backpressure
```shell
# Check the Logstash queue
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.queue'

# Filebeat backlog
docker exec filebeat filebeat export monitoring | jq '.filebeat.events.active'
```
2.5.3 Diagnosing lost logs
```shell
# Check the collection source
docker exec filebeat ls -lh /var/lib/docker/containers/*/*.log

# Verify Logstash is receiving events
tcpdump -i any port 5044 -A | grep 'plugin-manager'
```
2.6 Enhanced visualization
2.6.1 Installing a Kibana plugin
```shell
# Enter the Kibana container
docker exec -it kibana /bin/bash

# Install a log-viewing plugin
bin/kibana-plugin install https://github.com/sivasamyk/logtrail/releases/download/v0.1.31/logtrail-7.10.0-0.1.31.zip
```

Note: this Logtrail build targets Kibana 7.10 and will not install on the 8.9.0 image used above; treat it as an illustration of the plugin workflow and pick a build that matches your Kibana version.

2.6.2 Logtrail configuration
```yaml
# kibana.yml
logtrail:
  index_patterns:
    - pattern: 'logs-*'
      default: true
  search_bar: true
```
2.6.3 Grafana integration
```yaml
# Add to docker-compose.yml
grafana:
  image: grafana/grafana
  ports:
    - "3000:3000"
  environment:
    GF_INSTALL_PLUGINS: grafana-clock-panel,grafana-simple-json-datasource
```
2.7 Log-viewing cheat sheet
Scenario | Command |
---|---|
Follow in real time | `docker-compose logs -f --tail=100` |
Errors with context | `docker logs plugin-manager 2>&1 \| grep -A 5 -B 5 ERROR` |
Time range | `docker logs --since "2023-07-15T00:00:00" --until "2023-07-16T00:00:00" plugin-manager` |
JSON parsing | `docker logs plugin-manager \| jq -R 'fromjson?'` |
ELK health check | `curl -s 'http://localhost:9200/_cat/indices?v'` |
Log file location | `docker inspect plugin-manager \| jq -r '.[0].LogPath'` |
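On hosts without jq, a single field can still be pulled out of a json-file log line with plain `sed`; a sketch against a made-up line in Docker's json-file format:

```shell
# Made-up line in Docker's json-file log format
line='{"log":"ERROR Failed to connect\n","stream":"stderr","time":"2023-07-15T08:00:00.000000000Z"}'

# Extract the "stream" field without jq
printf '%s\n' "$line" | sed -n 's/.*"stream":"\([a-z]*\)".*/\1/p'   # stderr
```

This is fine for ad-hoc inspection, but it is not a JSON parser; for anything beyond simple flat fields, prefer `jq`.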
Notes
- Log volume: enable log rotation in production to keep the disk from filling up
- Sensitive data: never log passwords, keys, or other secrets
- Access control: restrict access to the Kibana console and use RBAC
- Performance: avoid leaving DEBUG-level logging enabled long-term

The methods above cover log inspection from basic to advanced. For production, prefer Kibana for log analysis: it provides powerful search, filtering, and visualization, and it retains history for retrospective analysis.
2.8 其他容器服务接入elk中
services:plugin-manager:image: plugin-manager:1.0container_name: plugin-manager-servicerestart: unless-stopped # 异常退出时自动重启ports:- "8088:8088" # 主机端口:容器端口environment:- TZ=Asia/Shanghai # 时区设置- JAVA_OPTS=-Xmx512m -Xms256m -Dfile.encoding=UTF-8 # JVM参数- SPRING_PROFILES_ACTIVE=prod # Spring Profilevolumes:- ./logs/:/app/logs # 日志持久化- ./config:/app/config:ro # 配置文件目录(只读)- ./jars:/app/jars # JAR 动态加载目录- ./file:/app/filecommand: ["java", "-jar", "/app/jars/plugin-manager-1.0.0.jar"]networks:- elk-networkdeploy: # 资源限制(生产环境必需)resources:limits:cpus: '1.0'memory: 768Mreservations:memory: 512M# 接入elk 日志logging:driver: "json-file"options:max-size: "10m"max-file: "5"tag: "plugin-manager"# 引用elk网络
networks:elk-network:name: elk-network # 绑定到实际存在的网络external: true # 声明使用外部网络# 卷配置(持久化存储)
volumes:plugin-manager-logs:driver: local