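This note records a first hands-on with Prometheus through one small case: monitor the state of a virtual machine and send an alert email when it goes down.
# Install Prometheus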
wget https://github.com/prometheus/prometheus/releases/download/v2.25.0/prometheus-2.25.0.linux-amd64.tar.gz
tar xf prometheus-2.25.0.linux-amd64.tar.gz -C /usr/local
cd /usr/local
mv prometheus-2.25.0.linux-amd64/ prometheus
# Start Prometheus
cd /usr/lib/systemd/system
vim prometheus.service
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.listen-address=:9090
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus
systemctl status prometheus
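Besides `systemctl status`, the service can be probed over HTTP. The following is a minimal sketch, not part of the original setup: `wait_for` is a hypothetical helper name, and it assumes `curl` is installed and that Prometheus listens on `localhost:9090` as configured in the unit file above. `/-/healthy` is the health endpoint that Prometheus exposes.

```shell
# wait_for TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
# (Hypothetical helper written for this note; plain POSIX sh.)
wait_for() {
  tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumes curl is installed and Prometheus listens on localhost:9090):
#   wait_for 10 1 curl -fsS http://localhost:9090/-/healthy && echo "Prometheus is healthy"
```

Polling a few times rather than once is useful here because the service may need a moment after `systemctl start` before it accepts connections.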
# Access the Prometheus Web UI
http://178.104.163.109:9090
http://178.104.163.109:9090/metrics
# Install Grafana
wget https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm/grafana-9.3.2-1.x86_64.rpm
yum install -y initscripts fontconfig
yum install -y grafana-9.3.2-1.x86_64.rpm
# Start Grafana
systemctl start grafana-server.service
systemctl status grafana-server.service
systemctl enable grafana-server.service
# Access the Grafana Web UI
http://178.104.163.109:3000/login
Default credentials: admin / admin
[root@desktop-a853 ~]# cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations: Prometheus itself, plus node-exporter on the monitored VM.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  # node-exporter gets its own job so that the alert expression up{job="node"} matches it.
  - job_name: 'node'
    static_configs:
    - targets: ['178.104.163.105:9100']
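Once the targets are configured, the built-in `up` metric shows whether each scrape succeeds: a value of `1` means the last scrape worked, `0` means it failed. The sketch below builds the HTTP API URL for an instant query on `up` for one job. `up_query_url` is a hypothetical helper name, and the job name is passed in unencoded, so it only suits simple names.

```shell
# up_query_url HOST:PORT JOB: print the HTTP API URL for the instant query up{job="JOB"}.
# The braces, '=' and quotes are percent-encoded by hand; JOB itself is not encoded,
# so this sketch only suits simple job names (letters, digits, '-', '_').
up_query_url() {
  printf 'http://%s/api/v1/query?query=up%%7Bjob%%3D%%22%s%%22%%7D\n' "$1" "$2"
}

# Example (assumes curl is installed):
#   curl -s "$(up_query_url 178.104.163.109:9090 prometheus)"
```

The same query can also be typed into the expression box of the Prometheus Web UI, which is usually the quicker way to check by hand.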
Import the node-exporter dashboard into Grafana.
# Configure the alert rule directory
vi /usr/local/prometheus/prometheus.yml
rule_files:
  - "rules/*.yml"
alerting:
  alertmanagers:
  - static_configs:
    - targets: # where alerts are sent: the Alertmanager instance
      - 178.104.163.109:9093 # Alertmanager address
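# Add the alert rule
mkdir -p /usr/local/prometheus/rules
vi /usr/local/prometheus/rules/node_rule.yml
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node"} == 0
    for: 10s
    labels:
      severity: "1"
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 10s"
      description: hello world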
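The `for: 10s` clause does not mean an email arrives 10 seconds after the VM dies. Several intervals add up: the next scrape has to record `up == 0`, the next rule evaluation has to see it, the `for` duration is only checked at evaluations, and Alertmanager waits `group_wait` before sending. The sketch below is a rough upper-bound estimate under those assumptions. It ignores the scrape timeout, notification latency and mail delivery, and `alert_delay_upper_bound` is a hypothetical helper name.

```shell
# alert_delay_upper_bound SCRAPE EVAL FOR GROUP_WAIT  (all in whole seconds)
# Rough upper bound on the time between a target failing and Alertmanager
# sending the first notification.
alert_delay_upper_bound() {
  scrape_s=$1
  eval_s=$2
  for_s=$3
  group_wait_s=$4
  # The 'for' clause is only checked at rule evaluations, so it effectively
  # rounds up to a multiple of the evaluation interval.
  for_rounded_s=$(( (for_s + eval_s - 1) / eval_s * eval_s ))
  echo $(( scrape_s + eval_s + for_rounded_s + group_wait_s ))
}

# Values from this article: scrape 15s, evaluation 15s, for 10s, group_wait 10s.
alert_delay_upper_bound 15 15 10 10   # prints 55
```

With the intervals used in this article, the estimate is just under a minute, which is worth keeping in mind when waiting for the first test email.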
# Restart Prometheus
systemctl restart prometheus
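# Install Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz
# Copy the binaries and set their permissions
install -m 0755 alertmanager-0.21.0.linux-amd64/{alertmanager,amtool} /usr/bin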
# Create the alertmanager.yml configuration file
cat > alertmanager.yml <<EOF
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25' # SMTP server (host:port)
  smtp_from: 'demo*@163.com' # sender address
  smtp_auth_username: 'demo*@163.com' # SMTP login (the mailbox address)
  smtp_auth_password: 'QNHPB***XBRMWCB' # mailbox password or authorization code
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mail'
receivers:
- name: 'mail'
  email_configs:
  - to: '*@*.com'
EOF
# Move the file into place and set its permissions
install -m 0644 -D alertmanager.yml /etc/alertmanager/alertmanager.yml
# Set up the systemd unit
cat > alertmanager.service <<EOF
[Unit]
Description=Alertmanager handles alerts sent by client applications such as the Prometheus server.
Documentation=https://prometheus.io/docs/alerting/alertmanager/
After=network.target
[Service]
User=root
ExecStart=/usr/bin/alertmanager \\
--config.file=/etc/alertmanager/alertmanager.yml \\
--storage.path=/var/lib/alertmanager \\
--cluster.advertise-address=0.0.0.0:9093
ExecReload=/bin/kill -HUP \$MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
# Move the unit file into place and set its permissions
install -m 0644 alertmanager.service /etc/systemd/system
# Start the service
systemctl daemon-reload
systemctl start alertmanager
systemctl status alertmanager
systemctl enable alertmanager
# Access Alertmanager
http://178.104.163.109:9093
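To check the email path without shutting anything down, a synthetic alert can be posted straight to Alertmanager's v2 API (`POST /api/v2/alerts`). The following is a sketch under assumptions: `test_alert_payload` is a hypothetical helper, the label values merely mimic the `node-up` rule, and the commented `curl` call assumes Alertmanager is reachable on `localhost:9093`.

```shell
# test_alert_payload INSTANCE: print a JSON array with one synthetic alert whose
# labels mimic the node-up rule. The label values here are illustrative only.
test_alert_payload() {
  printf '[{"labels":{"alertname":"node-up","instance":"%s","severity":"1"},"annotations":{"summary":"test alert"}}]\n' "$1"
}

# Example (assumes curl is installed and Alertmanager listens on localhost:9093):
#   test_alert_payload 178.104.163.105:9100 |
#     curl -s -X POST -H 'Content-Type: application/json' -d @- http://localhost:9093/api/v2/alerts
```

If the route and SMTP settings are correct, the alert should show up in the Alertmanager UI and an email should follow after `group_wait`. Because no end time is set, the test alert is expected to resolve on its own once `resolve_timeout` passes.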
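Shutting down the monitored VM, or stopping node-exporter inside it, now triggers an alert email.
With this baseline environment in place, other Prometheus features can be tried out in it later.
Whatever the new technology, building an MVP environment first seems to be an indispensable step.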
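Original-content statement: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
For infringement concerns, contact cloudcommunity@tencent.com to request removal.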