Prometheus Server的数据抓取工作于Pull模型,因而,它必需要事先知道各Target的位置,然后才能从相应的Exporter或Instrumentation中抓取数据, 对于小型系统来说,通过static_configs就可以解决此问题,这也是最简单的配置方法;对于中大型系统环境或具有较强动态性的云计算环境来说,静态配置显然难以适用,因此,Prometheus为此专门设计了一组服务发现机制,以便能够通过服务注册中心自动发现、检测、分类可被检测的各target,以及更新发生了变动的target。
发现 -> 配置 -> relabel -> 指标数据抓取 -> metrics relabel
此种类型也是最简单的服务发现方式,主要是通过Prometheus Server定期从文件中加载target的信息。文件可以是json或者yaml格式,它含有定义的target列表,以及可选的标签信息。
vi prometheus.yml
# static config nodes
- job_name: 'nodes'
file_sd_configs:
- files:
- targets/nodes-*.yaml
refresh_interval: 2m
scrape_interval: 15s
然后将所有要发现的target全部放在targets/目录下即可,例如
cat targets/nodes-linux.yaml
- targets:
- monitor.example.com:9100
- node.export1.com:9101
- node.export2.com:9101
- node.export3.com:9101
labels:
app: node-exporter
os: aliyunos3
cat targets/nodes-prometheus.yaml
- targets:
- monitor.example.com:9090
labels:
app: prometheus
job: prometheus
重新加载Prometheus配置即可:
curl -XPOST monitor.example.com:9090/-/reload
consul是一款基于golang开发的开源工具,主要面向分布式,服务化的系统提供服务注册、服务发现和配置管理的服务,提供服务注册/发现、健康检查、Key/Value存储、多数据中心和分布式一致性保证等功能。
多种部署方式,这里仅是使用consul的功能,并不考虑高可用或其他问题,采用docker-compose方式部署。
vi docker-compose.yml
version: '3.6'
volumes:
consul_data: {}
networks:
monitoring:
driver: bridge
services:
consul:
image: consul:1.14
volumes:
- ./consul_configs:/consul/config
- consul_data:/consul/data/
networks:
- monitoring
ports:
- 8500:8500
command: ["consul","agent","-dev","-bootstrap","-config-dir","/consul/config","-data-dir","/consul/data","-ui","-log-level","INFO","-bind","127.0.0.1","-client","0.0.0.0"]
consul-exporter:
image: prom/consul-exporter:v0.8.0
networks:
- monitoring
ports:
- 9107:9107
command:
- "--consul.server=consul:8500"
depends_on:
- consul
这里顺便把consul-exporter也部署了 直接启动:
# docker-compose up -d
# docker-compose ps
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
consul-and-exporter-consul-1 consul:1.14 "docker-entrypoint.s…" consul 24 hours ago Up 24 hours 8300-8302/tcp, 8301-8302/udp, 8600/tcp, 8600/udp, 0.0.0.0:8500->8500/tcp, :::8500->8500/tcp
consul-and-exporter-consul-exporter-1 prom/consul-exporter:v0.8.0 "/bin/consul_exporte…" consul-exporter 24 hours ago Up 24 hours 0.0.0.0:9107->9107/tcp, :::9107->9107/tcp
可以通过ip:8500直接访问consul,这里示例并没有设置token,正常生产环境需要token来进行身份验证:
需要注意,使用consul自动发现时,需要在job中通过标签来匹配对应的target,例如:
vi prometheus.yml
# consul_service_discovery
- job_name: 'nodes'
consul_sd_configs:
- server: "monitor.example.com:8500"
tags:
- "nodes" # 匹配在consul注册的服务中带有nodes标签的service
refresh_interval: 2m
scrape_interval: 15s
- job_name: 'grafana'
consul_sd_configs:
- server: "monitor.example.com:8500"
tags:
- "grafana" # 匹配在consul注册的服务中带有grafana标签的service
refresh_interval: 2m
scrape_interval: 15s
重新加载:
curl -XPOST monitor.example.com:9090/-/reload
服务注册到consul有两种方式,一种是使用consul客户端命令进行操作,另一种是通过api操作。api方式注册演示 准备json文件
vi grafana.json
{
"ID": "grafana",
"Name": "grafana",
"Tags": ["grafana", "v9"], # 包含的标签
"Address": "monitor.example.com",
"Port": 3000,
"Meta": {
"grafana_version": "9" # 元数据,可自定义
},
"EnableTagOverride": false,
"Check": { # 检查健康状态的方法
"http": "http://monitor.example.com:3000/metrics",
"interval": "5s",
"Timeout": "5s"
},
"Weights": {
"Passing": 1,
"Warning": 1
}
}
健康检查方法也可以是执行脚本,例如:
"Check": { "DeregisterCriticalServiceAfter": "90m", "Args": ["/usr/local/bin/check_redis.py"], "Interval": "10s", "Timeout": "5s" },
注册服务:
curl -XPUT --data @grafana.json http://monitor.example.com:8500/v1/agent/service/register
查看状态:
通过consul自动发现的target会有很多__meta_consul开头的标签,我们可以通过relabel来重新利用这些标签,这个下篇笔记总结。常用的 api 指令:
curl http://monitor.example.com:8500/v1/agent/services
curl http://monitor.example.com:8500/v1/agent/health/service/name/tomcat
curl -XPUT --data @grafana.json http://monitor.example.com:8500/v1/agent/service/register
curl -XPUT http://monitor.example.com:8500/v1/agent/service/deregister/grafana
官网文档:https://developer.hashicorp.com/consul/api-docs/agent/service
准备nodes.json文件,同一类型的target可以写到一个json文件中,便于编辑注册
{
"services": [
{
"id": "node.export1.com",
"name": "node.export1.com",
"address": "node.export1.com",
"port": 9101,
"tags": ["nodes"],
"checks": [{
"http": "http://node.export1.com:9101/metrics",
"interval": "5s"
}]
},
{
"id": "node.export2.com",
"name": "node.export2.com",
"address": "node.export2.com",
"port": 9101,
"tags": ["nodes"],
"checks": [{
"http": "http://node.export2.com:9101/metrics",
"interval": "5s"
}]
},
{
"id": "node.export3.com",
"name": "node.export3.com",
"address": "node.export3.com",
"port": 9101,
"tags": ["nodes"],
"checks": [{
"http": "http://node.export3.com:9101/metrics",
"interval": "5s"
}]
},
{
"id": "monitor.example.com",
"name": "monitor.example.com",
"address": "monitor.example.com",
"port": 9100,
"tags": ["nodes"],
"checks": [{
"http": "http://monitor.example.com:9100/metrics",
"interval": "5s"
}]
}
]
}
将node.json文件放置到consul服务启动的"-data-dir"目录下,此示例为容器内/consul/data
/consul/config # pwd
/consul/config
/consul/config # ls
nodes.json
执行config重新加载
consul reload
Configuration reload triggered
查看consul及Prometheus状态
至此,Prometheus基于consul的自动发现基本演示完毕。