I. Introduction to Prometheus
The previous posts covered Kubernetes cluster deployment in detail; this one focuses on a Kubernetes monitoring solution: Prometheus + Grafana. Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its start in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open-source project maintained independently of any company; to emphasize this and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as its second hosted project, after Kubernetes. Prometheus collects metrics data and also offers powerful query capabilities of its own; combined with Grafana, it can monitor and visualize exactly the data you want.
Key features of Prometheus
-> a multi-dimensional data model
-> a flexible query language (PromQL)
-> no reliance on distributed storage; single server nodes are autonomous
-> time series data is collected via a pull model over HTTP
-> pushing time series is also supported via an intermediary gateway
-> targets are discovered via service discovery or static configuration
-> multiple modes of graphing and dashboarding support; Grafana supports it as well
Prometheus components
The Prometheus ecosystem consists of multiple components, many of which are optional:
-> the main Prometheus Server, which scrapes and stores time series data
-> client libraries for instrumenting application code
-> a push gateway for supporting short-lived jobs
-> a GUI-based dashboard builder (based on Rails/SQL)
-> various exporters for tools such as HAProxy, StatsD and Graphite, which translate those systems' metrics into a format Prometheus can scrape and store
-> an alert manager (Alertmanager) to handle alerts
-> a command-line querying tool
-> various other support tools
When monitoring a Kubernetes cluster with Prometheus, a typical setup is:
-> metrics-server collects resource metrics for in-cluster consumers such as kubectl, the HPA and the scheduler
-> prometheus-operator deploys Prometheus, which stores the monitoring data
-> kube-state-metrics collects data about Kubernetes resource objects in the cluster
-> node_exporter collects metrics from each cluster node
-> Prometheus scrapes the apiserver, scheduler, controller-manager and kubelet components
-> Alertmanager implements monitoring alerts
-> Grafana provides data visualization
Prometheus architecture
The diagram below illustrates the overall architecture of Prometheus and the roles of some of its ecosystem components.
The overall Prometheus workflow is fairly simple: Prometheus scrapes metrics directly from targets or, for short-lived jobs, receives them indirectly via the intermediary Pushgateway. It stores all scraped metrics locally and runs rules over this data to produce aggregated series or alert notifications; Grafana or other tools are then used to visualize the data.
The Prometheus server pulls data directly from its targets, or indirectly via the intermediary gateway. It stores all scraped data locally, applies rules to clean and aggregate that data, and records the results as new time series; PromQL and other APIs are then used to visualize the collected data. In Kubernetes, cluster resources expose metrics (measured values), and a variety of exporters provide up-to-date metric values through API endpoints. When Prometheus works with Kubernetes, it interacts with these metric-exposing exporters: fetching the data, aggregating it, displaying it, and triggering alerts.
1) How Prometheus obtains metrics
-> for short-lived jobs, metrics are pushed to the Pushgateway, which Prometheus then scrapes (less common)
-> for metrics exposed by exporters, Prometheus pulls them directly (the usual way); commonly paired exporters include kube-apiserver, cAdvisor and node-exporter, and application-specific exporters can be deployed to expose an application's state; currently supported applications include nginx, HAProxy, MySQL, Redis, Memcached and others. A sample scrape configuration is sketched below.
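For illustration, a minimal static scrape configuration pulling from a node-exporter endpoint could look like the fragment below (a sketch only; the job name is arbitrary, and the target address reuses the node-exporter NodePort from this deployment rather than the repo's actual config):

# prometheus.yml (fragment) -- hypothetical static scrape configuration
scrape_configs:
  - job_name: 'node-exporter'              # arbitrary job label
    scrape_interval: 15s                   # how often Prometheus pulls /metrics
    static_configs:
      - targets: ['172.16.60.244:31672']   # node-exporter NodePort used later in this post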
2) Aggregating and querying data in Prometheus
Metrics can be filtered, displayed and graphed using the officially defined expr expression format and PromQL syntax. The built-in web UI is rather plain, but Prometheus also exposes an API for retrieving data; Grafana can use Prometheus as a data source through this API to draw far more polished graphs. For the expr format and syntax, see the official documentation: https://prometheus.io/docs/prometheus/latest/querying/basics/ A couple of example queries follow.
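For example, two PromQL expressions that could be run against node-exporter data (the metric names are standard node-exporter metrics from v0.16+; the instance label value is illustrative):

# per-node CPU usage over the last 5 minutes, as a percentage (100 minus the idle share)
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

# available memory in bytes on one node
node_memory_MemAvailable_bytes{instance="172.16.60.244:31672"}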
3) Alert delivery in Prometheus
Prometheus supports multiple notification channels; alerts fire automatically once their conditions are met, and delivery rules such as repeat intervals and routing can be customized, allowing very flexible alerting. A sample alerting rule is sketched below.
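As an illustration only (the group name, threshold and labels below are made up for this sketch, not part of this deployment), an alerting-rule file in the Prometheus 2.x format looks like:

# rules.yml (fragment) -- hypothetical alerting rule
groups:
- name: node-alerts
  rules:
  - alert: InstanceDown
    expr: up == 0            # the target failed its most recent scrape
    for: 5m                  # condition must hold for 5 minutes before firing
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is down"

Alertmanager then deduplicates, groups and routes the fired alerts to the configured receivers.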
Where Prometheus fits
Prometheus works very well for recording purely numeric time series. It suits both machine-centric monitoring (servers and other hardware) and the monitoring of highly dynamic service-oriented architectures. For today's popular microservices, its multi-dimensional data collection and filtering query language are particularly strong. Prometheus is designed for reliability: when a service fails, it lets you quickly locate and diagnose the problem, and its setup has no strong dependencies on other hardware or services.
Prometheus不适用场景 Prometheus,它的价值在于可靠性,甚至在很恶劣的环境下,你都可以随时访问它和查看系统服务各种指标的统计信息。 如果你对统计数据需要100%的精确,它并不适用,例如:它不适用于实时计费系统
II. Deploying Prometheus + Grafana
Building on the Kubernetes cluster environment deployed earlier, we now deploy Prometheus + Grafana, recorded as follows. (A packaged download of k8s-prometheus-grafana.git: https://pan.baidu.com/s/1nb-QCOc7lgmyJaWwPRBjPg extraction code: bh2e)
1) Install and deploy on the k8s-master01 node. Install git and download the required YAML files.
[root@k8s-master01 ~]# cd /opt/k8s/work/
[root@k8s-master01 work]# git clone https://github.com/redhatxl/k8s-prometheus-grafana.git
2) Pull the required monitoring images on all node machines.
[root@k8s-master01 work]# source /opt/k8s/bin/environment.sh
[root@k8s-master01 work]# for node_node_ip in ${NODE_NODE_IPS[@]}
do
echo ">>> ${node_node_ip}"
ssh root@${node_node_ip} "docker pull prom/node-exporter"
done
[root@k8s-master01 work]# for node_node_ip in ${NODE_NODE_IPS[@]}
do
echo ">>> ${node_node_ip}"
ssh root@${node_node_ip} "docker pull prom/prometheus:v2.0.0"
done
[root@k8s-master01 work]# for node_node_ip in ${NODE_NODE_IPS[@]}
do
echo ">>> ${node_node_ip}"
ssh root@${node_node_ip} "docker pull grafana/grafana:4.2.0"
done
3) Deploy the node-exporter component as a DaemonSet.
[root@k8s-master01 work]# cd k8s-prometheus-grafana/
[root@k8s-master01 k8s-prometheus-grafana]# ls
grafana node-exporter.yaml prometheus README.md
[root@k8s-master01 k8s-prometheus-grafana]# kubectl create -f node-exporter.yaml
Wait a moment, then check whether node-exporter deployed successfully.
[root@k8s-master01 k8s-prometheus-grafana]# kubectl get pods -n kube-system|grep "node-exporter"
node-exporter-9c2hc 1/1 Running 0 91s
node-exporter-bvdwv 1/1 Running 0 91s
node-exporter-d4vw4 1/1 Running 0 91s
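For reference, the repo's node-exporter.yaml is essentially a DaemonSet plus a NodePort Service, roughly of the shape sketched below (not the repo file verbatim; labels and the hostPort setting may differ):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100        # node-exporter's default port
          hostPort: 9100             # also exposed directly on each node
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 9100
    nodePort: 31672                  # matches the service listing later in this post
  selector:
    k8s-app: node-exporter

Because it is a DaemonSet, one node-exporter pod runs on every node, which is why three pods appear above.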
4) Deploy the Prometheus component
[root@k8s-master01 k8s-prometheus-grafana]# cd prometheus/
4.1) Apply the RBAC manifest
[root@k8s-master01 prometheus]# kubectl create -f rbac-setup.yaml
4.2) Manage the Prometheus configuration file as a ConfigMap
[root@k8s-master01 prometheus]# kubectl create -f configmap.yaml
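The ConfigMap carries the prometheus.yml configuration. A rough sketch of its shape (the repo's real config is longer, with kubernetes_sd_configs jobs for nodes and pods as well; the ConfigMap name here is assumed):

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config            # assumed name; must match the Deployment's volume
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s           # default pull interval
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:         # discover targets through the Kubernetes API
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token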
4.3) Apply the Prometheus Deployment manifest
[root@k8s-master01 prometheus]# kubectl create -f prometheus.deploy.yml
4.4) Apply the Prometheus Service manifest
[root@k8s-master01 prometheus]# kubectl create -f prometheus.svc.yml
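prometheus.svc.yml exposes the server outside the cluster on NodePort 30003 (consistent with the service listing shown later); approximately:

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - port: 9090                       # Prometheus web UI / API port
    targetPort: 9090
    nodePort: 30003                  # external access port
  selector:
    app: prometheus                  # assumed label; must match the Deployment's pod template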
5) Deploy the Grafana component
[root@k8s-master01 prometheus]# cd ../grafana/
[root@k8s-master01 grafana]# ll
total 12
-rw-r--r-- 1 root root 1449 Jul 8 17:19 grafana-deploy.yaml
-rw-r--r-- 1 root root 256 Jul 8 17:19 grafana-ing.yaml
-rw-r--r-- 1 root root 225 Jul 8 17:19 grafana-svc.yaml
5.1) Grafana Deployment manifest
[root@k8s-master01 grafana]# kubectl create -f grafana-deploy.yaml
5.2) Grafana Service manifest
[root@k8s-master01 grafana]# kubectl create -f grafana-svc.yaml
5.3) Grafana Ingress manifest
[root@k8s-master01 grafana]# kubectl create -f grafana-ing.yaml
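grafana-ing.yaml is a small Ingress pointing at the grafana Service; roughly (the hostname is illustrative, not taken from the repo):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: kube-system
spec:
  rules:
  - host: k8s.grafana                # illustrative hostname
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000          # grafana Service port, as listed below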
6) Check the pods and services for web access
[root@k8s-master01 grafana]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
........... .... .... ... ...
grafana-core-5f7c6c786b-x8prc 1/1 Running 0 105s
........... .... .... ... ...
node-exporter-9c2hc 1/1 Running 0 10m
node-exporter-bvdwv 1/1 Running 0 10m
node-exporter-d4vw4 1/1 Running 0 10m
prometheus-6b96dcbd87-lwwv7 1/1 Running 0 3m11s
[root@k8s-master01 grafana]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana NodePort 10.254.95.120 <none> 3000:31821/TCP 2m30s
.......... .... .... .... .... ....
.......... .... .... .... .... ....
node-exporter NodePort 10.254.113.30 <none> 9100:31672/TCP 11m
prometheus NodePort 10.254.84.139 <none> 9090:30003/TCP 4m6s
7) View node-exporter (http://node-ip:31672/)
http://172.16.60.244:31672/
http://172.16.60.245:31672/
http://172.16.60.246:31672/
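Each of these endpoints serves node metrics under /metrics; a quick sanity check from the shell:
[root@k8s-master01 grafana]# curl -s http://172.16.60.244:31672/metrics | head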
8) Prometheus is exposed on NodePort 30003. Visiting http://node-ip:30003/targets shows that Prometheus has successfully connected to the Kubernetes apiserver.
http://172.16.60.244:30003/targets
http://172.16.60.245:30003/targets
http://172.16.60.246:30003/targets
9) Access Grafana through its NodePort; the default username and password are both admin.
[root@k8s-master01 grafana]# kubectl get svc -n kube-system|grep "grafana"
grafana NodePort 10.254.95.120 <none> 3000:31821/TCP 12m
Grafana can be accessed at:
http://172.16.60.244:31821/
http://172.16.60.245:31821/
http://172.16.60.246:31821/
10) Add a data source to Grafana
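In the Grafana UI (Data Sources -> Add data source), the settings are roughly as follows (the in-cluster URL assumes the Prometheus Service name and namespace from this deployment; a NodePort address works as well):

Name:   prometheus
Type:   Prometheus
URL:    http://prometheus.kube-system.svc.cluster.local:9090
        (or e.g. http://172.16.60.244:30003)
Access: proxy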
To import a dashboard panel, enter template number 315 to import it online, or download the corresponding JSON template file and import it locally. The dashboard template is available at https://grafana.com/dashboards/315
Check the resulting Grafana dashboards.
III. Kubernetes cluster management test
[root@k8s-master01 ~]# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
[root@k8s-master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-node01 Ready <none> 20d v1.14.2
k8s-node02 Ready <none> 20d v1.14.2
k8s-node03 Ready <none> 20d v1.14.2
Deploy a test instance
[root@k8s-master01 ~]# kubectl run kevin-nginx --image=nginx --replicas=3
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/kevin-nginx created
[root@k8s-master01 ~]# kubectl run --generator=run-pod/v1 kevin-nginx --image=nginx --replicas=3
pod/kevin-nginx created
Wait a moment, then check the created kevin-nginx pods (the nginx image is pulled automatically on creation, so this takes a little while).
[root@k8s-master01 ~]# kubectl get pods --all-namespaces|grep "kevin-nginx"
default kevin-nginx 1/1 Running 0 98s
default kevin-nginx-569dcd559b-6h4nn 1/1 Running 0 106s
default kevin-nginx-569dcd559b-7f2b4 1/1 Running 0 106s
default kevin-nginx-569dcd559b-7tds2 1/1 Running 0 106s
View detailed pod information, including node placement
[root@k8s-master01 ~]# kubectl get pods --all-namespaces -o wide|grep "kevin-nginx"
default kevin-nginx 1/1 Running 0 2m13s 172.30.72.12 k8s-node03 <none> <none>
default kevin-nginx-569dcd559b-6h4nn 1/1 Running 0 2m21s 172.30.56.7 k8s-node02 <none> <none>
default kevin-nginx-569dcd559b-7f2b4 1/1 Running 0 2m21s 172.30.72.11 k8s-node03 <none> <none>
default kevin-nginx-569dcd559b-7tds2 1/1 Running 0 2m21s 172.30.88.8 k8s-node01 <none> <none>
[root@k8s-master01 ~]# kubectl get deployment|grep kevin-nginx
kevin-nginx 3/3 3 3 2m57s
Create a Service (svc)
[root@k8s-master01 ~]# kubectl expose deployment kevin-nginx --port=8080 --target-port=80 --type=NodePort
[root@k8s-master01 ~]# kubectl get svc|grep kevin-nginx
kevin-nginx NodePort 10.254.111.50 <none> 8080:32177/TCP 33s
Within the cluster, pods can reach kevin-nginx through the ClusterIP:
[root@k8s-master01 ~]# curl http://10.254.111.50:8080
Externally, kevin-nginx is reachable at http://node_ip:32177
http://172.16.60.244:32177
http://172.16.60.245:32177
http://172.16.60.246:32177
IV. Deploying the Kubernetes cluster web UI (dashboard)
1) Configure kubernetes-dashboard.yaml (the image k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1 has already been pulled on the node machines).
[root@k8s-master01 ~]# cd /opt/k8s/work/
[root@k8s-master01 work]# cat kubernetes-dashboard.yaml
# ------------------- Dashboard Secret ------------------- #
apiVersion: v1
kind: Secret
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard-certs
namespace: kube-system
type: Opaque
---
# ------------------- Dashboard Service Account ------------------- #
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kube-system
---
# ------------------- Dashboard Role & Role Binding ------------------- #
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: kubernetes-dashboard-minimal
namespace: kube-system
rules:
# Allow Dashboard to create 'kubernetes-dashboard-key-holder' secret.
- apiGroups: [""]
resources: ["secrets"]
verbs: ["create"]
# Allow Dashboard to create 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["create"]
# Allow Dashboard to get, update and delete Dashboard exclusive secrets.
- apiGroups: [""]
resources: ["secrets"]
resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs"]
verbs: ["get", "update", "delete"]
# Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["kubernetes-dashboard-settings"]
verbs: ["get", "update"]
# Allow Dashboard to get metrics from heapster.
- apiGroups: [""]
resources: ["services"]
resourceNames: ["heapster"]
verbs: ["proxy"]
- apiGroups: [""]
resources: ["services/proxy"]
resourceNames: ["heapster", "http:heapster:", "https:heapster:"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kubernetes-dashboard-minimal
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kubernetes-dashboard-minimal
subjects:
- kind: ServiceAccount
name: kubernetes-dashboard
namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: kubernetes-dashboard
subjects:
- kind: ServiceAccount
name: kubernetes-dashboard
namespace: kube-system
roleRef:
kind: ClusterRole
name: cluster-admin
apiGroup: rbac.authorization.k8s.io
---
# ------------------- Dashboard Deployment ------------------- #
kind: Deployment
apiVersion: apps/v1beta2
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kube-system
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: kubernetes-dashboard
template:
metadata:
labels:
k8s-app: kubernetes-dashboard
spec:
containers:
- name: kubernetes-dashboard
image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1
ports:
- containerPort: 9090
protocol: TCP
args:
#- --auto-generate-certificates
# Uncomment the following line to manually specify Kubernetes API server Host
# If not specified, Dashboard will attempt to auto discover the API server and connect
# to it. Uncomment only if the default does not work.
#- --apiserver-host=http://10.0.1.168:8080
volumeMounts:
- name: kubernetes-dashboard-certs
mountPath: /certs
# Create on-disk volume to store exec logs
- mountPath: /tmp
name: tmp-volume
livenessProbe:
httpGet:
scheme: HTTP
path: /
port: 9090
initialDelaySeconds: 30
timeoutSeconds: 30
volumes:
- name: kubernetes-dashboard-certs
secret:
secretName: kubernetes-dashboard-certs
- name: tmp-volume
emptyDir: {}
serviceAccountName: kubernetes-dashboard
# Comment the following tolerations if Dashboard must not be deployed on master
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
---
# ------------------- Dashboard Service ------------------- #
kind: Service
apiVersion: v1
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kube-system
spec:
ports:
- port: 9090
targetPort: 9090
selector:
k8s-app: kubernetes-dashboard
---
# ------------------- Dashboard External Service ------------------- #
kind: Service
apiVersion: v1
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard-external
namespace: kube-system
spec:
ports:
- port: 9090
targetPort: 9090
nodePort: 30090
type: NodePort
selector:
k8s-app: kubernetes-dashboard
Apply this YAML file
[root@k8s-master01 work]# kubectl create -f kubernetes-dashboard.yaml
Wait a moment, then check the kubernetes-dashboard pod (as shown below, the pod landed on the k8s-node03 node, i.e. 172.16.60.246).
[root@k8s-master01 work]# kubectl get pods -n kube-system -o wide|grep "kubernetes-dashboard"
kubernetes-dashboard-7976c5cb9c-q7z2w 1/1 Running 0 10m 172.30.72.6 k8s-node03 <none> <none>
[root@k8s-master01 work]# kubectl get svc -n kube-system|grep "kubernetes-dashboard"
kubernetes-dashboard-external NodePort 10.254.227.142 <none> 9090:30090/TCP 10m
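The dashboard should then be reachable through the kubernetes-dashboard-external NodePort service on any node, e.g. http://172.16.60.246:30090/ (30090 is the nodePort defined in the manifest above).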