Jaeger:开源的端到端分布式跟踪,监视复杂的分布式系统中的事务并进行故障排除。 下图对比了常用的开源全链路追踪方案,目前SkyWalking和Pinpoint使用比较多,Jaeger相比客户端支持语言比较多,特别是对C++的支持,所以这次选择测试下。
Agent 5775 UDP协议,接收兼容zipkin的协议数据 6831 UDP协议,接收兼容jaeger的兼容协议 6832 UDP协议,接收jaeger的二进制协议 5778 HTTP协议,数据量大不建议使用
Collector 14267 tcp agent发送jaeger.thrift格式数据 14250 tcp agent发送proto格式数据(背后gRPC) 14268 http 直接接受客户端数据 14269 http 健康检查
Query 16686 http jaeger的前端,放给用户的接口 16687 http 健康检查
1.创建命名空间
[root@VM-0-123-centos jaeger]# kubectl create namespace jaeger
2.部署Jaeger-Operator Jaeger Operator:Jaeger Operator for Kubernetes简化了在Kubernetes上的部署和运行Jaeger。 Jaeger Operator是Kubernetes operator的实现。操作员是一种软件,可以减轻运行另一软件的操作复杂性。从技术上讲,操作员是打包,部署和管理Kubernetes应用程序的一种方法。 Jaeger Operator版本跟踪Jaeger组件(查询,收集器,代理)的一种版本。发行新版本的Jaeger组件时,将发行新版本的操作员,该操作员了解如何将先前版本的运行实例升级到新版本。
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
查看状态
[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME READY STATUS RESTARTS AGE
pod/jaeger-operator-6ff67bdd4b-4nffk 1/1 Running 0 14d
pod/simple-prod-collector-59fc47bf5c-h26mq 0/1 Terminating 0 9d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-operator-metrics ClusterIP 172.20.253.138 <none> 8383/TCP,8686/TCP 14d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-operator 1/1 1 1 14d
NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-operator-6ff67bdd4b 1 1 1 14d
3.创建jaeger实例 创建jaeger.yaml文件,配置ES集群及限制Deployment/simple-prod-collector容器的cpu和内存使用大小。最大数量可以起10个pod。
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: simple-prod
spec:
strategy: production
storage:
type: elasticsearch
options:
es:
server-urls: http://10.0.16.3:9200
index-prefix: zhjt
collector:
maxReplicas: 10
resources:
limits:
cpu: 500m
memory: 512Mi
[root@VM-0-123-centos jaeger]# kubectl apply -f jaeger.yaml -n jaeger
jaeger.jaegertracing.io/simple-prod created
列出jaeger对象 备注:貌似使用官网all in one的例子状态是正常的Running,这里状态虽然是Failed,但是不影响使用。
[root@VM-0-123-centos jaeger]# kubectl get jaegers -n jaeger
NAME STATUS VERSION STRATEGY STORAGE AGE
simple-prod Failed 1.22.0 production elasticsearch 9d
获取pod名字
[root@VM-0-123-centos jaeger]# kubectl get pods -l app.kubernetes.io/instance=simple-prod -n jaeger
NAME READY STATUS RESTARTS AGE
simple-prod-collector-59fc47bf5c-h26mq 1/1 Running 0 9d
simple-prod-query-85689b7bbd-g5jw9 2/2 Running 0 9d
获取pod日志
[root@VM-0-123-centos jaeger]# kubectl logs simple-prod-query-85689b7bbd-g5jw9 jaeger-agent -n jaeger
2021/04/28 04:55:34 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585734.2081811,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585734.2082183,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585734.2083232,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585734.2083883,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":14271"}
{"level":"info","ts":1619585734.2084124,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:14271","health-status":"unavailable"}
{"level":"info","ts":1619585734.2089527,"caller":"grpc/builder.go:70","msg":"Agent requested insecure grpc connection to collector(s)"}
{"level":"info","ts":1619585734.2089992,"caller":"grpc@v1.29.1/clientconn.go:243","msg":"parsed scheme: \"dns\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.21038,"caller":"command-line-arguments/main.go:84","msg":"Starting agent"}
{"level":"info","ts":1619585734.2104166,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585734.2108943,"caller":"grpc/builder.go:108","msg":"Checking connection to collector"}
{"level":"info","ts":1619585734.210908,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"IDLE"}
{"level":"info","ts":1619585734.211061,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}
{"level":"info","ts":1619585734.3344934,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.88:14250 <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345578,"caller":"grpc@v1.29.1/clientconn.go:667","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345697,"caller":"grpc@v1.29.1/clientconn.go:682","msg":"Channel switches to new LB policy \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3346283,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.33467,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.334736,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3347983,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619585734.335669,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357751,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002f5ea0:{{172.20.0.88:14250 <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357947,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.335807,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}
{"level":"info","ts":1619592172.4516647,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517512,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517596,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517772,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517884,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"warn","ts":1619592172.4523218,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {172.20.0.88:14250 <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.20.0.88:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523551,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.452386,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523947,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1619592172.6118224,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.178:14250 <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118581,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118758,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.611892,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6119003,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619592172.6119049,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.178:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.612726,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127572,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0003df970:{{172.20.0.178:14250 <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127682,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127849,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}
[root@VM-0-123-centos jaeger]# kubectl logs simple-prod-query-85689b7bbd-g5jw9 jaeger-query -n jaeger
2021/04/28 04:55:29 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585729.8951077,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585729.8951416,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585729.8952546,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585729.8953054,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":16687"}
{"level":"info","ts":1619585729.8953238,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:16687","health-status":"unavailable"}
{"level":"info","ts":1619585729.9169888,"caller":"config/config.go:183","msg":"Elasticsearch detected","version":7}
{"level":"info","ts":1619585729.9174955,"caller":"app/static_handler.go:181","msg":"UI config path not provided, config file will not be watched"}
{"level":"info","ts":1619585729.9175768,"caller":"app/server.go:170","msg":"Query server started"}
{"level":"info","ts":1619585729.9175944,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585729.9176183,"caller":"app/server.go:249","msg":"Starting GRPC server","port":16685,"addr":":16685"}
{"level":"info","ts":1619585729.9176335,"caller":"app/server.go:230","msg":"Starting HTTP server","port":16686,"addr":":16686"}
4.查看jaeger资源
[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME READY STATUS RESTARTS AGE
pod/jaeger-operator-6ff67bdd4b-4nffk 1/1 Running 0 14d
pod/simple-prod-collector-59fc47bf5c-h26mq 1/1 Running 0 8d
pod/simple-prod-query-85689b7bbd-g5jw9 2/2 Running 0 8d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-operator-metrics ClusterIP 172.20.253.138 <none> 8383/TCP,8686/TCP 14d
service/simple-prod-collector ClusterIP 172.20.255.184 <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP 8d
service/simple-prod-collector-headless ClusterIP None <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP 8d
service/simple-prod-query ClusterIP 172.20.254.102 <none> 16686/TCP 8d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-operator 1/1 1 1 14d
deployment.apps/simple-prod-collector 1/1 1 1 8d
deployment.apps/simple-prod-query 1/1 1 1 8d
NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-operator-6ff67bdd4b 1 1 1 14d
replicaset.apps/simple-prod-collector-59fc47bf5c 1 1 1 8d
replicaset.apps/simple-prod-query-85689b7bbd 1 1 1 8d
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/simple-prod-collector Deployment/simple-prod-collector 1457m/90, 137m/90 1 10 1 8d
如果流量大需要减小es压力,可以接入kafka集群,修改jaeger.yaml文件
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: simple-streaming
spec:
strategy: streaming
collector:
options:
kafka:
producer:
topic: jaeger-spans
brokers: my-cluster-kafka-brokers.kafka:9092 #修改为kafka地址
ingester:
options:
kafka:
consumer:
topic: jaeger-spans
brokers: my-cluster-kafka-brokers.kafka:9092 #修改为kafka地址
ingester:
deadlockInterval: 5s
storage:
type: elasticsearch
options:
es:
server-urls: http://elasticsearch:9200 #修改为ES地址
5.agent部署
jaeger client的一个代理程序,client将收集到的调用链数据发给agent,然后由agent发给collector。由于使用的udp协议,一般部署在靠近client的位置。
agent有多种安装方式
1).docker安装
下载:jaegertracing/jaeger-agent Tags (docker.com)
docker run -d -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778/tcp jaegertracing/jaeger-agent:1.12 --reporter.grpc.host-port=xx.xx.xx.xx:14250
2).k8s安装又分两种
sidecar方式
daemonset方式
参考:Operator for Kubernetes — Jaeger documentation (jaegertracing.io)
3).二进制安装
下载:Jaeger – Download Jaeger (jaegertracing.io)
nohup ./jaeger-agent --collector.host-port=xxxx:14267 1>1.log 2>2.log &