Kubernetes probes: livenessProbe and readinessProbe must not be configured identically

sir5kong

Published 2023-05-24 20:53:52


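livenessProbe: liveness probe
readinessProbe: readiness probe

In short, livenessProbe provides liveness detection and automatic restarts: when it fails, the kubelet restarts the container. readinessProbe manages the Pod's ready status and thereby affects how a Kubernetes Service distributes traffic. When the readinessProbe fails, the Pod that owns the container reports a not-ready status and is removed from the Service's endpoints, so it stops receiving traffic.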

## Probe version 1 (anti-pattern)
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10

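Let's start with an anti-pattern. In version 1 above, the two probes have exactly the same parameters, which is a problem. If /health becomes unreachable, both probes fail at the same time and the container is restarted straight away, so the readinessProbe has no chance to protect it.

If you increase livenessProbe.periodSeconds a little, as in version 2 below, the behavior changes. When the Pod is overloaded and /health stops responding, the readinessProbe fails first. The Pod then stops receiving traffic and may recover quickly; once /health responds again, no restart is triggered. For services that must stay online 24/7, this difference matters.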

## Probe version 2
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 15   ## default: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10   ## default: 10
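
To see why the readiness probe tends to reach its threshold first in version 2, here is a rough worst-case estimate. It assumes both probes keep the default failureThreshold of 3 and that /health starts failing just after a probe has run. Probe timeouts and scheduling jitter are ignored, so treat the numbers as approximations rather than guarantees.

```latex
\begin{aligned}
t_{\text{readiness}}^{\max} &\approx \texttt{failureThreshold} \times \texttt{periodSeconds} = 3 \times 10\,\mathrm{s} = 30\,\mathrm{s} \\
t_{\text{liveness}}^{\max} &\approx \texttt{failureThreshold} \times \texttt{periodSeconds} = 3 \times 15\,\mathrm{s} = 45\,\mathrm{s}
\end{aligned}
```

Under these assumptions the Pod is usually marked not ready before the liveness probe reaches its threshold. The exact gap depends on how the two probe schedules line up.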

The restart loop problem

## Probe version 1
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3  ## default: 3

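With probe version 1 above, a container that takes longer than roughly 40 seconds to start gets stuck in a restart loop. After the container starts, the probe waits an initial 10 seconds, and then 3 consecutive liveness failures trigger a restart (10 s + 3 × 10 s = 40 s at most). Every time, the container is restarted before it has finished starting.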

## Probe version 3
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 180
  periodSeconds: 10    ## default: 10
  failureThreshold: 5  ## default: 3
readinessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 5     ## default: 10
  failureThreshold: 3  ## default: 3

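Probe version 3 above is a further optimization: the liveness probe's initialDelaySeconds and failureThreshold are increased, and the readiness probe's periodSeconds is reduced. This configuration suits containers that start slowly, for example ones that take 1 to 2 minutes to start.

In addition, startupProbe can also solve the restart loop problem, and it works even better in combination with the liveness probe.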
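
As a minimal sketch of the startupProbe approach, the fragment below reuses the /health endpoint and port 8080 from the earlier versions. The thresholds are illustrative assumptions, not values from the original configurations, and the cluster must run a Kubernetes version that supports startupProbe. While a startupProbe is configured and has not yet succeeded, the liveness and readiness probes do not run, so the liveness probe no longer needs a large initialDelaySeconds.

```yaml
## Probe sketch with startupProbe (thresholds are assumptions, tune per workload)
startupProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  periodSeconds: 10
  failureThreshold: 18   ## allows up to about 18 x 10 = 180 seconds for startup
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  periodSeconds: 10      ## starts only after startupProbe has succeeded
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  periodSeconds: 5
  failureThreshold: 3
```

Compared with a fixed initialDelaySeconds of 180, a container that starts quickly is picked up as soon as the startup probe succeeds, while a slow start still gets its full time budget before any restart is triggered.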

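Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.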
