Why All Kubernetes Pods Went Down at the Same Time

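Root Cause Analysis

A Spring Boot application is deployed on Kubernetes, with Spring Boot Actuator providing the health checks.

Kubernetes Configuration

Both the liveness probe and the readiness probe point at the /actuator/health endpoint.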

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-k8s-java
  namespace: z-blog
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pod-k8s-java
  template:
    metadata:
      labels:
        app: pod-k8s-java
    spec:
      containers:
        - command:
            - java
            - -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
            - -jar
            - app.jar
          name: container-k8s-java
          image: my-test-app
          imagePullPolicy: Never
          ports:
          - containerPort: 8080
            name: api
            protocol: TCP
          - containerPort: 5005
            name: remote-debug
            protocol: TCP
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /actuator/health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /actuator/health
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1

Spring Boot Dependencies

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-mail</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

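Requesting the Endpoint

The request took about 17 seconds, far longer than the timeoutSeconds: 1 configured on both probes.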

$ time curl -v http://localhost:9090/actuator/health
*   Trying 127.0.0.1:9090...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET /actuator/health HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.84.0
> Accept: */*
>
  0     0    0     0    0     0      0      0 --:--:--  0:00:17 --:--:--     0* Mark bundle as not supporting multiuse
< HTTP/1.1 503
< Content-Type: application/vnd.spring-boot.actuator.v3+json
< Transfer-Encoding: chunked
< Date: Wed, 14 Sep 2022 03:14:24 GMT
< Connection: close
<
{ [372 bytes data]
100   365    0   365    0     0     20      0 --:--:--  0:00:17 --:--:--    82{"status":"DOWN","components":{"diskSpace":{"status":"UP","details":{"total":2000263573504,"free":1985804197888,"threshold":10485760,"exists":true}},"mail":{"status":"DOWN","details":{"location":"smtpdm.aliyun.com:465","error":"javax.mail.MessagingException: Got bad greeting from SMTP host: smtpdm.aliyun.com, port: 465, response: [EOF]"}},"ping":{"status":"UP"}}}
* Closing connection 0

real    0m17.852s
user    0m0.000s
sys     0m0.030s

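Health Check Response

The HTTP status code is 503 rather than 200.

The status field in the response body is DOWN rather than UP, because the mail component reports DOWN.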

{
    "status": "DOWN",
    "components": {
        "diskSpace": {
            "status": "UP",
            "details": {
                "total": 2000263573504,
                "free": 1985804197888,
                "threshold": 10485760,
                "exists": true
            }
        },
        "mail": {
            "status": "DOWN",
            "details": {
                "location": "smtpdm.aliyun.com:465",
                "error": "javax.mail.MessagingException: Got bad greeting from SMTP host: smtpdm.aliyun.com, port: 465, response: [EOF]"
            }
        },
        "ping": {
            "status": "UP"
        }
    }
}

Backend Log

MailConnectException

com.sun.mail.util.MailConnectException: Couldn't connect to host, port: test.com, 123; timeout -1
    at com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:2210) ~[jakarta.mail-1.6.7.jar:1.6.7]
    at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:722) ~[jakarta.mail-1.6.7.jar:1.6.7]
    at javax.mail.Service.connect(Service.java:342) ~[jakarta.mail-1.6.7.jar:1.6.7]
    at org.springframework.mail.javamail.JavaMailSenderImpl.connectTransport(JavaMailSenderImpl.java:518) ~[spring-context-support-5.3.22.jar:5.3.22]
    at org.springframework.mail.javamail.JavaMailSenderImpl.testConnection(JavaMailSenderImpl.java:398) ~[spring-context-support-5.3.22.jar:5.3.22]
    at org.springframework.boot.actuate.mail.MailHealthIndicator.doHealthCheck(MailHealthIndicator.java:42) ~[spring-boot-actuator-2.7.3.jar:2.7.3]
    at org.springframework.boot.actuate.health.AbstractHealthIndicator.health(AbstractHealthIndicator.java:82) ~[spring-boot-actuator-2.7.3.jar:2.7.3]
    at org.springframework.boot.actuate.health.HealthIndicator.getHealth(HealthIndicator.java:37) ~[spring-boot-actuator-2.7.3.jar:2.7.3]
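
The timeout -1 in the exception message suggests that no SMTP timeout is configured, so each health check can block for as long as the connection attempt takes. A minimal sketch of bounding that wait through JavaMail properties follows. It is not part of the original configuration, and the 3000 ms values are assumptions chosen only for illustration.

```yaml
# application.yml (sketch; the timeout values are illustrative assumptions)
spring:
  mail:
    properties:
      mail:
        smtp:
          connectiontimeout: 3000  # ms to wait while opening the socket
          timeout: 3000            # ms to wait on socket reads, e.g. the SMTP greeting
          writetimeout: 3000       # ms to wait on socket writes
```

This only shortens the wait. The mail component would still report DOWN while the SMTP server is unreachable, so /actuator/health would still return 503.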

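Solutions

Option 1

Disable the mail health indicator in spring-boot-actuator so that the /actuator/health endpoint no longer waits on the SMTP server and returns quickly.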

management:
  health:
    mail:
      enabled: false

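Option 2 (Recommended)

Actuator provides dedicated endpoints for the liveness and readiness probes. Enabling probes in the health endpoint configuration turns them on. The Kubernetes Deployment must also be updated so that its probes point at these endpoints.

Endpoint URLs

Liveness probe: /actuator/health/liveness

Readiness probe: /actuator/health/readiness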

management:
  endpoint:
    health:
      probes:
        enabled: true
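
The matching Deployment change is not shown in the original. A sketch of it, assuming the same container, port, and probe timings as the manifest earlier in this post, only swaps the two probe paths. The /actuator prefix assumes the default management base path.

```yaml
# Fragment of the container spec in deploy-k8s-java (sketch).
# Only the two `path` values differ from the original manifest.
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /actuator/health/liveness
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /actuator/health/readiness
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
```

By default these two health groups report only the application's own liveness and readiness state, so an unreachable SMTP server should no longer fail either probe.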

Posted: 2022-10-31
