我们在单个 GKE(谷歌 kubernetes 引擎)集群节点中部署了一个软件系统,该节点使用大约 100 个 pod,在每个 pod 中我们定义了 TCP 就绪探针,现在我们可以看到就绪探针定期失败,并显示Unable to connect to remote host: Connection refused
在不同的 Pod 上。
通过集群节点和故障 pod 上的 tcpdump 跟踪,我们发现从集群节点发送的数据包似乎是正确的,而 pod 没有收到 TCP 数据包,但故障 pod 仍然可以接收 IP 广播数据包。
奇怪的是,如果我们从故障 pod 中 ping/curl/wget 集群节点,无论集群节点是否有 http 服务,TCP 连接都会立即恢复,并且就绪检查也会变得正常。
示例如下:
集群节点主机:10.44.0.1
失败的 Pod 主机:10.44.0.92
集群节点cbr0接口上的tcpdump:
#sudo tcpdump -i cbr0 host 10.44.0.92
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:33:52.913052 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:33:52.913181 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:33:57.727497 IP 10.44.0.1.47736 > 10.44.0.92.mysql: Flags [S], seq 756717730, win 28400, options [mss 1420,sackOK,TS val 1084890021 ecr 0,nop,wscale 7], length 0
17:33:57.727537 IP 10.44.0.92.mysql > 10.44.0.1.47736: Flags [R.], seq 0, ack 756717731, win 0, length 0
17:34:07.727563 IP 10.44.0.1.48202 > 10.44.0.92.mysql: Flags [S], seq 2235831098, win 28400, options [mss 1420,sackOK,TS val 1084900021 ecr 0,nop,wscale 7], length 0
17:34:07.727618 IP 10.44.0.92.mysql > 10.44.0.1.48202: Flags [R.], seq 0, ack 2235831099, win 0, length 0
17:34:12.881059 ARP, Request who-has 10.44.0.92 tell 10.44.0.1, length 28
17:34:12.881176 ARP, Reply 10.44.0.92 is-at 0a:58:0a:2c:00:5c (oui Unknown), length 28
这些是从 Kubelet 发送的准备情况检查数据包,我们可以看到故障节点响应Flags [R.], seq 0, ack 756717731, win 0, length 0
这是一个 TCP 握手 ACK/SYN 回复,它是一个失败的数据包,并且 TCP 连接将不会建立。
如果我们exec -it
到发生故障的 Pod 并从该 Pod 对集群节点执行 ping 操作,如下所示:
root@mariadb:/# ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1): 56 data bytes
64 bytes from 10.44.0.1: icmp_seq=0 ttl=64 time=3.301 ms
64 bytes from 10.44.0.1: icmp_seq=1 ttl=64 time=0.303 ms
然后让我们从 TCP 转储中查看集群节点端发生了什么:
#sudo tcpdump -i cbr0 host 10.44.0.92
17:34:17.728039 IP 10.44.0.92.mysql > 10.44.0.1.48704: Flags [R.], seq 0, ack 2086181490, win 0, length 0
17:34:27.727638 IP 10.44.0.1.49202 > 10.44.0.92.mysql: Flags [S], seq 1769056007, win 28400, options [mss 1420,sackOK,TS val 1084920022 ecr 0,nop,wscale 7], length 0
17:34:27.727693 IP 10.44.0.92.mysql > 10.44.0.1.49202: Flags [R.], seq 0, ack 1769056008, win 0, length 0
17:34:34.016995 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:34:34.018358 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:34:34.020020 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 0, length 64
17:34:34.020101 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 0, length 64
17:34:35.017197 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 1, length 64
17:34:35.017256 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 1, length 64
17:34:36.018589 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 2, length 64
17:34:36.018700 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 2, length 64
17:34:37.019791 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 3, length 64
17:34:37.019837 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 3, length 64
17:34:37.730849 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [S], seq 1304758051, win 28400, options [mss 1420,sackOK,TS val 1084930025 ecr 0,nop,wscale 7], length 0
17:34:37.730900 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [S.], seq 1267340310, ack 1304758052, win 28160, options [mss 1420,sackOK,TS val 3617117819 ecr 1084930025,nop,wscale 7], length 0
17:34:37.730952 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [.], ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731149 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [F.], seq 1, ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731268 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [P.], seq 1:107, ack 2, win 220, options [nop,nop,TS val 3617117819 ecr 1084930025], length 106
17:34:37.731322 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [R], seq 1304758053, win 0, length 0
17:34:47.728119 IP 10.44.0.1.50138 > 10.44.0.92.mysql: Flags [S], seq 502800802, win 28400, options [mss 1420,sackOK,TS val 1084940022 ecr 0,nop,wscale 7], length 0
17:34:47.728179 IP 10.44.0.92.mysql > 10.44.0.1.50138: Flags [S.], seq 4294752326, ack 502800803, win 28160, options [mss 1420,sackOK,TS val 3617127816 ecr 1084940022,nop,wscale 7], length 0
我们可以看到ICMP数据包是从pod发送的ping命令的数据包,在ICMP数据包之后,准备检查数据包现在立即变得正确,TCP句柄交换成功。
不仅 ping 可以让它工作,其他命令如curl/wget 也可以让它工作,只需要从故障的 pod 到达集群节点,之后集群节点到 pod 的 TCP 连接就会变得正确。
失败的 Pod 会时不时地发生变化,它可能发生在任何 Pod 上,因为节点上有 100 个 Pod 正在运行,不确定它是否会触发某些系统限制,但是所有其他 Pod 都正常工作,我们没有看到巨大的 CPU 利用率,并且节点上仍然剩余几 GB 内存。
有谁知道问题可能是什么?