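Our K8s cluster has been running for more than a year, and recently it has started behaving strangely. Now when we run kubectl apply -f deployment-manifest.yaml, the pods do not appear in kubectl get pods. The deployment does appear in kubectl get deployments, but with a 0/3 state. Output of kubectl describe deployment app-deployment: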
Conditions:
  Type             Status  Reason
  ----             ------  ------
  Available        False   MinimumReplicasUnavailable
  ReplicaFailure   True    FailedCreate
  Progressing      False   ProgressDeadlineExceeded
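Read together, these conditions say the ReplicaSet exists but could not create any pods (ReplicaFailure/FailedCreate), which is why nothing appears in kubectl get pods. A minimal Python sketch that extracts the conditions from describe output and flags the unhealthy ones; the BAD_STATES mapping is an assumption of this sketch, not something kubectl defines.

```python
from typing import Dict, List, Tuple

# Assumption of this sketch: the status value that counts as unhealthy per condition type.
BAD_STATES = {"Available": "False", "Progressing": "False", "ReplicaFailure": "True"}
STATUSES = {"True", "False", "Unknown"}


def parse_conditions(describe_output: str) -> Dict[str, Tuple[str, str]]:
    """Return {type: (status, reason)} from the 'Conditions:' table of kubectl describe."""
    conditions: Dict[str, Tuple[str, str]] = {}
    in_table = False
    for line in describe_output.splitlines():
        parts = line.split()
        if parts == ["Conditions:"]:
            in_table = True
            continue
        if not in_table or not parts or parts[0] in ("Type", "----"):
            continue  # outside the table, blank line, header or separator row
        if len(parts) < 3 or parts[1] not in STATUSES:
            break  # the first line that is not a condition row ends the table
        conditions[parts[0]] = (parts[1], parts[2])
    return conditions


def failing_conditions(conditions: Dict[str, Tuple[str, str]]) -> List[str]:
    """List 'Type=Status (Reason)' for each condition that is in its unhealthy state."""
    return [
        f"{ctype}={status} ({reason})"
        for ctype, (status, reason) in conditions.items()
        if BAD_STATES.get(ctype) == status
    ]
```

Feeding it the full describe output and printing failing_conditions(...) lists all three conditions shown above.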
When I checked the kube-apiserver logs:
I1115 12:55:56.110277 1 trace.go:116] Trace[16922026]: "Call validating webhook" configuration:istiod-istio-system,webhook:validation.istio.io,resource:networking.istio.io/v1alpha3, Resource=gateways,subresource:,operation:CREATE,UID:00c425da-6475-4ed3-bc25-5a81d866baf2 (started: 2021-11-15 12:55:26.109897413 +0000 UTC m=+8229.935658158) (total time: 30.00030708s):
Trace[16922026]: [30.00030708s] [30.00030708s] END
W1115 12:55:56.110327 1 dispatcher.go:128] Failed calling webhook, failing open validation.istio.io: failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: dial tcp 10.233.30.109:443: i/o timeout
E1115 12:55:56.110363 1 dispatcher.go:129] failed calling webhook "validation.istio.io": Post https://istiod.istio-system.svc:443/validate?timeout=30s: dial tcp 10.233.30.109:443: i/o timeout
I1115 12:55:56.121271 1 trace.go:116] Trace[576910507]: "Create" url:/apis/networking.istio.io/v1alpha3/namespaces/istio-system/gateways,user-agent:pilot-discovery/v0.0.0 (linux/amd64) kubernetes/$Format,client:192.168.1.16 (started: 2021-11-15 12:55:26.108861126 +0000 UTC m=+8229.934621868) (total time: 30.012357263s):
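Each "failed calling webhook" line records which webhook failed, the URL it was sent to, and the underlying network error. A short sketch that extracts those three fields so failures from different components can be compared side by side; the regular expression is written against the line format shown in this post and is an assumption for other Kubernetes versions.

```python
import re
from typing import List, NamedTuple


class WebhookFailure(NamedTuple):
    webhook: str  # e.g. "validation.istio.io"
    url: str      # endpoint the caller tried to POST to
    cause: str    # text after the URL, e.g. "dial tcp 10.233.30.109:443: i/o timeout"


# Matches: failed calling webhook "NAME": Post URL: CAUSE
# The URL is matched lazily up to the first ": " (colon followed by a space).
PATTERN = re.compile(r'failed calling webhook "([^"]+)": Post (\S+?): (.+)$')


def extract_failures(log_lines: List[str]) -> List[WebhookFailure]:
    """Collect every 'failed calling webhook' error from the given log lines."""
    failures = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            failures.append(WebhookFailure(*match.groups()))
    return failures
```

On the apiserver lines above this yields validation.istio.io, https://istiod.istio-system.svc:443/validate?timeout=30s and a TCP dial timeout to 10.233.30.109:443.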
kube-controller-manager logs:
I1116 07:55:06.218995 1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"ops-executor-app-6647b7cbdb", UID:"0ef5fefd-88d7-480f-8a5d-f7e2c8025ae9", APIVersion:"apps/v1", ResourceVersion:"122334057", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: Internal error occurred: failed calling webhook "sidecar-injector.istio.io": Post https://istiod.istio-system.svc:443/inject?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E1116 07:56:12.342407 1 replica_set.go:535] sync "default/app-6769f4cb97" failed with Internal error occurred: failed calling webhook "sidecar-injector.istio.io": Post https://istiod.istio-system.svc:443/inject?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
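The two Istio webhooks fail the same way but with different consequences: the apiserver logs "failing open" for validation.istio.io and lets the Gateway request continue, while pod creation is rejected when sidecar-injector.istio.io cannot be reached. That is consistent with the failurePolicy setting on Kubernetes admission webhooks (Ignore admits the request when the call errors, Fail rejects it). A simplified model of that rule; which policy each Istio webhook actually has is inferred from these logs, not confirmed, and can be checked in the webhook configurations.

```python
from enum import Enum


class FailurePolicy(Enum):
    IGNORE = "Ignore"  # admit the request if calling the webhook errors out
    FAIL = "Fail"      # reject the request if calling the webhook errors out


def admission_outcome(policy: FailurePolicy, webhook_reachable: bool) -> str:
    """Simplified model of how the apiserver treats a webhook call error.

    Only the "call itself failed" case (timeout, refused) is modelled; when the
    webhook answers, its own allow/deny verdict decides and is out of scope here.
    """
    if webhook_reachable:
        return "decided by webhook"
    if policy is FailurePolicy.IGNORE:
        return "admitted (failing open)"
    return "rejected (Internal error occurred: failed calling webhook)"
```

Under this model, a Fail-policy injector that cannot reach istiod blocks every pod in the namespaces it covers, which matches the FailedCreate condition on the ReplicaSet.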
When I checked kubectl get pods -n istio-system:
NAME                                    READY   STATUS    RESTARTS   AGE
istio-egressgateway-794d6f956b-8p5vz    0/1     Running   5          401d
istio-ingressgateway-784f857457-2fz4v   0/1     Running   5          401d
istiod-67c86464b4-vjp4j                 1/1     Running   5          401d
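istiod reports 1/1, but both gateways are Running with 0/1 ready containers, meaning their containers are up but not passing readiness. A minimal sketch that picks such pods out of kubectl get pods output; it locates columns by header name, so it assumes only that NAME and READY headers are present as shown above.

```python
from typing import List


def unready_pods(get_pods_output: str) -> List[str]:
    """Return names of pods whose READY column shows fewer ready containers than total."""
    lines = get_pods_output.strip().splitlines()
    header = lines[0].split()
    name_idx, ready_idx = header.index("NAME"), header.index("READY")
    result = []
    for line in lines[1:]:
        cols = line.split()
        ready, total = (int(x) for x in cols[ready_idx].split("/"))
        if ready < total:
            result.append(cols[name_idx])
    return result
```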
The egress and ingress gateway logs show:
2021-11-15T16:55:31.419880Z error citadelclient Failed to create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.419912Z error cache resource:default request:37d26b55-df29-465f-9069-9b9a1904e8ab CSR retrial timed out: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.419956Z error cache resource:default failed to generate secret for proxy: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.419981Z error sds resource:default Close connection. Failed to get secret for proxy "router~10.233.70.87~istio-egressgateway-794d6f956b-8p5vz.istio-system~istio-system.svc.cluster.local" from secret cache: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:31.420070Z info sds resource:default connection is terminated: rpc error: code = Canceled desc = context canceled
2021-11-15T16:55:31.420336Z warning envoy config StreamSecrets gRPC config stream closed: 14, connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.25.10:53: no such host"
2021-11-15T16:55:48.020242Z warning envoy config StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2021-11-15T16:55:48.020479Z warning envoy config Unable to establish new stream
2021-11-15T16:55:51.025327Z info sds resource:default new connection
2021-11-15T16:55:51.025597Z info sds Skipping waiting for gateway secret
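The gateways fail before any connection is attempted: the lookup of istiod.istio-system.svc through 169.254.25.10 returns "no such host". That name is not fully qualified, so it only resolves after the pod's resolver appends a search domain. A sketch of the order in which a glibc-style stub resolver tries candidates, assuming the typical Kubernetes pod resolv.conf (search istio-system.svc.cluster.local svc.cluster.local cluster.local, options ndots:5); the cluster.local domain matches the proxy ID in the log above, but the pod's /etc/resolv.conf is the authority.

```python
from typing import List


def candidate_queries(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Order in which a glibc-style stub resolver tries names (simplified).

    - A trailing dot means the name is already fully qualified: no search domains.
    - Fewer than `ndots` dots: search domains are tried first, then the name as-is.
    - Otherwise: the name as-is first, then the search domains.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") < ndots:
        return expanded + [absolute]
    return [absolute] + expanded
```

For istiod.istio-system.svc the third candidate, istiod.istio-system.svc.cluster.local., is the real Service name; "no such host" means none of the candidates resolved.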
I tried to get the details as described here, but it shows no resources.
Deploying the application in a namespace without Istio injection works fine, with no issues.
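That fits how an admission webhook is scoped: the apiserver only calls it for objects in namespaces its namespaceSelector matches, so pods in other namespaces never wait on the unreachable istiod endpoint. A minimal sketch of matchLabels evaluation; the istio-injection: enabled selector used in the example is the usual default for Istio's injector but is an assumption here, and matchExpressions are not modelled.

```python
from typing import Dict, Optional


def webhook_applies(ns_labels: Dict[str, str],
                    match_labels: Optional[Dict[str, str]]) -> bool:
    """Evaluate a namespaceSelector that only uses matchLabels.

    An empty or absent selector matches every namespace; otherwise every
    key/value pair must be present on the namespace's labels.
    """
    if not match_labels:
        return True
    return all(ns_labels.get(key) == value for key, value in match_labels.items())
```

With selector {"istio-injection": "enabled"}, an unlabelled namespace is skipped and its pods are created without calling the injector.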
We run a bare-metal cluster on Ubuntu 18.04 LTS.
istioctl version
client version: 1.7.0
control plane version: 1.7.0
data plane version: none
Kubernetes v1.18.8
As mentioned here, I ran kubectl get --raw /api/v1/namespaces/istio-system/services/https:istiod:https-webhook/proxy/inject -v4:
I1116 17:05:32.703339 28777 helpers.go:216] server response object: [{
"metadata": {},
"status": "Failure",
"message": "the server rejected our request for an unknown reason",
"reason": "BadRequest",
"details": {
"causes": [
{
"reason": "UnexpectedServerResponse",
"message": "no body found"
}
]
},
"code": 400
}]
F1116 17:05:32.703515 28777 helpers.go:115] Error from server (BadRequest): the server rejected our request for an unknown reason
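kubectl get --raw issues a GET without a request body. The 400 with the cause "no body found" looks like the injector's own handler rejecting an empty body, which would suggest the request did reach istiod on this path. A real webhook call is a POST carrying an AdmissionReview. A minimal sketch of building such a body; the admission.k8s.io/v1beta1 apiVersion and the sample pod are assumptions, and the injector may expect fields beyond the ones filled in here.

```python
import json
import uuid


def admission_review_for_pod(pod: dict, namespace: str,
                             api_version: str = "admission.k8s.io/v1beta1") -> str:
    """Wrap a Pod manifest in an AdmissionReview request (illustrative sketch).

    Field names follow the Kubernetes AdmissionReview schema; which apiVersion
    the injector accepts should be checked against admissionReviewVersions in
    its MutatingWebhookConfiguration.
    """
    review = {
        "apiVersion": api_version,
        "kind": "AdmissionReview",
        "request": {
            "uid": str(uuid.uuid4()),  # arbitrary ID; a webhook echoes it back
            "kind": {"group": "", "version": "v1", "kind": "Pod"},
            "resource": {"group": "", "version": "v1", "resource": "pods"},
            "namespace": namespace,
            "operation": "CREATE",
            "object": pod,
        },
    }
    return json.dumps(review)
```

The resulting JSON could be saved to a file and POSTed to the same path with kubectl create --raw <path> -f <file>.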
From the ingress gateway:
istio-proxy@istio-ingressgateway-784f857457-2fz4v:/$ curl https://istiod.istio-system:443/inject -k
curl: (6) Could not resolve host: istiod.istio-system
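curl inside the gateway cannot resolve even the short name, which matches the gateway logs. To separate a search-list problem from a DNS server problem, it helps to try the short, partially qualified and fully qualified forms (the last with a trailing dot, so no search domain is appended). A sketch using the OS resolver through socket.getaddrinfo, meant to run from a debug pod that has Python (the istio-proxy image is assumed not to); the resolver is injectable so the logic can be tested without a cluster.

```python
import socket
from typing import Callable, Dict, List, Optional

Resolver = Callable[[str], List[str]]


def system_resolver(name: str) -> List[str]:
    """Resolve `name` with the OS stub resolver (honours /etc/resolv.conf)."""
    infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_names(names: List[str],
                resolve: Resolver = system_resolver) -> Dict[str, Optional[List[str]]]:
    """Try each name; map it to its addresses, or None if resolution failed."""
    results: Dict[str, Optional[List[str]]] = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except socket.gaierror:
            results[name] = None
    return results
```

For example, check_names(["istiod.istio-system", "istiod.istio-system.svc", "istiod.istio-system.svc.cluster.local."]): if even the trailing-dot name fails, the search list is not the culprit.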
Edit: in /var/lib/kubelet/config.yaml on the master node:
clusterDNS:
- 169.254.25.10
We can ping this IP from our nodes.
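A successful ping only shows that 169.254.25.10 answers ICMP; it says nothing about whether a DNS server answers on port 53 there or whether it can resolve cluster names. The address is link-local (169.254.0.0/16), which in Kubespray-style setups is commonly a node-local DNS cache in front of CoreDNS (an assumption, check your installer). A stdlib-only sketch that sends a single RFC 1035 query over UDP and reports the response code.

```python
import socket
import struct
from typing import Tuple

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # flags RD=1, QDCOUNT=1
    labels = name.rstrip(".").split(".")
    if any(len(label) > 63 for label in labels):
        raise ValueError("DNS labels are limited to 63 bytes")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def query(server: str, name: str, timeout: float = 2.0) -> Tuple[str, int]:
    """Send one UDP query to server:53; return (rcode name, answer count).

    Raises socket.timeout if nothing answers on port 53.
    """
    qid = 0x1234
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        data, _ = sock.recvfrom(512)
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if rid != qid:
        raise ValueError("response ID does not match query ID")
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), ancount
```

Running query("169.254.25.10", "istiod.istio-system.svc.cluster.local") on a node distinguishes a silent resolver (timeout), a resolver that does not know the name (NXDOMAIN or SERVFAIL), and a working path (NOERROR with answers).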
I found this in the coredns pod logs:
E1123 08:57:05.386992 1 reflector.go:153] pkg/mod/k8s.io/…/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E1123 08:57:05.387108 1 reflector.go:153] pkg/mod/k8s.io/…/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.233.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
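Two different TCP failure modes appear in these logs: the apiserver gets an i/o timeout dialing the istiod ClusterIP 10.233.30.109:443, while CoreDNS gets connection refused dialing 10.233.0.1:443, the address it uses to reach the API server. A refusal means something actively rejected the connection; a timeout means nothing answered at all. A small sketch to probe those addresses from a node or pod and classify the result the same way; the addresses come from the logs, the timeout value is an arbitrary choice.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the result.

    "open"             the handshake completed
    "refused"          the connection was actively rejected (typically a TCP RST)
    "timeout"          nothing answered within `timeout` seconds
    "error: <message>" any other socket error, e.g. no route to host
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # must come last: the classes above are OSError subclasses
        return f"error: {exc}"
```

For example, probe("10.233.0.1", 443) and probe("10.233.30.109", 443) run from a node and from a pod show whether the two ClusterIPs behave differently depending on where the connection starts.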