我们在 GAE 上运行节点服务器,由于某种原因,我们的服务器每天会离线几次(有时可能需要几分钟才能恢复在线)。
全天的请求都是相同的,也没有例外会导致重新启动。没有请求激增或任何可能导致这种情况的特殊请求。
发生时记录:
2020-04-18T23:48:51.881806Z [GET /v1/util/example [36m304 [35.262 ms - -[ A
2020-04-18T23:50:17.119906Z [start] 2020/04/18 23:50:17.119185 Quitting on terminated signal A
2020-04-18T23:50:17.175632Z [start] 2020/04/18 23:50:17.175267 Start program failed: user application failed with exit code -1 (refer to stdout/stderr logs for more detail): signal: terminated
2020-04-18T23:51:38.772388Z GET 304 173 B 3.3 s Example-V2/3.1.13 (com.example.app; build:1; iOS 13.4.0) Alamofire/5.1.0 /v1/util/example GET 304 173 B 3.3 s Example-V2/3.1.13 (com.example.app; build:1; iOS 13.4.0) Alamofire/5.1.0 5e9b928a00ff0bc9244f94194c0001737e737065616b2d76322d32613166310001737065616b2d6170693a323032303034303374303630343431000100
2020-04-18T23:51:38.786760Z GET 404 324 B 2.4 s Unknown /_ah/start GET 404 324 B 2.4 s Unknown 5e9b928a00ff0c014898f5c27f0001737e737065616b2d76322d32613166310001737065616b2d6170693a323032303034303374303630343431000100
2020-04-18T23:51:39.529080Z [start] 2020/04/18 23:51:39.511828 No entrypoint specified, using default entrypoint: /serve
2020-04-18T23:51:39.529642Z [start] 2020/04/18 23:51:39.528742 Starting app
2020-04-18T23:51:39.529968Z [start] 2020/04/18 23:51:39.529100 Executing: /bin/sh -c exec /serve
2020-04-18T23:51:39.590085Z [start] 2020/04/18 23:51:39.589751 Waiting for network connection open. Subject:"app/invalid" Address:127.0.0.1:8080
2020-04-18T23:51:39.590571Z [start] 2020/04/18 23:51:39.590347 Waiting for network connection open. Subject:"app/valid" Address:127.0.0.1:8081
2020-04-18T23:51:39.764383Z [serve] 2020/04/18 23:51:39.763656 Serve started.
2020-04-18T23:51:39.764935Z [serve] 2020/04/18 23:51:39.764544 Args: {runtimeName:nodejs10 memoryMB:1024 positional:[]}
2020-04-18T23:51:39.766562Z [serve] 2020/04/18 23:51:39.765904 Running /bin/sh -c exec node server.js
2020-04-18T23:51:41.072621Z [start] 2020/04/18 23:51:41.071895 Wait successful. Subject:"app/valid" Address:127.0.0.1:8081 Attempts:296 Elapsed:1.481194491s
2020-04-18T23:51:41.072978Z Express server started on port: 8081
2020-04-18T23:51:41.073008Z [start] 2020/04/18 23:51:41.072411 Starting nginx
2020-04-18T23:51:41.085901Z [start] 2020/04/18 23:51:41.085451 Waiting for network connection open. Subject:"nginx" Address:127.0.0.1:8080
2020-04-18T23:51:41.132064Z [start] 2020/04/18 23:51:41.131572 Wait successful. Subject:"nginx" Address:127.0.0.1:8080 Attempts:9 Elapsed:45.911234ms
2020-04-18T23:51:41.170786Z [GET /_ah/start [33m404 [11.865 ms - 61[
始终有超过 70% 的可用内存,因此这不可能是问题。仅在重新启动时才注意到 CPU 利用率非常高(比正常情况高 10 倍)。
In the bottom picture you can clearly see when the restarts happen:
![CPU utilization](https://i.stack.imgur.com/9uEG8.png)
这是我的app.yaml
runtime: nodejs10
instance_class: B4
service: example-api
basic_scaling:
max_instances: 1
idle_timeout: 30m
handlers:
- url: .*
secure: always
script: auto
这正在发生在我们的生产服务器,因此非常欢迎任何帮助。
Thanks!
读这篇文章document,有人提到,尽管他们尝试让基本和手动扩展实例无限期地运行,但有时会重新启动以进行维护,或者可能由于其他原因而失败。这就是为什么将最大实例数保持为 1 并不被认为是最佳实践,因为它很容易出现所有这些故障。正如另一个答案中提到的,我还建议增加实例的数量,这样更多失败或同时重新启动的可能性就会降低。
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)