Exception message:
22/01/14 13:58:44 [Reporter] INFO YarnAllocator: Completed container container_e118_5690061100801_24379300_01_000066 on host: BJLFRZ-Evil-153-70.hadoop.jd.local (state: COMPLETE, exit status: 7)
22/01/14 13:58:44 [Reporter] WARN YarnAllocator: Container from a bad node: container_e118_5690061100801_24379300_01_000066 on host: BJLFRZ-Evil-153-70.hadoop.jd.local. Exit status: 7. Diagnostics: Exception from container-launch.
Container id: container_e118_5690061100801_24379300_01_000066
Exit code: 7
Exception message: docker: Cannot connect to the Docker daemon. Is the docker daemon running on this host?.
See '/usr/bin/docker-current run --help'.
Could not invoke docker docker run --name=container_e118_5690061100801_24379300_01_000066 --user=$(id -u yarn) -d --workdir=/data3/yarn1/local/usercache/jdw_dwm_bqyf/appcache/application_5690061100801_24379300/container_e118_5690061100801_24379300_01_000066 --net=host -v /etc/passwd:/etc/passwd:ro -v /etc/shadow:/etc/shadow:ro -v /etc/group:/etc/group:ro -v /software/servers/jdk1.8.0_121:/software/servers/jdk1.8.0_121:ro -v /software/servers/yarn-2.7.1:/software/servers/yarn-2.7.1:ro -v /var/lib/hadoop-hdfs/dn_socket:/var/lib/hadoop-hdfs/dn_socket:ro -v /data0/yarn1/local:/data0/yarn1/local -v /data1/yarn1/local:/data1/yarn1/local -v /data2/yarn1/local:/data2/yarn1/local -v /data3/yarn1/local:/data3/yarn1/local -v /data4/yarn1/local:/data4/yarn1/local -v /data5/yarn1/local:/data5/yarn1/local -v /data6/yarn1/local:/data6/yarn1/local -v /data7/yarn1/local:/data7/yarn1/local -v /data8/yarn1/local:/data8/yarn1/local -v /data9/yarn1/local:/data9/yarn1/local -v /data10/yarn1/local:/data10/yarn1/local -v /data11/yarn1/local:/data11/yarn1/local -v /data3/yarn1/local/usercache/jdw_dwm_bqyf/appcache/application_5690061100801_24379300/container_e118_5690061100801_24379300_01_000066:/data3/yarn1/local/usercache/jdw_dwm_bqyf/appcache/application_5690061100801_24379300/container_e118_5690061100801_24379300_01_000066 -v /data0/yarn1/logs:/data0/yarn1/logs -v /data1/yarn1/logs:/data1/yarn1/logs -v /data2/yarn1/logs:/data2/yarn1/logs -v /data3/yarn1/logs:/data3/yarn1/logs -v /data4/yarn1/logs:/data4/yarn1/logs -v /data5/yarn1/logs:/data5/yarn1/logs -v /data6/yarn1/logs:/data6/yarn1/logs -v /data7/yarn1/logs:/data7/yarn1/logs -v /data8/yarn1/logs:/data8/yarn1/logs -v /data9/yarn1/logs:/data9/yarn1/logs -v /data10/yarn1/logs:/data10/yarn1/logs -v /data11/yarn1/logs:/data11/yarn1/logs --cgroup-parent=/yarn/container/container_e118_5690061100801_24379300_01_000066 bdp-docker.jd.com:5000/wise_mart_bag:latest bash 
/data3/yarn1/local/usercache/jdw_dwm_bqyf/appcache/application_5690061100801_24379300/container_e118_5690061100801_24379300_01_000066/launch_container.sh.
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:297)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:102)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:389)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:85)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 4
main : run as user is yarn
main : requested yarn user is jdw_dwm_bqyf
Creating script paths...
Creating local dirs...
Getting exit code file...
Changing effective user to root...
Launching docker container...
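Solution:
The log shows that the Docker daemon is not running on node BJLFRZ-Evil-153-70.hadoop.jd.local, so Docker containers cannot be launched on that node. In most cases you need to contact the operations team to resolve it.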
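
When reporting the problem to the operations team, it helps to list exactly which nodes are affected. The following is a minimal sketch, not an official tool: it scans log lines for the YarnAllocator "Container from a bad node" warning shown above and counts the hosts whose diagnostics contain the "Cannot connect to the Docker daemon" message. The function name `hosts_with_docker_daemon_down` and the assumption that the Docker message follows the warning for the same container are illustrative choices, not part of Spark or YARN.

```python
import re
from collections import Counter

# Matches the YarnAllocator warning format seen in the log above.
BAD_NODE = re.compile(
    r"Container from a bad node: (?P<container>\S+) on host: (?P<host>\S+)\. "
    r"Exit status: (?P<status>-?\d+)\."
)
# Substring of the Docker error message seen in the diagnostics.
DOCKER_DOWN = "Cannot connect to the Docker daemon"


def hosts_with_docker_daemon_down(lines):
    """Count, per host, the bad-node warnings whose diagnostics mention
    an unreachable Docker daemon.

    Assumption (hypothetical helper): the Docker message appears on the
    warning line itself or on a later line before the next warning.
    """
    counts = Counter()
    current_host = None
    for line in lines:
        match = BAD_NODE.search(line)
        if match:
            # Remember the host of the most recent bad-node warning.
            current_host = match.group("host")
        if current_host and DOCKER_DOWN in line:
            counts[current_host] += 1
            # Reset so one warning is counted at most once.
            current_host = None
    return counts


# Usage with a few lines taken from the log above (container launch
# command omitted for brevity).
sample_lines = [
    "22/01/14 13:58:44 [Reporter] WARN YarnAllocator: Container from a bad node: "
    "container_e118_5690061100801_24379300_01_000066 on host: "
    "BJLFRZ-Evil-153-70.hadoop.jd.local. Exit status: 7. "
    "Diagnostics: Exception from container-launch.",
    "Container id: container_e118_5690061100801_24379300_01_000066",
    "Exit code: 7",
    "Exception message: docker: Cannot connect to the Docker daemon. "
    "Is the docker daemon running on this host?.",
]

for host, count in hosts_with_docker_daemon_down(sample_lines).items():
    print(f"{host}: {count} failed container launch(es)")
```

To run it against a real log, pass an open file object instead of `sample_lines`; the function only needs an iterable of lines.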