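For me, this problem was specific to macOS. I googled a lot and found many reports of broken SSH on macOS 10.15 (Catalina), but none of the suggested fixes worked for me. In the end I had to read the OpenSSH code to find out what was wrong.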
In the source file sshconnect.c https://github.com/openssh/openssh-portable/blob/master/sshconnect.c:
194 static int
195 ssh_proxy_connect(struct ssh *ssh, const char *host, const char *host_arg,
196     u_short port, const char *proxy_command)
197 {
...
201     char *shell;
202
203     if ((shell = getenv("SHELL")) == NULL || *shell == '\0')
204         shell = _PATH_BSHELL;
...
211     command_string = expand_proxy_command(proxy_command, options.user,
212         host, host_arg, port);
213     debug("Executing proxy command: %.500s", command_string);
214
215     /* Fork and execute the proxy command. */
216     if ((pid = fork()) == 0) {
217         char *argv[10];
...
240         argv[0] = shell;
241         argv[1] = "-c";
242         argv[2] = command_string;
243         argv[3] = NULL;
244
245         /* Execute the proxy command. Note that we gave up any
246            extra privileges above. */
247         ssh_signal(SIGPIPE, SIG_DFL);
248         execv(argv[0], argv);
249         perror(argv[0]);
250         exit(1);
251     }
See lines 203, 240 and 248: ssh runs the ProxyCommand with $SHELL (I found no documentation for this), and it uses execv() https://man7.org/linux/man-pages/man3/exec.3.html, which does not search $PATH. Then I checked my $SHELL:
$ echo $SHELL
bash
So that's the problem. $SHELL is not a full pathname, so execv() failed to execute it, and the error bash: No such file or directory comes from perror() https://man7.org/linux/man-pages/man3/perror.3.html on line 249. (The error confused me a lot. The prefix bash: made me think the error came from Bash.)
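The difference between the two lookup styles can be illustrated without touching ssh. Below is a minimal sketch in plain sh. The function names resolves_like_execv and resolves_like_execvp are hypothetical helpers, not anything from OpenSSH, and they only approximate the C functions: they test for an executable regular file instead of actually executing it.

```shell
#!/bin/sh
# Hypothetical helpers that approximate how a program name gets resolved.
# They only test for an executable regular file; they do not run anything.

# execv()-style: the name is used as a path exactly as given, so a bare
# name such as "bash" is looked up relative to the current directory only.
resolves_like_execv() {
    [ -f "$1" ] && [ -x "$1" ]
}

# execvp()-style: a name containing a slash is used as given; otherwise
# each directory listed in $PATH is tried in order.
# The body runs in a subshell so IFS and "set -f" do not leak out.
resolves_like_execvp() (
    case "$1" in
        */*) resolves_like_execv "$1"; exit ;;
    esac
    IFS=:
    set -f                      # do not glob-expand $PATH entries
    for dir in $PATH; do
        # An empty $PATH entry means the current directory.
        if resolves_like_execv "${dir:-.}/$1"; then
            exit 0
        fi
    done
    exit 1
)

for name in sh /bin/sh; do
    if resolves_like_execv "$name"; then v=found; else v="not found"; fi
    if resolves_like_execvp "$name"; then p=found; else p="not found"; fi
    echo "$name: execv-style $v, execvp-style $p"
done
```

Run from a directory that has no file named sh in it, the bare name should only be found by the $PATH-searching variant, which mirrors why SHELL=bash breaks under execv().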
SOLUTION: Manually set SHELL to the shell's full pathname, e.g. /bin/bash. (I did not write shell /bin/bash in .screenrc because I also have /usr/local/bin/bash.)
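If you would rather not change the environment globally, the same fix can be applied per command with a small wrapper. This is only a sketch: absolute_shell and ssh_fixed are hypothetical names, resolving a bare name through $PATH is an assumption about which binary you want (with both /bin/bash and /usr/local/bin/bash installed, whichever comes first in $PATH wins), and the /bin/sh fallback is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print a full pathname for the shell named in $1.
# An absolute value is passed through unchanged; a bare name is resolved
# through $PATH. Anything that does not resolve to an absolute path fails.
absolute_shell() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)
            found=$(command -v "$1") || return 1
            case "$found" in
                /*) printf '%s\n' "$found" ;;
                *)  return 1 ;;
            esac
            ;;
    esac
}

# Hypothetical wrapper: run ssh with SHELL forced to a full pathname.
# Falls back to /bin/sh (an assumption) if the name cannot be resolved.
ssh_fixed() {
    fixed=$(absolute_shell "${SHELL:-/bin/sh}") || fixed=/bin/sh
    SHELL=$fixed ssh "$@"
}
```

ssh_fixed takes the same arguments as ssh, so it can be used as a drop-in replacement inside a screen session where SHELL is a bare name.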
So who set SHELL=bash? Why was it not set to SHELL=/bin/bash?
In my ~/.screenrc I have:
shell bash
According to the screen manual https://man7.org/linux/man-pages/man1/screen.1.html, the shell command overrides the value of the environment variable $SHELL.
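The SHELL variable was initially /bin/bash in my interactive shell before I started screen, so it is screen that sets SHELL=bash. I think screen should work out the shell's full pathname and set SHELL to that, because according to POSIX https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html:
This variable shall represent a pathname of the user's preferred command language interpreter.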
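To check whether a screen configuration is the source of a SHELL value that is not a pathname, you can scan it for a shell directive whose argument does not start with a slash. A minimal sketch, assuming a simple one-directive-per-line ~/.screenrc without quoting. The function name relative_shell_directives is hypothetical, and the optional leading - (which screen uses to request a login shell) is skipped before the check.

```shell
#!/bin/sh
# Hypothetical helper: print every "shell" directive in a screenrc file
# whose argument is not an absolute pathname.
relative_shell_directives() {
    # Match: optional blanks, "shell", blanks, an optional "-", then a
    # first character that is not "/", not a blank and not another "-".
    grep -E '^[[:space:]]*shell[[:space:]]+-?[^/[:space:]-]' "$1"
}

# Default to ~/.screenrc; a different file can be given as the first argument.
rc=${1:-$HOME/.screenrc}
if [ -r "$rc" ] && relative_shell_directives "$rc"; then
    echo "warning: $rc sets a shell without a full pathname" >&2
fi
```

A line such as shell bash is reported, while shell /bin/bash and shell -/bin/bash are not.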
Then why does it work fine on my Linux system (Debian), where SHELL=bash as well (also inside screen)?
I ran strace https://man7.org/linux/man-pages/man1/strace.1.html and got this:
$ SHELL=xxx strace -f ssh [email protected]
[...]
[pid 5767] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
[pid 5767] execve("/root/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 5767] execve("/usr/local/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 5767] execve("/usr/local/sbin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 5767] execve("/usr/sbin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 5767] execve("/usr/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 5767] execve("/sbin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 5767] execve("/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 5767] dup(2) = 3
[pid 5767] fcntl(3, F_GETFL) = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
[pid 5767] fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x21), ...}) = 0
[pid 5767] write(3, "xxx: No such file or directory\n", 31xxx: No such file or directory
) = 31
[pid 5767] close(3) = 0
[...]
As we can see, it is actually searching for xxx in $PATH. Why? My guess is that Debian must have patched OpenSSH and changed its behavior. (I would have verified this if I knew Debian's build internals. :-)
UPDATE 2020-11-19:
I manually compiled OpenSSH (v8.4) from source https://github.com/openssh/openssh-portable and reproduced the same problem on Debian. This confirms that Debian has patched OpenSSH and changed its behavior.
$ /usr/local/openssh-8.4/bin/ssh [email protected]
bash: No such file or directory
kex_exchange_identification: Connection closed by remote host
$ strace -f /usr/local/openssh-8.4/bin/ssh [email protected]
[...]
[pid 21020] rt_sigaction(SIGPIPE, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f19a05a9840}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
[pid 21020] execve("bash", ["bash", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x5566982872f0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 21020] dup(2) = 3
[pid 21020] fcntl(3, F_GETFL) = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
[pid 21020] fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x25), ...}) = 0
[pid 21020] write(3, "bash: No such file or directory\n", 32bash: No such file or directory
) = 32
[pid 21020] close(3)
[...]