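I am trying to install and use Open MPI and MPICH2 on a multi-node cluster, and in both cases I have trouble running across multiple machines. With MPICH2 I can run on a specific host from the head node, but if I try to run something from one compute node on a different node, I get: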
HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0@destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
If I try to set up the job through SGE, I get a similar error.
With Open MPI, on the other hand, I cannot run on any remote machine at all, even from the head node. I get:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
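The machines are connected to each other: from any one of them I can ping any other, ssh without a password, and so on. MPI_LIB and PATH are set correctly on all machines.
This is usually caused by not setting up a host file or not passing a list of hosts on the command line.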
For MPICH, you do this by passing the -host flag on the command line, followed by the list of hosts (host1,host2,host3, etc.).
mpiexec -host host1,host2,host3 -n 3 <executable>
You can also put them in a file:
host1
host2
host3
Then you pass that file on the command line like this:
mpiexec -f <hostfile> -n 3 <executable>
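As a minimal Python sketch of the host-file approach above, the file can be generated and the matching command line assembled as follows. The helper names `write_hostfile` and `mpich_command`, the host names, and `./a.out` are placeholders chosen for illustration, not part of MPICH.

```python
import shlex
import tempfile
from pathlib import Path


def write_hostfile(hosts, path):
    """Write one host name per line, the layout shown above."""
    path = Path(path)
    path.write_text("".join(f"{host}\n" for host in hosts))
    return path


def mpich_command(hostfile, executable, nprocs):
    """Assemble: mpiexec -f <hostfile> -n <nprocs> <executable>."""
    return ["mpiexec", "-f", str(hostfile), "-n", str(nprocs), executable]


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3"]  # placeholder host names
    with tempfile.TemporaryDirectory() as tmp:
        hostfile = write_hostfile(hosts, Path(tmp) / "hostfile")
        cmd = mpich_command(hostfile, "./a.out", len(hosts))
        # Only print the command here; hand cmd to subprocess.run to launch it.
        print(shlex.join(cmd))
```

The sketch only prints the command, so it can be tried on a machine without MPI installed.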
Similarly, for Open MPI, you would use:
mpiexec --host host1,host2,host3 -n 3 <executable>
and
mpiexec --hostfile hostfile -n 3 <executable>
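The two launchers differ only in how the flags are spelled, which a small Python sketch can make explicit. The `FLAGS` table and the `mpiexec_args` helper are hypothetical names introduced here; the flag spellings themselves are the ones used in the commands above.

```python
import shlex

# Flag spellings copied from the commands above.
FLAGS = {
    "mpich": {"hosts": "-host", "hostfile": "-f"},
    "openmpi": {"hosts": "--host", "hostfile": "--hostfile"},
}


def mpiexec_args(impl, executable, nprocs, hosts=None, hostfile=None):
    """Build an mpiexec argument list from a host list or a host file."""
    if (hosts is None) == (hostfile is None):
        raise ValueError("pass exactly one of hosts or hostfile")
    flags = FLAGS[impl]
    if hosts is not None:
        target = [flags["hosts"], ",".join(hosts)]
    else:
        target = [flags["hostfile"], str(hostfile)]
    return ["mpiexec", *target, "-n", str(nprocs), executable]


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3"]  # placeholder host names
    print(shlex.join(mpiexec_args("mpich", "./a.out", 3, hosts=hosts)))
    print(shlex.join(mpiexec_args("openmpi", "./a.out", 3, hostfile="hostfile")))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.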
You can find more information at these links:
- MPICH - https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
- Open MPI - http://www.open-mpi.org/faq/?category=running#mpirun-hostfile