To run the Amplab training exercises, I created a key pair in us-east-1, installed the training scripts (git clone git://github.com/amplab/training-scripts.git -b ampcamp4), and set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, following the instructions at http://ampcamp.berkeley.edu/big-data-mini-course/launching-a-bdas-cluster-on-ec2.html
Running
./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1
produces the following messages:
johndoe@ip-some-instance:~/projects/spark/training-scripts$ ./spark-ec2 -i ~/.ssh/myspark.pem -r us-east-1 -k myspark --copy launch try1
Setting up security groups...
Searching for existing cluster try1...
Latest Spark AMI: ami-19474270
Launching instances...
Launched 5 slaves in us-east-1b, regid = r-0c5e5ee3
Launched master in us-east-1b, regid = r-316060de
Waiting for instances to start up...
Waiting 120 more seconds...
Copying SSH key /home/johndoe/.ssh/myspark.pem to master...
ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem [email protected] 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
ssh: connect to host ec2-54-90-57-174.compute-1.amazonaws.com port 22: Connection refused
Error connecting to host Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem [email protected] 'mkdir -p ~/.ssh'' returned non-zero exit status 255, sleeping 30
...
...
subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no -i /home/johndoe/.ssh/myspark.pem [email protected] '/root/spark/bin/stop-all.sh'' returned non-zero exit status 127
where [email protected] is the user and the master instance. I tried -u ec2-user and kept increasing -w all the way up to 600, but I get the same error.
I can see the master and slave instances in us-east-1 when I log in to the AWS console, and I can actually ssh into the master instance from the "local" ip-some-instance shell.
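My understanding is that the spark-ec2 script takes care of defining the master/slave security groups (which ports are listened on, etc.), and that I shouldn't have to tweak these settings. That said, the master and the slaves all listen on port 22 (Port:22, Protocol:tcp, Source:0.0.0.0/0 in the ampcamp3-slaves/masters security groups).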
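To separate "sshd is not up yet" from a security-group problem, one option is to poll port 22 on the master directly, outside of spark-ec2. Below is a minimal sketch of such a check. The helper name wait_for_port and its parameters are hypothetical and not part of the training scripts; the 600-second budget and 30-second interval simply mirror the -w value and the "sleeping 30" retries shown above.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=600.0, interval=30.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds within `timeout` seconds.
    Hypothetical helper for diagnosis only; not part of spark-ec2.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is accepting on the port.
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            # Covers "Connection refused", connect timeouts, and DNS failures.
            pass
        # Give up if the next attempt would start after the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_port("ec2-54-90-57-174.compute-1.amazonaws.com") could be run from the same shell that runs spark-ec2. A result of False after the full wait would suggest that nothing is accepting connections on port 22 at that address from that machine, whereas True would point at something specific to the ssh command that spark-ec2 issues.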
I'm at a loss here, and would appreciate any pointers before I spend all my R&D funds on EC2 instances... Thanks.