Using Spark: how to connect to the master, or resolve the error "WARN TaskSchedulerImpl: Initial job has not accepted any resources"

2024-04-08

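Please tell me how to resolve the following problem.

First, I confirmed that the code below runs when master is "local".

Then I started two EC2 instances (m1.large). However, when master is "spark://MASTER_PUBLIC_DNS:7077", the "TaskSchedulerImpl" warning appears and the job fails.

When I change the master from the valid address to an invalid one (spark://INVALID_DNS:7077), the same message appears.

Namely: "WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory"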
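Because a valid and an invalid master address produce the same warning, the message alone does not show whether the master is reachable at all. The following is a minimal sketch, using only the Python standard library, of how one might check from the driver machine that a TCP connection to the master port can be opened. The helper name `can_connect` is hypothetical, and `MASTER_PUBLIC_DNS` is the placeholder from the question.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False

# Example call (placeholder host from the question):
#   can_connect("MASTER_PUBLIC_DNS", 7077)
```

A True result only shows that the port is open. It does not confirm that workers are registered, or that executors can connect back to the driver.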

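This looks like the issue discussed in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/TaskSchedulerImpl-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-that-woy-td8247.html. As suggested in a comment there, I assigned 12G of memory to this cluster, but it still failed.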

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pyspark import SparkContext, SparkConf
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

# Parse one line of space-separated floats: the label first, then the features
def parsePoint(line):
  values = [float(x) for x in line.split(' ')]
  return LabeledPoint(values[0], values[1:])

appName = "testsparkapp"
master = "spark://MASTER_PUBLIC_DNS:7077"
#master = "local"

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

# Load and parse the data
data = sc.textFile("/root/spark/mllib/data/sample_svm_data.txt")
parsedData = data.map(parsePoint)

# Build the model
model = LogisticRegressionWithSGD.train(parsedData)

# Evaluate the model on the training data
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
# Index into the (label, prediction) pair; tuple-unpacking lambdas are Python 2 only
trainErr = labelsAndPreds.filter(lambda vp: vp[0] != vp[1]).count() / float(parsedData.count())
print("Training Error = " + str(trainErr))

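Additional information

I did the three things a friend suggested.

1. I opened the master port, 7077.

2. In the master URL, I set the hostname rather than the IP address.

-> As a result, I was able to connect to the master (I checked this in the cluster UI).

3. I tried to set worker_max_heap as shown below, but it probably failed.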

ScalaConf().set("spark.executor.memory", "4g").set("worker_max_heapsize","2g")

The worker allows me to use 6.3 GB (I checked this in the UI). It is an m1.large.
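The warning asks you to check that workers "have sufficient memory", so it can help to compare the requested executor memory with what the worker advertises. The following is a rough sketch under the assumption that an executor is only launched on a worker offering at least `spark.executor.memory`. The helpers `to_mib` and `executor_fits` are hypothetical names for illustration, not part of Spark.

```python
import re

# Multipliers to MiB for the size suffixes handled by this sketch.
_UNITS_MIB = {"k": 1.0 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024}

def to_mib(text):
    """Convert a size such as '4g', '512m' or '6.3GB' to MiB."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgt])b?\s*", text.lower())
    if match is None:
        raise ValueError("unrecognized memory size: %r" % text)
    number, unit = match.groups()
    return float(number) * _UNITS_MIB[unit]

def executor_fits(executor_memory, worker_memory):
    """True if one executor of the requested size fits on the worker."""
    return to_mib(executor_memory) <= to_mib(worker_memory)
```

With the values from the question, `executor_fits("4g", "6.3GB")` holds, which suggests that memory may not be the limiting factor here.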

-> I found a warning in my execution log and an error in the worker's stderr.

My execution log

14/08/08 06:11:59 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Worker stderr

14/08/08 06:14:04 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@PRIVATE_HOST_NAME1:52011/user/Worker
14/08/08 06:15:07 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@PRIVATE_HOST_NAME1:52201] -> [akka.tcp://spark@PRIVATE_HOST_NAME2:38286] disassociated! Shutting down.
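The ERROR line above names two endpoints: the executor on PRIVATE_HOST_NAME1 and the driver it lost contact with on PRIVATE_HOST_NAME2. The following is a small sketch of how these addresses could be pulled out of such a line for further checking. The regular expression and the function name `akka_endpoints` are illustrative assumptions based only on the format seen in this log.

```python
import re

# Matches addresses of the form akka.tcp://<system>@<host>:<port>
AKKA_ADDRESS = re.compile(r"akka\.tcp://([^@\s]+)@([^:\s\]]+):(\d+)")

def akka_endpoints(log_line):
    """Return (system, host, port) for every akka.tcp address in a log line."""
    return [(system, host, int(port))
            for system, host, port in AKKA_ADDRESS.findall(log_line)]

line = ("14/08/08 06:15:07 ERROR executor.CoarseGrainedExecutorBackend: "
        "Driver Disassociated [akka.tcp://sparkExecutor@PRIVATE_HOST_NAME1:52201] "
        "-> [akka.tcp://spark@PRIVATE_HOST_NAME2:38286] disassociated! Shutting down.")
endpoints = akka_endpoints(line)
```

The last endpoint is the driver address the executor had been talking to. Checking whether that host and port are reachable from the worker is one way to narrow the problem down.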

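The spark-ec2 script configures the Spark cluster in EC2 as a standalone cluster, which means it cannot be used with remote submission. I struggled with the same error you describe for days before finding out that this is not supported. Unfortunately, the error message is misleading.

So you have to copy your files to the master and log in there to run your Spark job.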
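The copy-and-run step could be scripted roughly as follows. This sketch only builds the `scp` and `ssh` command lines and does not execute them. It assumes a `root` login, a Spark installation under `/root/spark` (the question's data path suggests this), a key file, and a `bin/spark-submit` launcher (available from Spark 1.0). The function name `remote_submit_commands` and every default value are illustrative, not something the answer specifies.

```python
import os
import posixpath
import shlex

def remote_submit_commands(master_host, script_path, key_file,
                           user="root", remote_dir="/root",
                           spark_home="/root/spark"):
    """Build (scp, ssh) argument lists for running a script on the master."""
    # All defaults are assumptions; adjust them to the actual cluster layout.
    remote_script = posixpath.join(remote_dir, os.path.basename(script_path))
    target = "%s@%s" % (user, master_host)
    # Copy the script to the master.
    scp_cmd = ["scp", "-i", key_file, script_path,
               "%s:%s" % (target, remote_script)]
    # Run it on the master with the launcher from the assumed Spark home.
    launcher = posixpath.join(spark_home, "bin", "spark-submit")
    submit = "%s %s" % (shlex.quote(launcher), shlex.quote(remote_script))
    ssh_cmd = ["ssh", "-i", key_file, target, submit]
    return scp_cmd, ssh_cmd
```

Each returned list can be passed to `subprocess.check_call` once the host, key file, and paths have been replaced with real values.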
