Spark can no longer execute jobs. Executors fail to create directory

2024-04-09

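We have had a small Spark cluster running for a month. Until now it executed jobs successfully and let me start a spark-shell against the cluster.

Now, whether I submit a job to the cluster or connect to it with the shell, the error is always the same.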

    root@~]$ $SPARK_HOME/bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/11/10 20:43:01 INFO spark.SecurityManager: Changing view acls to: root,
14/11/10 20:43:01 INFO spark.SecurityManager: Changing modify acls to: root,
14/11/10 20:43:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
14/11/10 20:43:01 INFO spark.HttpServer: Starting HTTP Server
14/11/10 20:43:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/10 20:43:01 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:60223
14/11/10 20:43:01 INFO util.Utils: Successfully started service 'HTTP class server' on port 60223.
Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
14/11/10 20:43:05 INFO spark.SecurityManager: Changing view acls to: root,
14/11/10 20:43:05 INFO spark.SecurityManager: Changing modify acls to: root,
14/11/10 20:43:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
14/11/10 20:43:05 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/11/10 20:43:05 INFO Remoting: Starting remoting
14/11/10 20:43:05 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ip-10-237-182-163.ec2.internal:41369]
14/11/10 20:43:05 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@ip-10-237-182-163.ec2.internal:41369]
14/11/10 20:43:05 INFO util.Utils: Successfully started service 'sparkDriver' on port 41369.
14/11/10 20:43:05 INFO spark.SparkEnv: Registering MapOutputTracker
14/11/10 20:43:05 INFO spark.SparkEnv: Registering BlockManagerMaster
14/11/10 20:43:05 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/spark-local-20141110204305-a4f0
14/11/10 20:43:05 INFO storage.DiskBlockManager: Created local directory at /mnt2/spark/spark-local-20141110204305-991c
14/11/10 20:43:05 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 56708.
14/11/10 20:43:05 INFO network.ConnectionManager: Bound socket to port 56708 with id = ConnectionManagerId(ip-10-237-182-163.ec2.internal,56708)
14/11/10 20:43:05 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
14/11/10 20:43:05 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/11/10 20:43:05 INFO storage.BlockManagerMasterActor: Registering block manager ip-10-237-182-163.ec2.internal:56708 with 265.4 MB RAM
14/11/10 20:43:05 INFO storage.BlockManagerMaster: Registered BlockManager
14/11/10 20:43:05 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-fa8cd9e8-5a4a-40a4-bc76-c2215886873e
14/11/10 20:43:05 INFO spark.HttpServer: Starting HTTP Server
14/11/10 20:43:05 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/10 20:43:05 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:36394
14/11/10 20:43:05 INFO util.Utils: Successfully started service 'HTTP file server' on port 36394.
14/11/10 20:43:06 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/11/10 20:43:06 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/11/10 20:43:06 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/11/10 20:43:06 INFO ui.SparkUI: Started SparkUI at http://ec2-54-91-220-90.compute-1.amazonaws.com:4040
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Connecting to master spark://ec2-54-91-220-90.compute-1.amazonaws.com:7077...
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/11/10 20:43:06 INFO repl.SparkILoop: Created spark context..
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141110204306-0389
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/0 on worker-20140929210658-ip-10-225-160-49.ec2.internal-60693 (ip-10-225-160-49.ec2.internal:60693) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/0 on hostPort ip-10-225-160-49.ec2.internal:60693 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/1 on worker-20140929210658-ip-10-147-28-32.ec2.internal-60731 (ip-10-147-28-32.ec2.internal:60731) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/1 on hostPort ip-10-147-28-32.ec2.internal:60731 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/2 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/2 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/2 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/1 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/2 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/2)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/2 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/2
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/3 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/3 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/0 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/3 is now RUNNING
Spark context available as sc.
scala> 14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/3 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/3)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/3 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/3
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/4 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/4 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/4 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/4 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/4)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/4 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/4
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/5 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/5 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/5 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/5 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/5)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/5 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/5
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/6 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/6 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/6 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/6 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/6)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/6 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/6
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/7 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/7 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/7 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/7 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/7)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/7 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/7
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/8 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/8 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/8 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/8 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/8)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/8 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/8
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/9 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/9 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/9 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/9 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/9)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/9 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/9
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/10 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/10 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/10 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/10 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/10)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/10 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/10
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor added: app-20141110204306-0389/11 on worker-20140929210657-ip-10-69-165-231.ec2.internal-47794 (ip-10-69-165-231.ec2.internal:47794) with 4 cores
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141110204306-0389/11 on hostPort ip-10-69-165-231.ec2.internal:47794 with 4 cores, 12.4 GB RAM
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/11 is now RUNNING
14/11/10 20:43:06 INFO client.AppClient$ClientActor: Executor updated: app-20141110204306-0389/11 is now FAILED (java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/11)
14/11/10 20:43:06 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141110204306-0389/11 removed: java.io.IOException: Failed to create directory /root/spark/work/app-20141110204306-0389/11
14/11/10 20:43:06 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
14/11/10 20:43:06 ERROR scheduler.TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED

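However, if I run it in local mode only, it connects and runs fine.

I am leaning towards this being some kind of permissions error, but we have not touched anything during the whole time it was working.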

EDIT

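After some digging, I found that the worker nodes had run out of disk space. It turns out the work folder stores the copied jars along with the stdout and stderr files for each job. Is there any way to have it delete these once a job completes? We already have logging set up so that our jobs' logs are sent to S3.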
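A quick way to confirm this kind of diagnosis on a worker node is sketched below. It assumes the work directory is `/root/spark/work` (the path shown in the log above) and uses a hypothetical helper, `list_stale_app_dirs`, that is not part of Spark. The 7-day threshold is only an illustration.

```shell
#!/bin/sh
# Assumption: WORK_DIR is the worker's work directory. The log above shows
# /root/spark/work, but other deployments may use a different path.
WORK_DIR="${WORK_DIR:-/root/spark/work}"

# Hypothetical helper (not part of Spark): print the app-* directories
# directly under work dir $1 that were last modified more than $2 days ago.
list_stale_app_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -name 'app-*' -mtime +"$2"
}

if [ -d "$WORK_DIR" ]; then
  df -h "$WORK_DIR"                  # free space on the filesystem holding the work dir
  df -i "$WORK_DIR"                  # free inodes, which can also run out
  du -sh "$WORK_DIR"                 # total size of the work dir
  list_stale_app_dirs "$WORK_DIR" 7  # old per-application directories
fi
```

The listed directories are only candidates for removal. Check that no running application still uses them before deleting anything.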


The problem was that the worker nodes had run out of disk space. The space was consumed by the jars transferred for each job, together with the jobs' stdout and stderr files.

I found this information at http://spark.apache.org/docs/1.1.0/submitting-applications.html
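For a standalone cluster, the Spark documentation describes worker-side cleanup properties that can be passed through `SPARK_WORKER_OPTS`. The fragment below is a minimal sketch of that configuration, assuming it goes into `conf/spark-env.sh` on each worker. The interval and TTL values are illustrative assumptions and do not come from the original post.

```shell
# conf/spark-env.sh on each worker node (restart the workers afterwards).
# Assumed values for illustration:
#   cleanup.enabled    - turn on periodic cleanup of old application dirs
#   cleanup.interval   - how often the worker checks, in seconds
#   cleanup.appDataTtl - how long application data is kept, in seconds
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=86400"
```

A shorter TTL frees space sooner but also removes the stdout and stderr files sooner, so it fits best when logs are already shipped elsewhere, as with the S3 setup mentioned in the question.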
