Spark-submit fails to import SparkContext

2023-12-19

I'm running Spark 1.4.1 on my local Mac laptop and am able to use pyspark interactively without any problems. Spark was installed via Homebrew and I'm using Anaconda Python. However, as soon as I try to use spark-submit, I get the following error:

15/09/04 08:51:09 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:test.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
15/09/04 08:51:09 ERROR SparkContext: Error stopping SparkContext after init error.
java.lang.NullPointerException
    at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1216)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1659)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
  File "test.py", line 35, in <module> sc = SparkContext("local","test") 
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 113, in __init__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 165, in _do_init
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 219, in _initialize_context
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:test.py does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:214)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

Here is my code:

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext("local", "test")  # local master, app name "test"
    sc.parallelize([1, 2, 3, 4])        # distribute a small list as an RDD
    sc.stop()                           # shut the context down cleanly

If I move the file into the /usr/local/Cellar/apache-spark/1.4.1/ directory, then spark-submit works fine. My environment variables are set as follows:

export SPARK_HOME="/usr/local/Cellar/apache-spark/1.4.1"
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/lib/py4j-0.8.2.1-src.zip
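
As a quick sanity check (just a diagnostic sketch, not part of the job itself), a minimal snippet like the following can confirm that these variables are actually visible to the Python interpreter that spark-submit launches:

import os

# Print the Spark-related environment variables as this interpreter sees them;
# run it with the same Python that spark-submit uses.
for var in ("SPARK_HOME", "PYTHONPATH"):
    print(var, "=", os.environ.get(var, "<not set>"))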

I'm sure something in my environment is set up incorrectly, but I can't seem to pin it down.


The Python file executed by spark-submit should be on the PYTHONPATH. Add the full path of its directory with:

export PYTHONPATH=full/path/to/dir:$PYTHONPATH

Alternatively, you can add '.' to the PYTHONPATH if you are already in the directory containing the Python script:

export PYTHONPATH='.':$PYTHONPATH
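
To verify that either change took effect, a minimal sketch like this (assuming the script is still named test.py) prints the interpreter's module search path from inside the job; the directory containing the script should now appear in it:

import os
import sys

# List every entry on the module search path; the directory that
# contains test.py should show up once PYTHONPATH is set correctly.
for entry in sys.path:
    print(entry)

# Also confirm that this script's own directory is on the path.
script_dir = os.path.dirname(os.path.abspath(__file__))
print("script dir on sys.path:", script_dir in sys.path)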

Thanks to @Def_Os for pointing this out!
