当我尝试应用方法(ComputeDwt)时,我遇到了上述异常RDD[(Int,ArrayBuffer[(Int,Double)])]
输入。
我什至正在使用extends Serialization
在 Spark 中序列化对象的选项。这是代码片段。
input:series:RDD[(Int,ArrayBuffer[(Int,Double)])]
DWTsample extends Serialization is a class having computeDwt function.
sc: sparkContext
val kk:RDD[(Int,List[Double])]=series.map(t=>(t._1,new DWTsample().computeDwt(sc,t._2)))
Error:
org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext
org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:556)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:503)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:361)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
谁能告诉我可能是什么问题以及应该采取什么措施来克服这个问题?
The line
series.map(t=>(t._1,new DWTsample().computeDwt(sc,t._2)))
引用 SparkContext (sc
)但 SparkContext 不可序列化。 SparkContext 旨在公开在驱动程序上运行的操作;它不能被在工作线程上运行的代码引用/使用。
你必须重新构建你的代码,以便sc
未在您的地图函数闭包中引用。
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)