我有这样的转变:
JavaRDD<Tuple2<String, Long>> mappedRdd = myRDD.values().map(
new Function<Pageview, Tuple2<String, Long>>() {
@Override
public Tuple2<String, Long> call(Pageview pageview) throws Exception {
String key = pageview.getUrl().toString();
Long value = getDay(pageview.getTimestamp());
return new Tuple2<>(key, value);
}
});
综合浏览量是一种:页面视图.java https://github.com/apache/gora/blob/master/gora-tutorial/src/main/java/org/apache/gora/tutorial/log/generated/Pageview.java
我将该类注册到 Spark 中,如下所示:
Class[] c = new Class[1];
c[0] = Pageview.class;
sparkConf.registerKryoClasses(c);
线程“main”org.apache.spark.SparkException 中出现异常:任务不
可序列化于
org.apache.spark.util.ClosureCleaner$.ensureSerialized(ClosureCleaner.scala:166)
在
org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
在 org.apache.spark.SparkContext.clean(SparkContext.scala:1623) 处
org.apache.spark.rdd.RDD.map(RDD.scala:286) 在
org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:89)
在
org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:46)
在
org.apache.gora.tutorial.log.ExampleSpark.run(ExampleSpark.java:100)
在
org.apache.gora.tutorial.log.ExampleSpark.main(ExampleSpark.java:53)
引起原因:java.io.NotSerializedException:
org.apache.gora.tutorial.log.ExampleSpark 序列化堆栈:
- 对象不可序列化(类:org.apache.gora.tutorial.log.ExampleSpark,值:
org.apache.gora.tutorial.log.ExampleSpark@1a2b4497)
- 字段(类:org.apache.gora.tutorial.log.ExampleSpark$1,名称:this$0,类型:类 org.apache.gora.tutorial.log.ExampleSpark)
- 对象(类 org.apache.gora.tutorial.log.ExampleSpark$1、org.apache.gora.tutorial.log.ExampleSpark$1@4ab2775d)
- 字段(类:org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1,
名称:fun$1,类型:接口
org.apache.spark.api.java.function.Function)
- 对象(类 org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1,
) 在
org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
在
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
在
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
在
org.apache.spark.util.ClosureCleaner$.ensureSerialized(ClosureCleaner.scala:164)
... 7 更多
当我调试代码时我看到JavaSerializer.scala
即使有一个名为的类也会被调用KryoSerializer
.
PS 1:我不想使用 Java Serializer 但实现Serializer
at Pageview
并不能解决问题。
PS 2:这并没有消除问题:
...
//String key = pageview.getUrl().toString();
//Long value = getDay(pageview.getTimestamp());
String key = "Dummy";
Long value = 1L;
return new Tuple2<>(key, value);
...