Unable to print the contents of an RDD

2023-12-25

I am trying to print the contents of an RDD of type RDD[(String, List[(String, String, String, String)])]:

val sc = new SparkContext(conf)
val splitted = rdd.map(line => line.split(","))
val processed = splitted.map(x=>(x(1),List((x(0),x(2),x(3),x(4)))))
val grouped = processed.reduceByKey((x,y) => (x ++ y))
System.out.println(grouped)

However, I don't see the contents. This is what gets printed instead:

ShuffledRDD[4] at reduceByKey at Consumer.scala:88

UPDATE:

Contents of the TXT file:

100001082016,230,111,1,1 
100001082016,121,111,1,1
100001082016,110,111,1,1

UPDATE 2 (the whole code):

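import org.apache.spark.{SparkConf, SparkContext}

class Consumer() {

  def run() = {
    val conf = new SparkConf()
      .setAppName("TEST")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("file:///usr/test/myfile.txt")
    // Split each CSV line into its fields
    val splitted = rdd.map(line => line.split(","))
    // Key by the second field; keep the remaining fields as a one-element list
    val processed = splitted.map(x => (x(1), List((x(0), x(2), x(3), x(4)))))
    // Concatenate the lists of records that share a key
    val grouped = processed.reduceByKey((x, y) => x ++ y)
    System.out.println(grouped)
  }

}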

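There is no problem with the code itself. The same steps in the Spark shell, with collect() used to print the contents: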

scala> val rdd = sc.parallelize(Seq("100001082016,230,111,1,1","100001082016,121,111,1,1","100001082016,110,111,1,1"))
// rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:27

scala> val splitted = rdd.map(line => line.split(","))
// splitted: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[1] at map at <console>:29

scala> val processed = splitted.map(x=>(x(1),List((x(0),x(2),x(3),x(4)))))
// processed: org.apache.spark.rdd.RDD[(String, List[(String, String, String, String)])] = MapPartitionsRDD[2] at map at <console>:31

scala> val grouped = processed.reduceByKey((x,y) => (x ++ y))
// grouped: org.apache.spark.rdd.RDD[(String, List[(String, String, String, String)])] = ShuffledRDD[3] at reduceByKey at <console>:33

scala> grouped.collect().foreach(println)
// (121,List((100001082016,111,1,1)))
// (110,List((100001082016,111,1,1)))
// (230,List((100001082016,111,1,1)))

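The following, on the other hand, is the wrong way to print the contents. It works as expected, but you have to understand the language properly to know what to expect: println calls toString on the RDD, which returns a description of the RDD rather than its elements: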

scala> System.out.println(grouped)
// ShuffledRDD[3] at reduceByKey at <console>:33

EDIT: To be clear, if you want to print a collection, use the mkString method available on that collection to convert it into the format you want.
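
As a minimal sketch of the mkString suggestion, the snippet below formats grouped records by hand. It is plain Scala and does not need Spark: the `collected` array stands in for the result of `grouped.collect()`, and the names `MkStringSketch`, `Record`, and `formatGroup`, as well as the `|` and `; ` separators, are assumptions made up for this illustration rather than anything from the original code.

```scala
// Plain Scala stand-in for the result of grouped.collect(); no Spark needed to run this.
object MkStringSketch {
  // Hypothetical alias for the 4-field tuples stored in each list
  type Record = (String, String, String, String)

  // Render one (key, records) pair as "key -> a|b|c|d; a|b|c|d".
  // The separators are arbitrary choices for this illustration.
  def formatGroup(group: (String, List[Record])): String = {
    val (key, records) = group
    // productIterator walks the fields of a tuple; mkString joins them
    val rendered = records.map(_.productIterator.mkString("|"))
    s"$key -> ${rendered.mkString("; ")}"
  }

  def main(args: Array[String]): Unit = {
    // Sample data shaped like the RDD's elements after collect()
    val collected = Array(
      ("230", List(("100001082016", "111", "1", "1"))),
      ("121", List(("100001082016", "111", "1", "1")))
    )
    collected.map(formatGroup).foreach(println)
  }
}
```

With a real RDD the same function could presumably be applied as `grouped.collect().map(formatGroup).foreach(println)`. Keep in mind that collect() brings every element to the driver, so this is only reasonable for small results.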
