Converting an array of strings to a byte array in Spark and retrieving it back with a UDF

2024-03-23

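I am trying to convert an array of strings in Spark to a byte array, and then convert the byte array back into an array of strings.

However, I am not getting the string array back the way I expected. Here is the code: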

// UDFs for converting Array[String] to byte array and get back Array[String] from byte array
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.databind.ObjectMapper 

val mapper: ObjectMapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)

val convertToByteArray = udf((map: Seq[String]) => mapper.writeValueAsBytes(map))
val convertToString = udf((a: Array[Byte])=> new String(a))

val arrayDF = Seq(
  ("x100", Array("p1","p2","p3","p4"))
).toDF("id", "myarray")
arrayDF.printSchema()
root
 |-- id: string (nullable = true)
 |-- myarray: array (nullable = true)
 |    |-- element: string (containsNull = true)
arrayDF.show(false)
+----+----------------+
|id  |myarray         |
+----+----------------+
|x100|[p1, p2, p3, p4]|
+----+----------------+

val converted = arrayDF.withColumn("bytearray", convertToByteArray($"myarray")).select($"id",$"bytearray")
converted.printSchema()
root
 |-- id: string (nullable = true)
 |-- bytearray: binary (nullable = true)
converted.show(false)
+----+----------------------------------------------------------------+
|id  |bytearray                                                       |
+----+----------------------------------------------------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|
+----+----------------------------------------------------------------+

val getBack = converted.withColumn("getstring", convertToString($"bytearray")) 
getBack.printSchema()
root
 |-- id: string (nullable = true)
 |-- bytearray: binary (nullable = true)
 |-- getstring: string (nullable = true)
getBack.show(false)
+----+----------------------------------------------------------------+---------------------+
|id  |bytearray                                                       |getstring            |
+----+----------------------------------------------------------------+---------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|["p1","p2","p3","p4"]|
+----+----------------------------------------------------------------+---------------------+

However, I want my final result to be:

+----+----------------------------------------------------------------+---------------------+
|id  |bytearray                                                       |getstring            |
+----+----------------------------------------------------------------+---------------------+
|x100|[5B 22 70 31 22 2C 22 70 32 22 2C 22 70 33 22 2C 22 70 34 22 5D]|[p1,p2,p3,p4]|
+----+----------------------------------------------------------------+---------------------+

Here is the pom.xml dependency I use to create the byte array:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.9.5</version>
</dependency>

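You are taking a list of strings and serializing it as a single object, and then when converting back you are treating the bytes as just a string. If you want to get a single string back, you also need to convert the list to a string first: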

val convertToByteArray = udf((map: Seq[String]) => mapper.writeValueAsBytes(map.mkString("[",",","]")))
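
One caveat, as far as I can tell: a String passed through mapper.writeValueAsBytes is still written as a JSON string literal, so the decoded text may keep its surrounding quotes. If the goal is to get an actual array<string> column back rather than a display string, another option is to parse the bytes with Jackson instead of decoding them as raw text. The following is only a minimal sketch of that idea, not a definitive implementation. The object name ByteArrayRoundTrip, the helpers encode and decode, and the local SparkSession are illustrative assumptions that are not part of the original post. It also assumes jackson-databind is on the classpath and that the input column contains no nulls.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical helper object; the names are illustrative only.
object ByteArrayRoundTrip {
  // Created lazily in each JVM, so the mapper is not shipped inside the UDF closure.
  private lazy val mapper = new ObjectMapper()

  // Seq[String] -> JSON bytes. Converting to a plain array first lets Jackson
  // treat the value as String[] without needing the Scala module.
  // Assumes xs is not null.
  def encode(xs: Seq[String]): Array[Byte] =
    mapper.writeValueAsBytes(xs.toArray)

  // JSON bytes -> Seq[String]. The bytes are parsed as a JSON array
  // instead of being decoded as raw text.
  def decode(bytes: Array[Byte]): Seq[String] =
    mapper.readValue(bytes, classOf[Array[String]]).toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for illustration
      .appName("byte-array-round-trip")
      .getOrCreate()
    import spark.implicits._

    val toBytes = udf((xs: Seq[String]) => encode(xs))
    val fromBytes = udf((bytes: Array[Byte]) => decode(bytes))

    val arrayDF = Seq(
      ("x100", Array("p1", "p2", "p3", "p4"))
    ).toDF("id", "myarray")

    val roundTripped = arrayDF
      .withColumn("bytearray", toBytes($"myarray"))
      .withColumn("getstring", fromBytes($"bytearray"))

    // getstring should now have type array<string> rather than string.
    roundTripped.printSchema()
    roundTripped.show(false)

    spark.stop()
  }
}
```

Keeping encode and decode as plain functions means the round trip can be checked without Spark, and the UDFs stay thin wrappers around them.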