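I have the following DataFrame, which contains an array of doubles, and I need to convert that column to a vector so I can pass it to an ML algorithm. Can anyone help me with this?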
fList: org.apache.spark.sql.DataFrame = [features: array<double>]
+--------------------------------------------------------------------------------+
|features |
+--------------------------------------------------------------------------------+
|[2.5046410000000003, 2.1487149999999997, 1.0884870000000002, 3.5877090000000003]|
|[0.9558040000000001, 0.9843780000000002, 0.545025, 0.9979860000000002] |
+--------------------------------------------------------------------------------+
Expected output:
It should look like this.
fList: org.apache.spark.sql.DataFrame = [features: vector]
I suggest writing a udf function:
import scala.collection.mutable
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.linalg.Vectors
// Wrap each array<double> value in a dense vector.
def convertArrayToVector = udf((features: mutable.WrappedArray[Double]) => Vectors.dense(features.toArray))
and call that function in the withColumn API:
scala> df.withColumn("features", convertArrayToVector($"features"))
res1: org.apache.spark.sql.DataFrame = [features: vector]
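One caveat: the DataFrame-based spark.ml estimators in Spark 2.0 and later expect vectors from org.apache.spark.ml.linalg, not org.apache.spark.mllib.linalg. The following is a minimal sketch of the same idea for that case, not a definitive implementation. The object name ArrayToVectorSketch, the helper toDense, and the local SparkSession settings are assumptions made for illustration; the sample rows are the two from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical wrapper object, used only for this illustration.
object ArrayToVectorSketch {
  // Pure helper: copy a sequence of doubles into a dense ml.linalg vector.
  def toDense(xs: Seq[Double]): Vector = Vectors.dense(xs.toArray)

  // Seq[Double] is used instead of mutable.WrappedArray[Double] so the
  // UDF does not depend on the concrete collection type Spark passes in.
  val arrayToVector = udf(toDense _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just enough to run the example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-to-vector-sketch")
      .getOrCreate()
    import spark.implicits._

    // The two rows shown in the question.
    val fList = Seq(
      Seq(2.5046410000000003, 2.1487149999999997, 1.0884870000000002, 3.5877090000000003),
      Seq(0.9558040000000001, 0.9843780000000002, 0.545025, 0.9979860000000002)
    ).toDF("features")

    // Replace the array<double> column with a vector column of the same name.
    val converted = fList.withColumn("features", arrayToVector(col("features")))
    converted.printSchema()

    spark.stop()
  }
}
```

On Spark 3.1 or later, the built-in org.apache.spark.ml.functions.array_to_vector should make the UDF unnecessary, for example fList.withColumn("features", array_to_vector(col("features"))).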
I hope this answer helps.