I'm new to Apache Spark. In the MLlib documentation I found an example in Scala, but I don't really know Scala. Does anyone know of this example in Java? Thanks! The example code is:
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Load and parse the data
val data = sc.textFile("mllib/data/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray)
}

// Building the model
val numIterations = 20
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.reduce(_ + _) / valuesAndPreds.count
println("training Mean Squared Error = " + MSE)
From the MLlib guide: http://spark.apache.org/docs/latest/mllib-guide.html
Thanks!
As the documentation says:

    All of MLlib's methods use Java-friendly types, so you can import and
    call them there the same way you do in Scala. The only caveat is that
    the methods take Scala RDD objects, while the Spark Java API uses a
    separate JavaRDD class. You can convert a Java RDD to a Scala one by
    calling .rdd() on your JavaRDD object.

It is not that easy, since you still have to reproduce the Scala code in Java, but it works (at least in this case).
Having said that, here is a Java implementation:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
import scala.Tuple2;

public void linReg() {
    String master = "local";
    SparkConf conf = new SparkConf().setAppName("csvParser").setMaster(master);
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load and parse the data
    JavaRDD<String> data = sc.textFile("mllib/data/ridge-data/lpsa.data");
    JavaRDD<LabeledPoint> parseddata = data
            .map(new Function<String, LabeledPoint>() {
                // I see no way of just using a lambda here, hence more verbosity than with Scala
                @Override
                public LabeledPoint call(String line) throws Exception {
                    String[] parts = line.split(",");
                    String[] pointsStr = parts[1].split(" ");
                    double[] points = new double[pointsStr.length];
                    for (int i = 0; i < pointsStr.length; i++)
                        points[i] = Double.parseDouble(pointsStr[i]);
                    return new LabeledPoint(Double.parseDouble(parts[0]),
                            Vectors.dense(points));
                }
            });

    // Building the model
    int numIterations = 20;
    LinearRegressionModel model = LinearRegressionWithSGD.train(
            parseddata.rdd(), numIterations); // notice the .rdd()

    // Evaluate model on training examples and compute training error
    JavaRDD<Tuple2<Double, Double>> valuesAndPred = parseddata
            .map(point -> new Tuple2<>(point.label(), model.predict(point.features())));
    // the important point here is the explicit Tuple2 creation
    double MSE = valuesAndPred.mapToDouble(
            tuple -> Math.pow(tuple._1 - tuple._2, 2)).mean();
    // mapToDouble(...).mean() computes the mean directly, which is much easier
    System.out.println("training Mean Squared Error = " + MSE);
}
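If you want to sanity-check the parsing step in isolation, the line-splitting logic can be reproduced in plain Java without a Spark context. This is just a sketch for illustration: the class name `ParseSketch` is made up, and it returns the label and features in one array instead of building a `LabeledPoint`. It assumes the `lpsa.data` format of `label,f1 f2 f3 ...` (label before the comma, space-separated features after it):

```java
// Hypothetical helper class, only to demonstrate the line format parsing.
public class ParseSketch {

    // Returns {label, f1, f2, ...} as a plain double array.
    static double[] parseLine(String line) {
        String[] parts = line.split(",");          // label | features
        String[] featureStr = parts[1].split(" "); // space-separated features
        double[] result = new double[featureStr.length + 1];
        result[0] = Double.parseDouble(parts[0]);  // the label
        for (int i = 0; i < featureStr.length; i++) {
            result[i + 1] = Double.parseDouble(featureStr[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] parsed = parseLine("-0.43,-1.64 -2.01 -1.86");
        System.out.println(java.util.Arrays.toString(parsed));
        // prints [-0.43, -1.64, -2.01, -1.86]
    }
}
```

Once this works on a sample line, wrapping it in the anonymous `Function` above is mechanical.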
It is far from perfect, but I hope it gives you a better understanding of how to use the Scala examples from the MLlib documentation.
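One last note: the training-error computation is nothing Spark-specific, just the mean of squared residuals over (label, prediction) pairs. Here is a plain-Java sketch of that arithmetic (the class name `MseSketch` is hypothetical), which matches what `mapToDouble(...).mean()` computes over the RDD:

```java
import java.util.List;

// Hypothetical helper class illustrating the MSE arithmetic.
public class MseSketch {

    // Each entry is a {label, prediction} pair.
    static double mse(List<double[]> labelAndPred) {
        double sum = 0.0;
        for (double[] pair : labelAndPred) {
            double diff = pair[0] - pair[1]; // label minus prediction
            sum += diff * diff;              // squared residual
        }
        return sum / labelAndPred.size();    // mean over all pairs
    }

    public static void main(String[] args) {
        List<double[]> pairs = java.util.Arrays.asList(
                new double[]{1.0, 1.5},
                new double[]{2.0, 1.0});
        System.out.println("MSE = " + mse(pairs)); // (0.25 + 1.0) / 2 = 0.625
    }
}
```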