You can use the spark-sftp library in your application as follows:
For Spark 2.x
Maven dependency

<dependency>
    <groupId>com.springml</groupId>
    <artifactId>spark-sftp_2.11</artifactId>
    <version>1.1.0</version>
</dependency>
SBT dependency
libraryDependencies += "com.springml" % "spark-sftp_2.11" % "1.1.0"
Using with spark-shell

This package can be added to Spark using the --packages command-line option.
$ bin/spark-shell --packages com.springml:spark-sftp_2.11:1.1.0
Scala API
// Construct Spark dataframe using file in FTP server
val df = spark.read.
format("com.springml.spark.sftp").
option("host", "SFTP_HOST").
option("username", "SFTP_USER").
option("password", "****").
option("fileType", "csv").
option("inferSchema", "true").
load("/ftp/files/sample.csv")
// Write dataframe as CSV file to FTP server
df.write.
format("com.springml.spark.sftp").
option("host", "SFTP_HOST").
option("username", "SFTP_USER").
option("password", "****").
option("fileType", "csv").
save("/ftp/files/sample.csv")
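As a minimal sketch (not part of the library's documented API, just standard Spark usage): since the read and write above repeat the same connection settings, they can be collected once in a Map and passed through the DataFrameReader/DataFrameWriter `options(...)` method.

```scala
// Hypothetical helper: gather the SFTP connection settings in one place
// so both the read and the write can reuse them via options(...).
val sftpOptions = Map(
  "host" -> "SFTP_HOST",
  "username" -> "SFTP_USER",
  "password" -> "****",
  "fileType" -> "csv"
)

// Usage (assuming `spark` is an active SparkSession and an SFTP server
// is reachable; shown as comments since it needs a live connection):
// val df = spark.read.format("com.springml.spark.sftp")
//   .options(sftpOptions)
//   .option("inferSchema", "true")
//   .load("/ftp/files/sample.csv")
// df.write.format("com.springml.spark.sftp")
//   .options(sftpOptions)
//   .save("/ftp/files/sample.csv")
```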
For Spark 1.x (1.5+)

Maven dependency

<dependency>
    <groupId>com.springml</groupId>
    <artifactId>spark-sftp_2.10</artifactId>
    <version>1.0.2</version>
</dependency>
SBT dependency
libraryDependencies += "com.springml" % "spark-sftp_2.10" % "1.0.2"
Using with spark-shell

This package can be added to Spark using the --packages command-line option. For example, to include it when starting the spark-shell:
$ bin/spark-shell --packages com.springml:spark-sftp_2.10:1.0.2
Scala API
import org.apache.spark.sql.SQLContext
// Construct Spark dataframe using file in FTP server
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.
format("com.springml.spark.sftp").
option("host", "SFTP_HOST").
option("username", "SFTP_USER").
option("password", "****").
option("fileType", "csv").
option("inferSchema", "true").
load("/ftp/files/sample.csv")
// Write dataframe as CSV file to FTP server
df.write.
format("com.springml.spark.sftp").
option("host", "SFTP_HOST").
option("username", "SFTP_USER").
option("password", "****").
option("fileType", "csv").
save("/ftp/files/sample.csv")
For more information about spark-sftp, see its GitHub page: springml/spark-sftp