What is Spark GraphX: a component that builds a view of the relationships in your data
Spark GraphX: Graph Computing
Part 1: Basics
1. What?
A data structure that represents relationships in data
Basic elements: vertices (Vertex) and edges (Edge)
Vertex: a (vertexId: VertexId, attr: VD) pair — VertexId is an alias for Long
Edge[ED]: (srcId: VertexId, dstId: VertexId, attr: ED)
Composite element: Triplet (source vertex SrcVertex + Edge + destination vertex DstVertex)
Like an RDD, a graph is:
resilient
distributed
fault-tolerant
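The element types above can be sketched directly; the names `vertex`, `edge`, and the "follows" attribute below are illustrative, not from the notes:

```scala
import org.apache.spark.graphx.{Edge, VertexId}

// A vertex is a (VertexId, attribute) pair; VertexId is an alias for Long.
val vertex: (VertexId, String) = (1L, "Alice")

// An edge carries a source id, a destination id, and its own attribute.
val edge: Edge[String] = Edge(1L, 2L, "follows")

// A triplet (EdgeTriplet) combines source vertex + edge + destination
// vertex; it is obtained from graph.triplets rather than built by hand.
```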
2. Why?
A graph can express complex relationships in data and support complex graph computations, e.g. over two relational tables:
User(userId: Int, name: String, age: Int)
user_relation(fromUserId: Int, toUserId: Int, relation: String)
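As a sketch, the two tables above map naturally onto a property graph; the sample rows and the `User` case class below are made up for illustration, assuming Spark and GraphX on the classpath (script style, e.g. scala-cli):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative attribute class for rows of User(userId, name, age).
case class User(name: String, age: Int)

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("user_graph"))

// Each User row becomes a vertex keyed by userId.
val users = sc.makeRDD(Seq(
  (1L, User("Alice", 30)),
  (2L, User("Bob", 25))
))

// Each user_relation(fromUserId, toUserId, relation) row becomes an edge.
val relations = sc.makeRDD(Seq(Edge(1L, 2L, "friend")))

val graph: Graph[User, String] = Graph(users, relations)
val edgeCount = graph.edges.count() // one relation row, one edge
sc.stop()
```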
3. Where?
Social networks
4. How?
val sc: SparkContext = …
val rddVertex = sc.makeRDD(Seq[(VertexId, VD)](…)) // vertex RDD; for the attribute type VD, use a primitive type for a single value, a case class for several
val rddEdge = sc.makeRDD(Seq[Edge[ED]](…)) // edge RDD; for the attribute type ED, use a primitive type for a single value, a case class for several
val graph = Graph(rddVertex, rddEdge)
Properties
graph.vertices // vertex set
graph.edges // edge set
graph.triplets // (source vertex, edge, destination vertex) combinations
graph.inDegrees // in-degree: the number of edges arriving at each vertex (?, dstVertexId)
graph.outDegrees // out-degree: the number of edges leaving each vertex (srcVertexId, ?)
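A minimal degree example; the three-vertex graph is illustrative, assuming Spark and GraphX on the classpath (script style):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("degrees"))

val vertices = sc.makeRDD(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val edges = sc.makeRDD(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1)))
val graph = Graph(vertices, edges)

// inDegrees / outDegrees yield (vertexId, degree) pairs; vertices with
// degree 0 are simply absent from the result.
val inDeg = graph.inDegrees.collect().toMap   // vertex 3 is hit by two edges
val outDeg = graph.outDegrees.collect().toMap // vertex 1 starts two edges
sc.stop()
```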
Transformation operators
graph.mapTriplets(triplet => ED2): Graph[VD, ED2] // transforms each edge attribute; the function can also read both endpoint attributes through the triplet
graph.mapVertices((id, attr) => VD2): Graph[VD2, ED] // transforms each vertex attribute, returning a graph with the new vertex attributes and the original edges
graph.mapEdges(edge => ED2): Graph[VD, ED2] // transforms each edge attribute, returning a graph with the original vertices and the new edge attributes
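The three operators can be sketched on a two-vertex graph; the names and attribute values below are illustrative, assuming Spark and GraphX on the classpath (script style):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("map_ops"))

val g = Graph(
  sc.makeRDD(Seq((1L, "alice"), (2L, "bob"))),
  sc.makeRDD(Seq(Edge(1L, 2L, 10)))
)

// mapVertices: VD => VD2; the edges are untouched.
val upperAttr = g.mapVertices((_, name) => name.toUpperCase)
  .vertices.lookup(1L).head // "ALICE"

// mapEdges: ED => ED2; the vertices are untouched.
val doubledAttr = g.mapEdges(e => e.attr * 2).edges.first().attr // 20

// mapTriplets: also ED => ED2, but the function can read both endpoint
// attributes through the triplet.
val firstLabel = g.mapTriplets(t => s"${t.srcAttr}->${t.dstAttr}:${t.attr}")
  .edges.first().attr // "alice->bob:10"
sc.stop()
```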
Part 2: Algorithms
package cn.kgc.sparkGragh
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
object SparkGraph01 {
case class Person(name:String,age:Int)
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("spark_graphx_02")
    val sc = new SparkContext(conf)
    // Vertices: (VertexId, Person) pairs
    val rddVertex: RDD[(Long, Person)] = sc.makeRDD(Array(
      (1L, Person("Henry", 22)),
      (2L, Person("Pola", 20)),
      (3L, Person("Ariel", 18)),
      (4L, Person("Jack", 29)),
      (5L, Person("Tom", 31)),
      (6L, Person("Shel", 16)),
      (7L, Person("Niky", 28))
    ))
    // Edges: Edge(srcId, dstId, relation); parallel edges such as the two
    // (1L, 4L) entries are allowed and kept as distinct edges
    val rddEdge: RDD[Edge[String]] = sc.makeRDD(Array(
      Edge(1L, 2L, "family"),
      Edge(1L, 3L, "family"),
      Edge(1L, 4L, "teacher"),
      Edge(1L, 5L, "teacher"),
      Edge(1L, 6L, "teacher"),
      Edge(1L, 7L, "teacher"),
      Edge(4L, 5L, "friend"),
      Edge(4L, 6L, "friend"),
      Edge(6L, 7L, "friend"),
      Edge(1L, 4L, "friend"),
      Edge(1L, 5L, "friend")
    ))
    // Build the property graph and cache it for reuse
    val graph: Graph[Person, String] = Graph(rddVertex, rddEdge).cache()
    // Print every (source person, edge attribute, target person) triplet
    graph.triplets.collect().foreach(println)
    sc.stop()
  }
}
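The listing above stops at building the graph; as one possible next step, here is a sketch of GraphX's built-in connectedComponents on a smaller illustrative graph (the isolated "Solo" vertex and this mini dataset are made up, not part of the listing), assuming Spark and GraphX on the classpath (script style):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

case class Person(name: String, age: Int)

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("connected_components"))

val vertices = sc.makeRDD(Seq(
  (1L, Person("Henry", 22)),
  (2L, Person("Pola", 20)),
  (8L, Person("Solo", 40)) // isolated vertex, no edges
))
val edges = sc.makeRDD(Seq(Edge(1L, 2L, "family")))
val graph = Graph(vertices, edges)

// connectedComponents labels each vertex with the smallest VertexId
// reachable from it, ignoring edge direction.
val cc = graph.connectedComponents().vertices.collect().toMap
// vertices 1 and 2 share component 1; vertex 8 stays in its own component
sc.stop()
```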