Spark GraphX Functions: Source Code Analysis and Usage Examples


weixin_30369041

Published 2015-12-08 11:25:00


1. The outerJoinVertices function

First, the source code (from GraphImpl in GraphX):

override def outerJoinVertices[U: ClassTag, VD2: ClassTag]
    (other: RDD[(VertexId, U)])                   // vertex data to join into the graph
    (updateF: (VertexId, VD, Option[U]) => VD2)   // update function
    (implicit eq: VD =:= VD2 = null): Graph[VD2, ED] = {
  // The implicit parameter eq will be populated by the compiler if VD and VD2 are equal, and left
  // null if not
  // VD is the vertex attribute type of the original graph; VD2 is that of the resulting graph
  if (eq != null) {         // the old and new vertex attribute types are the SAME
    vertices.cache()
    // updateF preserves type, so we can use incremental replication
    val newVerts = vertices.leftJoin(other)(updateF).cache()                   // left-join the graph's vertices with other
    val changedVerts = vertices.asInstanceOf[VertexRDD[VD2]].diff(newVerts)    // vertices whose attribute changed (with the new values), found by comparing the original vertices, cast to VD2, against newVerts
    val newReplicatedVertexView = replicatedVertexView.asInstanceOf[ReplicatedVertexView[VD2, ED]]
      .updateVertices(changedVerts)                                            // build the new replicatedVertexView by shipping only changedVerts to the edge partitions
    new GraphImpl(newVerts, newReplicatedVertexView)
  } else {
    // updateF does not preserve type, so we must re-replicate all vertices
    val newVerts = vertices.leftJoin(other)(updateF)
    GraphImpl(newVerts, replicatedVertexView.edges)
  }
}
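The `eq != null` check rests on a Scala idiom: an implicit parameter whose default value is null. The compiler supplies a `VD =:= VD2` instance only when it can prove the two types are equal; otherwise the default null is used. The sketch below reproduces that idiom in plain Scala, without Spark. `Cell`, `transform`, and the two path strings are hypothetical names chosen for illustration; they are not GraphX code.

```scala
// Hypothetical toy type: it only mirrors the implicit-default trick used by
// GraphImpl.outerJoinVertices, not any GraphX API.
final case class Cell[A](value: A) {
  // B is inferred from f (first parameter list); then the compiler searches for
  // an A =:= B instance. If none exists (A and B differ), the default null is used.
  def transform[B](f: A => B)(implicit eq: A =:= B = null): (Cell[B], String) = {
    val path =
      if (eq != null) "same type: incremental path"
      else "type changed: full rebuild path"
    (Cell(f(value)), path)
  }
}

object ImplicitEqDemo {
  def main(args: Array[String]): Unit = {
    println(Cell(1).transform(_ + 1)._2)      // B = Int: eq is supplied by the compiler
    println(Cell(1).transform(_.toString)._2) // B = String: eq stays null
  }
}
```

As in outerJoinVertices, the choice of branch is made at compile time from the inferred types; no runtime type inspection is involved.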

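Here, the official doc comment for ReplicatedVertexView reads: "Manages shipping vertex attributes to the edge partitions of an EdgeRDD. Vertex attributes may be partially shipped to construct a triplet view with vertex attributes on only one side, and they may be updated." My understanding is that it attaches copies of the vertex attributes to the edge partitions, so that each edge can be combined with its source and destination attributes (a triplet) locally.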
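To make the idea concrete, here is a simplified in-memory model of shipping vertex attributes to edge partitions: each partition keeps only the attributes its edges reference, builds triplets locally, and on an update overwrites only the vertices it actually holds, which is roughly what `updateVertices(changedVerts)` achieves in the same-type branch of the source above. All names (`ShippingModel`, `EdgePartition`, `ship`) are hypothetical. The real ReplicatedVertexView also uses routing tables and shipping levels (source side, destination side, or both), which this sketch ignores.

```scala
// Simplified, hypothetical model of shipping vertex attributes to edge
// partitions. It is not GraphX's ReplicatedVertexView implementation.
object ShippingModel {
  type VertexId = Long
  final case class Edge(src: VertexId, dst: VertexId, attr: Int)

  // Each edge partition holds a local copy of the vertex attributes it references.
  final case class EdgePartition(edges: Seq[Edge], localAttrs: Map[VertexId, String]) {
    // (srcAttr, edgeAttr, dstAttr) built without looking anything up remotely
    def triplets: Seq[(String, Int, String)] =
      edges.map(e => (localAttrs(e.src), e.attr, localAttrs(e.dst)))

    // Incremental update: only overwrite vertices this partition already holds.
    def updateVertices(changed: Map[VertexId, String]): EdgePartition =
      copy(localAttrs = localAttrs ++ changed.filter { case (id, _) => localAttrs.contains(id) })
  }

  // Ship to each partition only the attributes of vertices its edges touch.
  def ship(edgeParts: Seq[Seq[Edge]], vertices: Map[VertexId, String]): Seq[EdgePartition] =
    edgeParts.map { edges =>
      val needed = edges.flatMap(e => Seq(e.src, e.dst)).toSet
      EdgePartition(edges, vertices.filter { case (id, _) => needed.contains(id) })
    }

  def main(args: Array[String]): Unit = {
    val vertices = Map(1L -> "a", 2L -> "b", 3L -> "c")
    val parts = ship(Seq(Seq(Edge(1, 2, 10)), Seq(Edge(2, 3, 20))), vertices)
    parts.flatMap(_.triplets).foreach(println)
    // Only vertex 3 changed, so only the partition that holds it is affected.
    val updated = parts.map(_.updateVertices(Map(3L -> "C")))
    updated.flatMap(_.triplets).foreach(println)
  }
}
```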

Now consider the official example:

val graph = followerGraph.outerJoinVertices(users) {
  case (uid, deg, Some(attrList)) => attrList          // vertex found in users: take its attribute list
  case (uid, deg, None) => Array.empty[String]         // vertex missing from users: use an empty array
}

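What the code is for: followerGraph is read from an edge-list file with GraphLoader.edgeListFile(). The edge file stores only vertex IDs, not the vertices' attributes, so users, an RDD of (VertexId, attr) pairs, is used to fill in the vertex attributes.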
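The loading step can be modeled without Spark. The sketch below parses lines in the whitespace-separated "srcId dstId" format that GraphLoader.edgeListFile reads, skips blank lines and lines starting with '#', and gives every vertex the placeholder attribute 1, because the file carries no vertex attributes. `EdgeListModel` and `parse` are hypothetical names; the real loader returns a distributed Graph[Int, Int] rather than local collections.

```scala
// Hypothetical in-memory stand-in for GraphLoader.edgeListFile: it reads the
// same "srcId dstId" text format but returns local collections, not a Graph.
object EdgeListModel {
  type VertexId = Long

  def parse(lines: Seq[String]): (Seq[(VertexId, VertexId)], Map[VertexId, Int]) = {
    val edges = lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#")) // skip blank and '#' comment lines
      .map { l =>
        val fields = l.split("\\s+")                   // whitespace-separated vertex IDs
        (fields(0).toLong, fields(1).toLong)
      }
    // Vertices are only known through the edges, so each one gets the
    // placeholder attribute 1: the file has nothing else to offer.
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct.map(id => id -> 1).toMap
    (edges, vertices)
  }

  def main(args: Array[String]): Unit = {
    val (edges, vertices) = parse(Seq("# follower followee", "2 1", "4 1", "1 2"))
    println(edges)
    println(vertices)
  }
}
```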

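Here, deg is the existing vertex attribute of followerGraph (edgeListFile sets every vertex attribute to 1, so despite the name it is not a degree). attrList in the third position of the case pattern is the vertex attribute from users, and the attrList after the arrow (=>) becomes the vertex attribute of the resulting graph. Vertices that have no entry in users match the None case and get an empty array.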

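The source code shows that outerJoinVertices first performs a leftJoin on the vertex RDD (VertexRDD): every vertex of followerGraph is kept, and updateF receives Some(attr) when users contains the same vertex ID and None otherwise, so matching vertices take their attributes from users. In this example the vertex attribute type changes from Int to Array[String], so VD and VD2 differ, eq is null, and the else branch runs, re-replicating all vertices to the edge partitions.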
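The leftJoin semantics can be modeled with ordinary Scala Maps. In the sketch below (a hypothetical `LeftJoinModel`, not the VertexRDD API), every vertex of the graph is kept, the update function sees Some(u) or None, and IDs that exist only in `other` (vertex 9 here) are dropped. It uses List[String] instead of the example's Array[String] only so that results can be compared with ==.

```scala
// Hypothetical in-memory model of the leftJoin step; not the VertexRDD API.
object LeftJoinModel {
  type VertexId = Long

  // Every vertex of the graph is kept; updateF receives Some(u) when `other`
  // has the same ID and None otherwise. IDs found only in `other` are dropped.
  def leftJoin[VD, U, VD2](vertices: Map[VertexId, VD], other: Map[VertexId, U])
                          (updateF: (VertexId, VD, Option[U]) => VD2): Map[VertexId, VD2] =
    vertices.map { case (id, attr) => id -> updateF(id, attr, other.get(id)) }

  def main(args: Array[String]): Unit = {
    val followerVerts = Map(1L -> 1, 2L -> 1, 3L -> 1) // placeholder attributes, as from edgeListFile
    val users = Map(1L -> List("alice"), 3L -> List("carol"), 9L -> List("ghost"))
    // Same shape of update function as the example, with List in place of Array
    val joined = leftJoin(followerVerts, users) {
      case (uid, deg, Some(attrList)) => attrList
      case (uid, deg, None)           => List.empty[String]
    }
    println(joined)
  }
}
```

Vertex 2 ends up with an empty list and vertex 9 never appears, which matches the None case and the "left" side of the join described above.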

Reposted from: https://www.cnblogs.com/kingatnuaa/p/5028136.html
