Commit 9bad0b7

ankurdave authored and rxin committed
[SPARK-2025] Unpersist edges of previous graph in Pregel
Due to a bug introduced by apache#497, Pregel does not unpersist replicated vertices from previous iterations. As a result, they stay cached until memory is full, wasting GC time.

This PR corrects the problem by unpersisting both the edges and the replicated vertices of previous iterations. This is safe because the edges and replicated vertices of the current iteration are cached by the call to `g.cache()` and then materialized by the call to `messages.count()`. Therefore no unmaterialized RDDs depend on `prevG.edges`.

I verified that no recomputation occurs by running PageRank with a custom patch to Spark that warns when a partition is recomputed.

Thanks to Tim Weninger for reporting this bug.

Author: Ankur Dave <[email protected]>

Closes apache#972 from ankurdave/SPARK-2025 and squashes the following commits:

13d5b07 [Ankur Dave] Unpersist edges of previous graph in Pregel
1 parent 3d3f8c8 commit 9bad0b7

1 file changed: +1 -0 lines changed

graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala

Lines changed: 1 addition & 0 deletions
@@ -150,6 +150,7 @@ object Pregel extends Logging {
       oldMessages.unpersist(blocking=false)
       newVerts.unpersist(blocking=false)
       prevG.unpersistVertices(blocking=false)
+      prevG.edges.unpersist(blocking=false)
       // count the iteration
       i += 1
     }
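
The ordering the commit message relies on (cache the new graph, materialize it, and only then unpersist the previous one) can be sketched outside of Pregel. The following is a minimal illustration, not the code in `Pregel.scala`: the object name `UnpersistSketch`, the three-vertex toy graph, the `mapVertices` update, and the local master setting are all assumptions made for the example, and the graph is materialized with `count()` on its vertices and edges rather than with `messages.count()`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.Graph

// Hypothetical driver object; the name and the toy graph are illustrative only.
object UnpersistSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // A tiny cycle 1 -> 2 -> 3 -> 1; every vertex starts with attribute 0.
      val rawEdges = sc.parallelize(Seq((1L, 2L), (2L, 3L), (3L, 1L)))
      var g: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, 0).cache()
      g.vertices.count()
      g.edges.count()

      var i = 0
      while (i < 3) {
        val prevG = g

        // Derive the next graph and mark it for caching.
        g = prevG.mapVertices((_, attr) => attr + 1).cache()

        // Materialize the new graph while prevG is still cached, so that
        // computing it can read prevG's cached partitions.
        g.vertices.count()
        g.edges.count()

        // Release the previous iteration: vertices (and replicated vertices)
        // first, then the edges, mirroring the two calls in the diff above.
        prevG.unpersistVertices(blocking = false)
        prevG.edges.unpersist(blocking = false)

        i += 1
      }

      println(g.vertices.collect().sortBy(_._1).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Unpersisting only drops cached blocks and leaves lineage intact, so a mistake in the ordering would show up as recomputation rather than as wrong results. That is presumably why the author checked for recomputed partitions rather than for incorrect output.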
