Skip to content

Conversation

@jegonzal
Copy link
Contributor

The initial Pregel API coupled voting to halt with message reception. In this revised the vertex program receives a PregelContext which enables the user to signal whether or not to halt as well as access the current iteration number.

@jegonzal
Copy link
Contributor Author

@ankurdave unfortunately to full accept this change we will need to break compatibility with the current Pregel API. I cannot seem to overload the apply method.

@jegonzal
Copy link
Contributor Author

I spent some time verifying the math behind the PageRank (in particular starting values) to ensure that the delta formulation behaves identically to the static formulation which matches other reference implementations of PageRank. One of the key changes is I have added an extra normalization step at the end of the calculation to address a discrepancy in how we handle dangling vertices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncommenting lines 279 and 280 leads to a substantial slow down in later iterations indicating that there is still an issue with unpersist. @ankurdave any thoughts?

@jegonzal
Copy link
Contributor Author

@ankurdave and @rxin there is an issue with the current API. The sendMessage function pull the active field out of the vertex value here:
https://github.com/apache/spark/pull/1217/files#diff-e399679417ffa6eeedf26a7630baca16R243

def sendMessageWrapper(triplet: EdgeTriplet[(VD, Boolean),ED]): Iterator[(VertexId, A)] = {
  val simpleTriplet = new EdgeTriplet[VD, ED]()
   simpleTriplet.set(triplet)
   simpleTriplet.srcAttr = triplet.srcAttr._1
   simpleTriplet.dstAttr = triplet.dstAttr._1
   val ctx = new EdgeContext(i, triplet.srcAttr._2, triplet.dstAttr._2)
   sendMsg(simpleTriplet, ctx)
 }
// Compute the messages for all the active vertices
val messages = g.mapReduceTriplets(sendMessageWrapper, mergeMsg, Some((activeVertices, activeDirection)))

thereby allowing the user a simple sendMsg interface:

sendMsg: (EdgeTriplet[VD, ED], EdgeContext) => Iterator[(VertexId, A)]

However because we access the source and destination vertex attributes the byte code inspection will force a full 3-way join even if the user doesn't actually read the fields.

The simplest solution would be to change the send message interface to operate on the extended vertex attribute (containing the active field).

sendMsg: (EdgeTriplet[(VD, Boolean), ED], EdgeContext) => Iterator[(VertexId, A)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"weighted sum" :)

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have started for PR 1217 at commit 2116e8a.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 28, 2014

QA tests have finished for PR 1217 at commit 2116e8a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@jegonzal
Copy link
Contributor Author

This is still work in progress and we need to discuss these API changes.

@jegonzal
Copy link
Contributor Author

@ankurdave is this already covered in your latest PR?

@jegonzal
Copy link
Contributor Author

@ankurdave should I try and update this with your latest changes or do you want to create a new one?

@ankurdave
Copy link
Contributor

@jegonzal I'm going to look at this over the weekend to try to get it into 1.2.

@nchammas
Copy link
Contributor

@jegonzal @ankurdave - This PR has gone stale. Do we want to update it or close it for later?

@ankurdave
Copy link
Contributor

Let's close this issue for later.

@srowen
Copy link
Member

srowen commented Jul 17, 2015

(For Github close script:) Do you mind closing this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants