[SPARK-5461] [graphx] Add isCheckpointed, getCheckpointedFiles methods to Graph #4253
|
Unrelated failure |
maybe beef up the documentation to say this only returns true if both vertices and edges are checkpointed?
|
@rxin Added more doc. Thanks! |
|
lgtm. |
|
Test build #568 has finished for PR 4253 at commit
|
|
Failure in org.apache.spark.sql.CachedTableSuite...we'll see how the 2nd test fares |
|
I'm not sure if this patch actually works. The following code works with RDD, but the similar code (below) does not work with Graph: |
|
I think the problem is that VertexRDD didn't override isCheckpointed |
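A minimal sketch of the failure mode described above, using toy classes rather than Spark's own. It assumes the wrapper forwards `checkpoint()` to an inner RDD but inherits the base-class `isCheckpointed`, so the wrapper's own flag is never set. `ToyRDD`, `ForwardingOnlyWrapper`, and `FixedWrapper` are hypothetical names used only for illustration.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's classes.
class ToyRDD {
  private var checkpointed = false
  def checkpoint(): Unit = { checkpointed = true }
  def isCheckpointed: Boolean = checkpointed
}

// Forwards checkpoint() to the wrapped RDD but inherits isCheckpointed,
// so the query reads the wrapper's own flag, which is never set.
class ForwardingOnlyWrapper(inner: ToyRDD) extends ToyRDD {
  override def checkpoint(): Unit = inner.checkpoint()
}

// Forwards both the action and the query to the wrapped RDD.
class FixedWrapper(inner: ToyRDD) extends ToyRDD {
  override def checkpoint(): Unit = inner.checkpoint()
  override def isCheckpointed: Boolean = inner.isCheckpointed
}

object WrapperDemo {
  def main(args: Array[String]): Unit = {
    val broken = new ForwardingOnlyWrapper(new ToyRDD)
    broken.checkpoint()
    println(broken.isCheckpointed) // false: the inner RDD was checkpointed, not the wrapper

    val fixed = new FixedWrapper(new ToyRDD)
    fixed.checkpoint()
    println(fixed.isCheckpointed) // true: the query is delegated as well
  }
}
```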
|
Test build #26253 has finished for PR 4253 at commit
|
… VertexRDDImpl. The corresponding Graph methods now work.
|
That broke some stuff...will fix soon |
|
OK, I'm actually confused about how to check isCheckpointed. It looks like it's called by workers, which is a problem when I try to use "partitionsRDD." What should I check instead? |
|
Test build #26269 has finished for PR 4253 at commit
|
|
Is the problem that partitionsRDD is transient? |
|
@rxin I believe so, but I'm not sure what alternatives there are. |
|
Test build #26336 has finished for PR 4253 at commit
|
|
@ankurdave I guess making partitionsRDD non-transient didn't cause any problems. |
|
@jkbradley @rxin It seems that there was a long discussion about this https://issues.apache.org/jira/browse/SPARK-4672 @JerryLead Do you know any workaround to fix unit tests while keeping |
|
@jkbradley IMHO, we do not need to override the |
|
isCheckpointed is called in computeOrReadCheckpoint, which happens on workers. I think the correct solution is not to use transient to cut the lineage, but to remove the dependencies properly.
…, and made isCheckpointed check firstParent instead of partitionsRDD
|
That last commit added the transient tag back and uses firstParent in isCheckpointed, per @mengxr's suggestion. |
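The trade-off discussed above can be sketched with plain Java serialization. This is a toy model, not Spark code. It assumes that a `@transient` field is dropped when the object is shipped to a worker, while a parent reachable through a serialized dependency list survives. `ToyParent`, `ToyWrapper`, and `roundTrip` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Toy stand-ins for illustration only; these are NOT Spark's classes.
class ToyParent(val checkpointed: Boolean) extends Serializable

class ToyWrapper(@transient val direct: ToyParent, val deps: Array[ToyParent])
    extends Serializable {
  // Reach the parent through the serialized dependency list, not the transient field.
  def firstParent: ToyParent = deps.head
  def isCheckpointed: Boolean = firstParent.checkpointed
}

object TransientDemo {
  // Serialize and deserialize a value, roughly as if it were shipped to a worker.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val parent = new ToyParent(checkpointed = true)
    val copy = roundTrip(new ToyWrapper(parent, Array(parent)))
    println(copy.direct == null) // true: the transient field did not survive
    println(copy.isCheckpointed) // true: the dependency-based lookup still works
  }
}
```

In this model, reading `copy.direct` on the receiving side would yield `null`, which is the kind of failure a worker-side call could hit if the check went through the transient field.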
…though not needed to compile
|
Test build #26526 has finished for PR 4253 at commit
|
|
The latest passing test ran before I added the class tag to firstParent; the next run will include the class tag. |
|
Test build #26542 has finished for PR 4253 at commit
|
|
The failed test is in Spark SQL. Already pinged @liancheng to take a look. I'm going to merge this PR. |
|
PR #4173 unpersists tables in a non-blocking way, which causes a race condition in
Added the 2 methods to Graph and GraphImpl. Both make calls to the underlying vertex and edge RDDs.
This is needed for another PR (for LDA): [https://github.com//pull/4047]
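As a rough sketch of the delegation described above, and of the review note that the check is true only when both vertices and edges are checkpointed, the toy model below combines two underlying datasets. It is not the GraphImpl source. `ToyData` and `ToyGraph` are hypothetical, and the method names simply mirror the PR title.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's classes.
final case class ToyData(checkpointFile: Option[String]) {
  def isCheckpointed: Boolean = checkpointFile.isDefined
}

final case class ToyGraph(vertices: ToyData, edges: ToyData) {
  // True only if BOTH the vertex data and the edge data are checkpointed.
  def isCheckpointed: Boolean = vertices.isCheckpointed && edges.isCheckpointed

  // Collect whichever checkpoint files exist (zero, one, or two entries).
  def getCheckpointedFiles: Seq[String] =
    Seq(vertices.checkpointFile, edges.checkpointFile).flatten
}

object GraphDemo {
  def main(args: Array[String]): Unit = {
    val partial = ToyGraph(ToyData(Some("/tmp/ckpt/vertices")), ToyData(None))
    println(partial.isCheckpointed)       // false: edges are not checkpointed
    println(partial.getCheckpointedFiles) // List(/tmp/ckpt/vertices)

    val full = ToyGraph(ToyData(Some("/tmp/ckpt/vertices")), ToyData(Some("/tmp/ckpt/edges")))
    println(full.isCheckpointed)          // true
  }
}
```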
Notes:
CC: @rxin
CC: @mengxr (since related to LDA)