[SPARK-2991] Implement RDD lazy transforms for scanLeft and scan #1909
|
Erik, you've been doing some great work on making non-lazy transforms lazy! I haven't had time to thoroughly review your recent PRs, but could you run some checks, and probably add some tests, to make sure that all of your recent work behaves correctly not only for synchronous actions on RDDs (collect, count, et al.) but also for the async actions in AsyncRDDActions.scala? It looked to me like at least the RangePartitioner work, while better than what is in Spark now, still had some trouble with async actions (essentially, the production of the rangeBounds doesn't get captured within the FutureAction, so it isn't cancellable, etc.). |
|
Good point, I will look into those. |
|
@erikerlandson perhaps also create an umbrella ticket and make all the related tickets a sub-task for the umbrella one? This way it is a lot easier to track them. Cheers. |
|
@rxin I created an umbrella: |
|
Can one of the admins verify this patch? |
|
+1 for this. Useful feature to calculate distributed cumulative sum. |
|
Any updates on this? Will you publish this as part of Silex like you did with #1839? |
|
@JoshRosen, yes I plan on adding this to silex in the near future. If you like I can close it. |
|
QA tests have started for PR 1909. This patch DID NOT merge cleanly! |
|
QA results for PR 1909: |
|
I migrated this PR to the 3rd party silex project: |
Discussion of implementations:
http://erikerlandson.github.io/blog/2014/08/09/implementing-an-rdd-scanleft-transform-with-cascade-rdds/
http://erikerlandson.github.io/blog/2014/08/12/implementing-parallel-prefix-scan-as-a-spark-rdd-transform/
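For readers who want a feel for the distributed cumulative sum use case mentioned in the thread, here is a minimal sketch of a partition-wise prefix scan in plain Python. It is not the code from this PR or from silex: the function name `partitioned_scan_left` is hypothetical, partitions are modeled as a list of lists rather than an RDD, and the sketch assumes `op` is associative with `zero` as its identity (true for addition with 0). The linked posts describe the actual implementations.

```python
from functools import reduce
from itertools import accumulate
from operator import add


def partitioned_scan_left(partitions, zero, op):
    """Hypothetical sketch: scanLeft over data split into partitions.

    Assumes `op` is associative and `zero` is its identity element.
    """
    # Pass 1: reduce each partition to one total. On a cluster these
    # reductions are independent, so they could run in parallel.
    totals = [reduce(op, part, zero) for part in partitions]

    # Pass 2: an exclusive scan over the (small) list of totals gives the
    # starting value for each partition.
    offsets = list(accumulate(totals, op, initial=zero))[:-1]

    # Pass 3: scan each partition from its own offset. Again independent
    # per partition. The leading element of each local scan is the offset
    # itself, which the previous partition already emitted, so drop it.
    result = [zero]
    for part, start in zip(partitions, offsets):
        result.extend(list(accumulate(part, op, initial=start))[1:])
    return result


if __name__ == "__main__":
    # Same values as scanLeft(0)(_ + _) applied to 1, 2, 3, 4, 5.
    print(partitioned_scan_left([[1, 2], [3], [4, 5]], 0, add))
```

Like Scala's `scanLeft`, the result starts with `zero` and has one more element than the input. A lazy RDD version would additionally need to defer the first two passes until an action runs, which is the concern this PR addresses.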