-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-27000][PYTHON] Upgrades cloudpickle to v0.8.0 #23904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #102812 has finished for PR 23904 at commit
|
|
Test build #102811 has finished for PR 23904 at commit
|
|
retest this please |
|
Test build #102818 has finished for PR 23904 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like an important fix to pick up.
| global global_func | ||
| self.assertEqual(self.sc.parallelize([1]).map(lambda _: global_func()).first(), "Hi") | ||
| global_func = lambda: "Yeah" | ||
| self.assertEqual(self.sc.parallelize([1]).map(lambda _: global_func()).first(), "Yeah") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a test locally:
>>> global_func = lambda: "Hi"
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Hi'
>>> global_func = lambda: "Yeah"
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Hi'
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Yeah'
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Hi'Seems it outputs Hi or Yeah randomly. Is it caused by this cloudpickle issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks so. After the fix, the result becomes consistent:
>>> global_func = lambda: "Hi"
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Hi'
>>> global_func = lambda: "Yeah"
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Yeah'
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Yeah'
>>> sc.parallelize([1]).map(lambda _: global_func()).first()
'Yeah'|
Let me get this in. The pickle protocol change at #20691 was closely reviewed and I think it's still important change to keep than revert for now. |
What changes were proposed in this pull request?
After upgrading cloudpickle to 0.6.1 at #20691, one regression was found. Cloudpickle had a critical cloudpipe/cloudpickle#240 for that.
Basically, it currently looks existing globals would override globals shipped in a function's, meaning:
Before:
After:
Therefore, this PR upgrades cloudpickle to 0.8.0.
Note that cloudpickle's release cycle is quite short.
Between 0.6.1 and 0.7.0, it contains minor bug fixes. I don't see notable changes to double check and/or avoid.
There is virtually only this fix between 0.7.0 and 0.8.1 - other fixes are about testing.
How was this patch tested?
Manually tested, tests were added. Verified unit tests were added in cloudpickle.