-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-11904] [PySpark] reduceByKeyAndWindow does not require checkpointing when invFunc is None #9888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
reduceByKeyAndWindow(func, None, window_size, slide_size) is equivalent to reduceByKey(func).window(window_size, slide_size).reduceByKey(func) and should not require checkpointing.
|
Any update on this please? |
|
Jenkins, test this please |
|
retest this please |
|
LGTM pending tests |
|
retest this please |
|
Test build #47848 has finished for PR 9888 at commit
|
|
retest this please |
|
Test build #47856 has finished for PR 9888 at commit
|
|
retest this please |
|
Test build #47870 has finished for PR 9888 at commit
|
|
retest this please |
|
Test build #47877 has finished for PR 9888 at commit
|
|
Merging to master. Thanks @dtolpin |
when invFunc is None,
reduceByKeyAndWindow(func, None, winsize, slidesize)is equivalent toand no checkpoint is necessary. The corresponding Scala code does exactly that, but Python code always creates a windowed stream with obligatory checkpointing. The patch fixes this.
I do not know how to unit-test this.