[SPARK-11812] [PySpark] invFunc=None works properly with python's reduceByKeyAndWindow #9775
|
@tdas @dtolpin can you file a JIRA and reference it in the title of this PR? See https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark |
|
@dtolpin could you add a unit test for this fix? |
|
Unfortunately, I don't see how I can write a meaningful unit test for this bug. It would require using a time-based window with reduceByKeyAndWindow, and the fixtures provided with the unit tests for Python streaming do not support this. However, this PR only changes a truthiness check on a constant (which is meaningless) into a check on a parameter. I understand that everything must be covered by unit tests, but the way the code is written makes that objective very difficult to achieve. Making the code unit-testable would be a major rewrite, which I won't have time to do. |
|
@dtolpin could you add a summary to the PR title? |
|
Sure, I have just put the summary back into the PR title. |
|
@dtolpin could you add the following test:
    def test_reduce_by_key_and_window_with_none_invFunc(self):
        input = [range(1), range(2), range(3), range(4), range(5), range(6)]

        def func(dstream):
            return dstream.map(lambda x: (x, 1))\
                .reduceByKeyAndWindow(operator.add, None, 5, 1)\
                .filter(lambda kv: kv[1] > 0).count()

        expected = [[2], [4], [6], [6], [6], [6]]
        self._test_func(input, func, expected) |
|
Got it, added it, thank you. |
|
Jenkins, test this please |
|
@andrewor14 could you help trigger Jenkins? Thanks. I think I don't have the permission :( |
|
that's odd, you should have permission. retest this please |
|
this is ok to test? |
|
retest this please |
|
Test build #46351 has finished for PR 9775 at commit
|
|
LGTM |
|
Cool, I will merge it in master, 1.6 and 1.5 |
…ceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for trueness (that is, not None, in this context). A local function is never None, thus the case of invFunc=None (a common one when inverse reduction is not defined) was treated incorrectly, resulting in loss of data. In addition, the docstring used wrong parameter names, also fixed. Author: David Tolpin <[email protected]> Closes #9775 from dtolpin/master. (cherry picked from commit 599a8c6) Signed-off-by: Tathagata Das <[email protected]>
invFunc is optional and can be None. Instead of invFunc (the parameter), invReduceFunc (a local function) was checked for truthiness (that is, for not being None, in this context). A local function is never None,
thus the case of invFunc=None (a common one when no inverse reduction is defined) was handled incorrectly, resulting in loss of data.
In addition, the docstring used the wrong parameter names; this is also fixed.
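
The pitfall described above can be reproduced outside Spark. The following is a minimal sketch and not the actual PySpark source: the names `choose_strategy_buggy`, `choose_strategy_fixed`, and the strategy labels are hypothetical and exist only to illustrate why testing a locally defined function for truthiness always succeeds, while testing the parameter does not.

```python
import operator


def choose_strategy_buggy(func, inv_func=None):
    """Hypothetical stand-in for the pre-fix check (not the PySpark code)."""

    def inv_reduce_func(a, b):
        # Local wrapper; only meaningful when inv_func is not None.
        return inv_func(a, b)

    # BUG: a function object is always truthy, so this branch is taken
    # even when the caller passed inv_func=None.
    if inv_reduce_func:
        return "incremental"  # would rely on the missing inverse function
    return "recompute"


def choose_strategy_fixed(func, inv_func=None):
    """Hypothetical stand-in for the post-fix check: test the parameter."""

    def inv_reduce_func(a, b):
        return inv_func(a, b)

    # The parameter is None when no inverse reduction is supplied.
    if inv_func:
        return "incremental"
    return "recompute"


if __name__ == "__main__":
    print(choose_strategy_buggy(operator.add, None))          # incremental
    print(choose_strategy_fixed(operator.add, None))          # recompute
    print(choose_strategy_fixed(operator.add, operator.sub))  # incremental
```

In this sketch, "incremental" stands for updating the previous window result with the inverse function and "recompute" for reducing the whole window again; with the buggy check, the incremental path is chosen even though no inverse function exists.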