Skip to content

Conversation

@dtolpin
Copy link
Contributor

@dtolpin dtolpin commented Nov 17, 2015

invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for trueness (that is, not None, in this context). A local function is never None,
thus the case of invFunc=None (a common one when inverse reduction is not defined) was treated incorrectly, resulting in loss of data.

In addition, the docstring used wrong parameter names, also fixed.

@dtolpin dtolpin changed the title invFunc=none work properly with python's reduceByKeyAndWindow invFunc=none works properly with python's reduceByKeyAndWindow Nov 17, 2015
@andrewor14
Copy link
Contributor

@tdas @dtolpin can you file a JIRA attached to the title of this PR? See https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

@zsxwing
Copy link
Member

zsxwing commented Nov 18, 2015

@dtolpin could you add a unit test for this fix?

@dtolpin dtolpin changed the title invFunc=none works properly with python's reduceByKeyAndWindow [SPARK-11812] [PySpark] Nov 18, 2015
@dtolpin
Copy link
Contributor Author

dtolpin commented Nov 18, 2015

I don't see how I can make a meaningful unit test for this bug, unfortunately. That would require that I use a time-based window with reduceByKeyAndWindow, and fixtures provided with unittests for python streaming do not support this.

However, this PR just changes checking a constant (which is meaningless) for trueness to checking a parameter. I understand that everything must be covered by unit tests, but the way the code is written it is an objective which is very difficult to achieve. Modifying the code so that it is unit-testable is a major rewrite, which won't have time to do.

@zsxwing
Copy link
Member

zsxwing commented Nov 18, 2015

@dtolpin could you add a summary to the PR title?

@dtolpin dtolpin changed the title [SPARK-11812] [PySpark] [SPARK-11812] [PySpark] invFunc=None works properly with python's reduceByKeyAndWindow Nov 18, 2015
@dtolpin
Copy link
Contributor Author

dtolpin commented Nov 18, 2015

Sure, I have just put the summary back into the PR title.

@zsxwing
Copy link
Member

zsxwing commented Nov 19, 2015

@dtolpin could you add the following test to WindowFunctionTests in python/pyspark/streaming/tests.py? It should reproduce the issue.

    def test_reduce_by_key_and_window_with_none_invFunc(self):
        input = [range(1), range(2), range(3), range(4), range(5), range(6)]

        def func(dstream):
            return dstream.map(lambda x: (x, 1))\
                .reduceByKeyAndWindow(operator.add, None, 5, 1)\
                .filter(lambda kv: kv[1] > 0).count()

        expected = [[2], [4], [6], [6], [6], [6]]
        self._test_func(input, func, expected)

@dtolpin
Copy link
Contributor Author

dtolpin commented Nov 19, 2015

Got it, added it, thank you.

@zsxwing
Copy link
Member

zsxwing commented Nov 19, 2015

Jenkins, test this please

@zsxwing
Copy link
Member

zsxwing commented Nov 19, 2015

@andrewor14 could you help trigger Jenkins? Thanks. I think I don't have the permission :(

@andrewor14
Copy link
Contributor

that's odd, you should have permission. retest this please

@zsxwing
Copy link
Member

zsxwing commented Nov 19, 2015

this is ok to test?

@zsxwing
Copy link
Member

zsxwing commented Nov 19, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Nov 19, 2015

Test build #46351 has finished for PR 9775 at commit e76be01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member

zsxwing commented Nov 19, 2015

LGTM

@tdas
Copy link
Contributor

tdas commented Nov 19, 2015

Cool, I will merge it in master, 1.6 and 1.5

asfgit pushed a commit that referenced this pull request Nov 19, 2015
…ceByKeyAndWindow

invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for trueness (that is, not None, in this context). A local function is never None,
thus the case of invFunc=None (a common one when inverse reduction is not defined) was treated incorrectly, resulting in loss of data.

In addition, the docstring used wrong parameter names, also fixed.

Author: David Tolpin <[email protected]>

Closes #9775 from dtolpin/master.

(cherry picked from commit 599a8c6)
Signed-off-by: Tathagata Das <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 19, 2015
…ceByKeyAndWindow

invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for trueness (that is, not None, in this context). A local function is never None,
thus the case of invFunc=None (a common one when inverse reduction is not defined) was treated incorrectly, resulting in loss of data.

In addition, the docstring used wrong parameter names, also fixed.

Author: David Tolpin <[email protected]>

Closes #9775 from dtolpin/master.

(cherry picked from commit 599a8c6)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit asfgit closed this in 599a8c6 Nov 19, 2015
asfgit pushed a commit that referenced this pull request Nov 19, 2015
…ceByKeyAndWindow

invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for trueness (that is, not None, in this context). A local function is never None,
thus the case of invFunc=None (a common one when inverse reduction is not defined) was treated incorrectly, resulting in loss of data.

In addition, the docstring used wrong parameter names, also fixed.

Author: David Tolpin <[email protected]>

Closes #9775 from dtolpin/master.

(cherry picked from commit 599a8c6)
Signed-off-by: Tathagata Das <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 19, 2015
…ceByKeyAndWindow

invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for trueness (that is, not None, in this context). A local function is never None,
thus the case of invFunc=None (a common one when inverse reduction is not defined) was treated incorrectly, resulting in loss of data.

In addition, the docstring used wrong parameter names, also fixed.

Author: David Tolpin <[email protected]>

Closes #9775 from dtolpin/master.

(cherry picked from commit 599a8c6)
Signed-off-by: Tathagata Das <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants