@mortada
Contributor

mortada commented Jan 21, 2016

…local vs cluster

@srowen thanks for the PR at #10866! Sorry it took me a while.

This is related to #10866. The assignment in the lambda expression in the Python example is invalid: a lambda body must be a single expression, and counter += x is a statement.

In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: rdd.foreach(lambda x: counter += x)
  File "<ipython-input-4-fcb86c182bad>", line 1
    rdd.foreach(lambda x: counter += x)
                                   ^
SyntaxError: invalid syntax
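
The error above can be reproduced without Spark. The following is a minimal sketch, not taken from the PR diff: it uses the built-in `compile` to confirm that the lambda form is rejected, and then runs a named-function version in its own namespace. The function name `increment_counter` matches the one used later in this thread; `is_valid_python` and the other names are hypothetical helpers for illustration only.

```python
def is_valid_python(source, mode):
    """Return True if `source` compiles in the given mode ('eval' or 'exec')."""
    try:
        compile(source, "<example>", mode)
        return True
    except SyntaxError:
        return False


# The form from the original docs: an augmented assignment inside a lambda.
lambda_source = "lambda x: counter += x"

# A named function that rebinds a module-level variable via 'global'.
def_source = (
    "counter = 0\n"
    "def increment_counter(x):\n"
    "    global counter\n"
    "    counter += x\n"
)

lambda_ok = is_valid_python(lambda_source, "eval")
def_ok = is_valid_python(def_source, "exec")

# Run the named-function version as plain local Python (no Spark involved).
namespace = {}
exec(def_source, namespace)
for value in [1, 2, 3, 4, 5]:
    namespace["increment_counter"](value)
local_total = namespace["counter"]

print("lambda form compiles:", lambda_ok)
print("def form compiles:", def_ok)
print("local total:", local_total)
```

This only shows that the named function is valid Python and updates the global when called in a single process. It says nothing about what happens once Spark ships the function to executors.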

@srowen
Member

srowen commented Jan 21, 2016

Does it still execute without error on a cluster (even if it doesn't actually increment the counter in the way someone might expect)? Certainly if it doesn't compile we need to change this, but I want to make sure the version with "global" executes too.

@srowen
Member

srowen commented Jan 21, 2016

Jenkins test this please

@mortada
Contributor Author

mortada commented Jan 21, 2016

@srowen it compiles for local, let me test that on a cluster

I noticed that the next line also fails in Python: concatenating a str and an int raises a TypeError

In [7]: print("Counter value: " + counter)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-e340457a6af8> in <module>()
----> 1 print("Counter value: " + counter)

TypeError: Can't convert 'int' object to str implicitly

I just updated the PR
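
For reference, here is a small sketch of the failure and two ways around it. This is illustrative only and is not copied from the updated PR; the variable names are assumptions.

```python
counter = 0

# Concatenating a str and an int raises TypeError rather than converting.
try:
    message = "Counter value: " + counter
    concatenation_failed = False
except TypeError:
    concatenation_failed = True

# Option 1: convert the int explicitly before concatenating.
message = "Counter value: " + str(counter)
print(message)

# Option 2: pass the value as a separate argument and let print format it.
print("Counter value: ", counter)
```

Passing the value as a separate argument avoids building the string by hand, which is the form used in the cluster test later in this thread.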

@mortada
Contributor Author

mortada commented Jan 21, 2016

@srowen I tested the Python code in cluster mode (5 EC2 workers) and it runs without error; the counter on the driver stays at 0

16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.56:35937 with 6.6 GB RAM, BlockManagerId(4, 172.31.10.56, 35937)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.55:59871 with 6.6 GB RAM, BlockManagerId(0, 172.31.10.55, 59871)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.53:39162 with 6.6 GB RAM, BlockManagerId(1, 172.31.10.53, 39162)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.54:59145 with 6.6 GB RAM, BlockManagerId(2, 172.31.10.54, 59145)
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.10.57:35000 with 6.6 GB RAM, BlockManagerId(3, 172.31.10.57, 35000)
In [1]: data = [1, 2, 3, 4, 5]

In [2]: counter = 0

In [3]: rdd = sc.parallelize(data)

In [4]: def increment_counter(x):
    global counter
    counter += x
   ...:

In [5]: rdd.foreach(increment_counter)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.55:59871 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.56:35937 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.57:35000 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.53:39162 (size: 3.2 KB, free: 6.6 GB)
16/01/21 17:34:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.10.54:59145 (size: 3.2 KB, free: 6.6 GB)
(other output skipped)

In [6]: print("Counter value: ", counter)
Counter value:  0
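
The counter prints 0 because each executor updates its own copy of `counter` from the serialized closure, so the variable on the driver is never modified. When a global aggregate is actually wanted, Spark's accumulators are the supported mechanism. The sketch below assumes an existing SparkContext `sc`, as in the session above; `sum_with_accumulator` is a hypothetical helper name and is not part of this PR.

```python
def sum_with_accumulator(sc, data):
    """Sum `data` across executors using a Spark accumulator.

    `sc` is assumed to be an existing SparkContext, such as the one the
    pyspark shell provides.
    """
    # The accumulator is created on the driver with an initial value of 0.
    accum = sc.accumulator(0)
    rdd = sc.parallelize(data)
    # Executors only add to the accumulator; Spark merges the updates
    # back on the driver.
    rdd.foreach(lambda x: accum.add(x))
    # The merged value is read on the driver after the action completes.
    return accum.value
```

Under those assumptions, `print("Counter value: ", sum_with_accumulator(sc, [1, 2, 3, 4, 5]))` should report 15 in both local and cluster mode, unlike the `global` version.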

@srowen
Member

srowen commented Jan 22, 2016

LGTM then. I can merge this with the other doc updates

@srowen
Member

srowen commented Jan 22, 2016

Jenkins test this please

@SparkQA

SparkQA commented Jan 22, 2016

Test build #49919 has finished for PR 10867 at commit 2e1c016.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jan 23, 2016
…local vs cluster

Author: Mortada Mehyar <[email protected]>

Closes #10867 from mortada/doc_python_fix.

(cherry picked from commit 56f57f8)
Signed-off-by: Sean Owen <[email protected]>
@srowen
Member

srowen commented Jan 23, 2016

Merged to master/1.6

asfgit closed this in 56f57f8 Jan 23, 2016