@vanzin
Contributor

@vanzin vanzin commented Aug 17, 2015

The fix for SPARK-7736 introduced a race where a port value of "-1"
could be passed down to the pyspark process, causing it to fail to
connect back to the JVM. This change adds code to fix that race.
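
The race can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the code in this PR: a hypothetical `PortHolder` starts at -1 (mirroring the "-1" value described above) and is filled in only once a background thread has bound its socket. The launcher waits on a `threading.Event` before reading the port. All names are made up for the example.

```python
import socket
import threading


class PortHolder:
    """Hypothetical holder: the port stays -1 until the server thread has bound."""

    def __init__(self):
        self.port = -1
        self.sock = None
        self.ready = threading.Event()


def start_server(holder):
    # Bind to an ephemeral port, publish it, then signal readiness.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(1)
    holder.sock = sock
    holder.port = sock.getsockname()[1]
    holder.ready.set()


def launch(timeout=5.0):
    holder = PortHolder()
    threading.Thread(target=start_server, args=(holder,), daemon=True).start()
    # Without this wait, holder.port could still be -1 when it is read.
    if not holder.ready.wait(timeout):
        raise RuntimeError("server did not start in time")
    return holder


holder = launch()
bound_port = holder.port
holder.sock.close()
```

The point of the sketch is only the ordering: the port is read strictly after the server signals that it has bound.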

@vanzin
Contributor Author

vanzin commented Aug 17, 2015

Example of the error:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41051/artifact/yarn/target/unit-tests.log

File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 425, in start
OverflowError: getsockaddrarg: port must be 0-65535.
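
For context, this class of error is easy to reproduce outside Spark. A minimal sketch, assuming only the Python standard library: passing an out-of-range port such as -1 to `socket.connect` raises `OverflowError` before any network traffic happens. The exact message text differs between Python versions, so the example only looks at the exception type. `connect_checked` and `raw_connect_error` are hypothetical helpers, not part of py4j.

```python
import socket


def connect_checked(host, port):
    """Hypothetical guard: reject ports the socket layer would refuse anyway."""
    if not 0 <= port <= 65535:
        raise ValueError("invalid gateway port: %d" % port)
    return socket.create_connection((host, port))


def raw_connect_error(port):
    """Try a plain connect and return the OverflowError it raises, if any."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(("127.0.0.1", port))
    except OverflowError as exc:
        return exc
    finally:
        sock.close()
    return None
```

Calling `raw_connect_error(-1)` returns the `OverflowError`, while `connect_checked("127.0.0.1", -1)` fails earlier with a more descriptive `ValueError`.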

@vanzin
Contributor Author

vanzin commented Aug 18, 2015

retest this please

@andrewor14
Contributor

@vanzin is this the right JIRA?

@andrewor14
Contributor

Also, I've seen this non-determinism reported on the user list. It would definitely be good to fix it.

@vanzin
Contributor Author

vanzin commented Aug 18, 2015

Yes; this bug was introduced by a change that I pushed this morning (to fix the same bug this PR mentions; see #7751).

@SparkQA

SparkQA commented Aug 18, 2015

Test build #41074 has finished for PR 8258 at commit 30b0ee5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 18, 2015

Test build #41077 has finished for PR 8258 at commit cfef35d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 18, 2015

Test build #41089 timed out for PR 8258 at commit d8831a2 after a configured wait of 175m.

Member

LGTM. I was going to ask whether there is some concurrency utility for this; you could use a task, future, or semaphore, but it might not be any less code.
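
As a rough illustration of the future-based alternative mentioned in this comment (a sketch in Python, not a proposal for the actual Spark code): the bind step runs as a task on an executor, and the caller blocks on the resulting future, so it can never observe an unset port. `bind_gateway` is a hypothetical stand-in for starting the gateway.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def bind_gateway():
    """Hypothetical stand-in: bind a listening socket and return it with its port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(1)
    return sock, sock.getsockname()[1]


def start_and_get_port(timeout=5.0):
    # The future only completes after bind_gateway has returned a real port.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(bind_gateway)
        return future.result(timeout=timeout)


gateway_sock, gateway_port = start_and_get_port()
gateway_sock.close()
```

Whether this is shorter than a hand-rolled wait depends on how the server exposes its startup, which is the concern raised in the reply.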

Contributor Author

I wish there was something, but the GatewayServer API is super weird.

@vanzin
Contributor Author

vanzin commented Aug 18, 2015

I'll try the tests again, but I'm inclined to merge this soon. retest this please

@SparkQA

SparkQA commented Aug 18, 2015

Test build #41137 has finished for PR 8258 at commit d8831a2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Aug 18, 2015

The PySpark failure is the same flaky test that has been failing on and off for a long time. I'm merging this.

@asfgit asfgit closed this in c1840a8 Aug 18, 2015
@vanzin vanzin deleted the SPARK-7736 branch August 18, 2015 18:42
asfgit pushed a commit that referenced this pull request Sep 9, 2015
The fix for SPARK-7736 introduced a race where a port value of "-1"
could be passed down to the pyspark process, causing it to fail to
connect back to the JVM. This change adds code to fix that race.

Author: Marcelo Vanzin <[email protected]>

Closes #8258 from vanzin/SPARK-7736.

(cherry picked from commit c1840a8)
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016