[SPARK-7736] [core] Fix a race introduced in PythonRunner. #8258
The fix for SPARK-7736 introduced a race where a port value of "-1" could be passed down to the pyspark process, causing it to fail to connect back to the JVM. This change adds code to fix that race.
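To illustrate the kind of race described above, here is a minimal Python sketch. It is not the code from this PR: `FakeGatewayServer`, its methods, and the port number 25333 are hypothetical stand-ins for a server that reports a port of -1 until it has finished binding. The sketch assumes the server is started on a background thread and that the caller reads the port right afterwards.

```python
import threading
import time


class FakeGatewayServer:
    """Hypothetical stand-in: reports port -1 until start() has finished."""

    def __init__(self):
        self._port = -1

    def start(self):
        time.sleep(0.05)      # simulate the delay before the socket is bound
        self._port = 25333    # arbitrary illustrative port

    def get_listening_port(self):
        return self._port


def start_and_get_port_racy(server):
    # Starts the server in the background and reads the port immediately.
    # The read can happen before start() has set the port, so -1 may leak out.
    thread = threading.Thread(target=server.start, daemon=True)
    thread.start()
    return server.get_listening_port()


def start_and_get_port_safe(server):
    # Same as above, but waits for the startup thread to finish before
    # reading the port, so the caller never observes the -1 placeholder.
    thread = threading.Thread(target=server.start, daemon=True)
    thread.start()
    thread.join()
    return server.get_listening_port()
```

In this sketch the racy variant can hand -1 to whatever consumes the port (here, that would be the child process that needs to connect back), while the safe variant only returns once startup has completed.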
Example of the error:
retest this please
@vanzin is this the right JIRA?
Also, I've seen this non-determinism reported on the user list. It would definitely be good to fix it.
Yes; this bug was introduced by a change that I pushed this morning (to fix the same bug this PR mentions; see #7751).
Test build #41074 has finished for PR 8258 at commit
Test build #41077 has finished for PR 8258 at commit
Test build #41089 timed out for PR 8258 at commit
LGTM. I was going to ask whether there's some concurrency utility for this; you could use a task, future, or semaphore, but it might not be any less code.
I wish there was something, but the GatewayServer API is super weird.
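For reference, the future-based alternative raised in the review could look roughly like the Python sketch below. It is a generic illustration only, not tied to the GatewayServer API (which, as noted above, may not fit this shape). `bind_server`, `start_and_wait_for_port`, and the port number are hypothetical names and values chosen for the example; the assumption is that the startup routine can return the bound port once it is ready.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def bind_server():
    """Hypothetical startup routine that returns the bound port once ready."""
    time.sleep(0.05)   # simulate the delay before the socket is bound
    return 25333       # arbitrary illustrative port


def start_and_wait_for_port(bind, timeout_s=5.0):
    # Run the startup routine on a worker thread and block on its Future.
    # result() returns the port, re-raises any startup exception, or raises
    # a timeout error if startup takes longer than timeout_s.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(bind).result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)   # do not block here if startup is stuck
```

Calling `start_and_wait_for_port(bind_server)` yields the port only after startup has completed, so a placeholder value never reaches the caller. As the review comment suggests, this is not obviously less code than waiting on the startup thread directly.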
I'll try tests again but I'm inclined to merge this soon. retest this please
Test build #41137 has finished for PR 8258 at commit
The pyspark failure is the same flaky test that has been failing on and off for a long time. I'm merging this.
The fix for SPARK-7736 introduced a race where a port value of "-1" could be passed down to the pyspark process, causing it to fail to connect back to the JVM. This change adds code to fix that race. Author: Marcelo Vanzin <[email protected]> Closes #8258 from vanzin/SPARK-7736. (cherry picked from commit c1840a8)