[SPARK-14849][CORE]Always set an address for the executor #12613
|
Test build #56705 has finished for PR 12613 at commit
|
|
cc @zsxwing |
|
Are the test failures real, or due to flaky tests? |
RpcEndpointVerifier.NAME, new RpcEndpointVerifier(this, dispatcher))
}
@Nullable
this is no longer nullable
|
I think there is also some code in SparkEnv that deals with this? |
|
Added comments, updated checks for cases when running in client mode, removed sending back the hostname to the executor. |
|
Test build #57125 has finished for PR 12613 at commit
|
|
Test build #57133 has finished for PR 12613 at commit
|
// hostname, using the connection information. But the value generated is wrong when
// the connection is NATed.
// [SPARK-14849]
if (server != null) RpcAddress(host, server.getPort()) else RpcAddress(host, -1)
-1 is actually confusing. All executors on the same node will have the same address.
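To make the concern concrete, here is a minimal sketch. It assumes a stand-in `RpcAddress` case class (defined locally, not Spark's own class), a hypothetical helper `addressFor`, and a made-up host name; it only mirrors the `-1` fallback from the diff above.

```scala
// Stand-in for Spark's RpcAddress, defined here only to keep the sketch self-contained.
final case class RpcAddress(host: String, port: Int)

object PlaceholderPortCollision {
  // Hypothetical helper mirroring the fallback in the diff:
  // when there is no server, the port is set to -1.
  def addressFor(host: String, serverPort: Option[Int]): RpcAddress =
    RpcAddress(host, serverPort.getOrElse(-1))

  def main(args: Array[String]): Unit = {
    // Two executors on the same (hypothetical) node, neither running a server.
    val executorA = addressFor("worker-1.example", None)
    val executorB = addressFor("worker-1.example", None)

    // Case-class equality compares host and port, so the two addresses are equal.
    println(executorA == executorB)

    // Anything keyed by address keeps only one entry for the two executors.
    val byAddress = Map(executorA -> "exec-A", executorB -> "exec-B")
    println(byAddress.size)
  }
}
```

With a real server port the two addresses would differ by port, so the collision only appears on the `-1` path.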
|
The host is easy to fix. However, the hard part is |
|
@skyluc could you close this one, please? You can submit a new PR when you have a better idea. Thanks! |
Closes apache#11610 Closes apache#15411 Closes apache#15501 Closes apache#12613 Closes apache#12518 Closes apache#12026 Closes apache#15524 Closes apache#12693 Closes apache#12358 Closes apache#15588 Closes apache#15635 Closes apache#15678 Closes apache#14699 Closes apache#9008
As specified in SPARK-14849, the `address` in the `NettyRpcEnv` for the executor is not set when running in standalone mode. To compensate, the driver tries to guess the IP address of the executor during registration. But in a NAT situation, the IP address visible in the connection is not the one that should be used. This address is sent back to the executor and used later to describe the location of blocks.
The change is to always set the address to the `host` value, and to remove the driver-side code that tries to guess the right IP address. This kind of guessing is wrong in a NAT configuration, and possibly in others.
Manually tested on the configuration described in the ticket, and on a standard Mesos cluster.
Testing on a YARN cluster would be useful, to check that there are no unexpected effects.
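The NAT problem described above can be sketched as follows. This is an illustration only, not Spark's code: the `Registration` type, both resolver functions, and the IP addresses (taken from private and documentation ranges) are assumptions made for the example.

```scala
object NatAddressSketch {
  // Hypothetical registration data: the host the executor declares for itself,
  // and the remote IP the driver observes on the incoming connection.
  final case class Registration(declaredHost: String, observedRemoteIp: String)

  // Behaviour the description calls wrong: derive the address from the connection.
  def guessedByDriver(r: Registration): String = r.observedRemoteIp

  // Behaviour the change proposes: always use the executor's own host value.
  def declaredByExecutor(r: Registration): String = r.declaredHost

  def main(args: Array[String]): Unit = {
    // Behind NAT, the driver sees the gateway's address, not the executor's.
    val viaNat = Registration(declaredHost = "10.0.0.5", observedRemoteIp = "203.0.113.7")
    println(s"guessed by driver:    ${guessedByDriver(viaNat)}")
    println(s"declared by executor: ${declaredByExecutor(viaNat)}")

    // Without NAT the two values agree, which is why the guess can appear to work.
    val direct = Registration(declaredHost = "10.0.0.5", observedRemoteIp = "10.0.0.5")
    println(guessedByDriver(direct) == declaredByExecutor(direct))
  }
}
```

In the NAT case the guessed value names the gateway, so block locations described with it would point at the wrong machine.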