
Conversation


@skyluc skyluc commented Apr 22, 2016

As described in SPARK-14849, the address in the executor's NettyRpcEnv is not set when running in standalone mode. To compensate, the driver tries to guess the executor's IP address during registration, but in a NAT setup the IP address visible on the connection is not the one that should be used. This address is sent back to the executor and later used to describe the location of blocks.

The change is to always set the address to the configured host value, and to remove the driver-side code that tries to guess the right IP address. That kind of guessing is wrong in a NAT configuration, and possibly in others.

Manually tested on the configuration described in the ticket and on a standard Mesos cluster.
Testing on a YARN cluster would be useful to check that there are no unexpected effects.
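
A minimal sketch of the failure mode, in plain Scala with made-up addresses (not Spark code), just to make the NAT point concrete: the address the driver observes on the inbound connection is the NAT gateway, while the address that should be advertised is the executor's configured host.

    // Self-contained illustration of the NAT problem (hypothetical values, not Spark internals).
    case class RpcAddress(host: String, port: Int)

    object NatAddressExample {
      def main(args: Array[String]): Unit = {
        // What the driver sees on the socket: the NAT gateway, not the executor.
        val guessedFromConnection = RpcAddress("10.0.0.1", 51234)
        // What the executor was configured with and should advertise itself.
        val advertisedByExecutor  = RpcAddress("executor-1.internal", 43567)

        // Block locations recorded with the guessed address are unreachable behind NAT;
        // always advertising the configured host avoids the guess entirely.
        println(s"guessed:    $guessedFromConnection")
        println(s"advertised: $advertisedByExecutor")
      }
    }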


SparkQA commented Apr 22, 2016

Test build #56705 has finished for PR 12613 at commit 1cf8322.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

rxin commented Apr 23, 2016

cc @zsxwing

Author

skyluc commented Apr 25, 2016

Are the test failures real, or due to flaky tests?
I tried to reproduce the failures locally, but core/test passes most of the time.

RpcEndpointVerifier.NAME, new RpcEndpointVerifier(this, dispatcher))
}

@Nullable
Contributor


this is no longer nullable

Contributor

rxin commented Apr 26, 2016

I think there is also some code in SparkEnv that deals with this?

    if (isDriver) {
      conf.set("spark.driver.port", rpcEnv.address.port.toString)
    } else if (rpcEnv.address != null) {
      conf.set("spark.executor.port", rpcEnv.address.port.toString)
    }

Author

skyluc commented Apr 27, 2016

Added comments, updated the checks for the client-mode cases, and removed sending the hostname back to the executor.


SparkQA commented Apr 27, 2016

Test build #57125 has finished for PR 12613 at commit 0624898.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Apr 27, 2016

Test build #57133 has finished for PR 12613 at commit baaef11.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// hostname, using the connection information. But the value generated is wrong when
// the connection is NATed.
// [SPARK-14849]
if (server != null) RpcAddress(host, server.getPort()) else RpcAddress(host, -1)
Member


-1 is actually confusing. All of the executors on the same node will have the same address.

Member

zsxwing commented May 2, 2016

The host is easy to fix; the hard part is the port. The executor cannot know its port until it sends a message, and since the executor runs in client mode, it may have multiple clients/ports.
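
A rough sketch of why a constant port of -1 collides, using plain Scala with hypothetical values (not Spark internals): two client-mode executors on the same node collapse to a single RpcAddress key.

    // Hypothetical illustration: with the port fixed at -1, executors on one node are indistinguishable.
    case class RpcAddress(host: String, port: Int)

    object PortCollisionExample {
      def main(args: Array[String]): Unit = {
        val executor1 = RpcAddress("node-7", -1)
        val executor2 = RpcAddress("node-7", -1)
        // Both executors map to the same key, so the second entry overwrites the first.
        val byAddress = Map(executor1 -> "executor 1", executor2 -> "executor 2")
        println(byAddress.size) // prints 1, not 2
      }
    }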

Member

zsxwing commented Oct 24, 2016

@skyluc could you close this one, please? You can submit a new PR when you have a better idea. Thanks!

srowen added a commit to srowen/spark that referenced this pull request Oct 31, 2016
@asfgit asfgit closed this in 26b07f1 Oct 31, 2016