Skip to content

Conversation

@jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

Yarn and mesos cluster mode support remote python path (HDFS/S3 scheme) by their own mechanism, it is not necessary to check and format the python when running on these modes. This is a potential regression compared to 1.6, so here propose to fix it.

How was this patch tested?

Unit test to verify SparkSubmit arguments, also with local cluster verification. Because of lack of MiniDFSCluster support in Spark unit test, there's no integration test added.

@SparkQA
Copy link

SparkQA commented Sep 18, 2016

Test build #65557 has finished for PR 15137 at commit d136155.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

andrewor14 commented Sep 21, 2016

Good catch. I'm merging this into master and 2.0. Thanks.

asfgit pushed a commit that referenced this pull request Sep 21, 2016
…s cluster mode

## What changes were proposed in this pull request?

Yarn and mesos cluster mode support remote python path (HDFS/S3 scheme) by their own mechanism, it is not necessary to check and format the python when running on these modes. This is a potential regression compared to 1.6, so here propose to fix it.

## How was this patch tested?

Unit test to verify SparkSubmit arguments, also with local cluster verification. Because of lack of `MiniDFSCluster` support in Spark unit test, there's no integration test added.

Author: jerryshao <[email protected]>

Closes #15137 from jerryshao/SPARK-17512.

(cherry picked from commit 8c3ee2b)
Signed-off-by: Andrew Or <[email protected]>
@asfgit asfgit closed this in 8c3ee2b Sep 21, 2016
@andrewor14
Copy link
Contributor

I've merged it. One more thing, would you mind correcting the comment in PythonRunner#formatPath? Right now it says "we currently only support local python files", which is apparently not true!

@jerryshao
Copy link
Contributor Author

Try to think in another way from PythonRunner's point, this comment looks correct. For yarn and mesos cluster mode, it is because we leverage distributed cache or other to download python files to local workspace. But from PythonRunner's point, it still can only support local files.

@andrewor14
Copy link
Contributor

Got it. I think back then when I wrote that comment we still haven't supported cluster mode with python yet. It's just a little confusing if we're reading that comment in isolation.

@steveloughran
Copy link
Contributor

I see this in master; but the JIRA associated with the PR is still opened & unversioned. Which version did it make it into?

@srowen
Copy link
Member

srowen commented Oct 3, 2016

From the commits, it's in 2.0.1 and 2.1.0. Looks like the JIRA just wasn't resolved (I'll do that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants