[SPARK-8016] YARN cluster / client modes have different app names for python #6671
Conversation
Can one of the admins verify this patch?
@ehnalis what happens if we set …
@andrewor14 Currently the only place where you can set the name of your application is when you prepare an …
I'm not a big fan of mentioning YARN here since this parameter is for all modes. Also, I don't think the documentation should encourage the use of SparkContext.setAppName, precisely because it behaves differently in client and cluster modes.
In summary, I think this comment should be left alone. If --name for some reason doesn't work in yarn-client mode, that should be fixed. If there are other documents that talk about SparkContext.setAppName, they should probably be changed too.
@vanzin There is currently no way to bring yarn-cluster and yarn-client modes under the same hood, for the reasons I've mentioned in the description of the pull request. Do you think that --name should override the name set by SparkContext.setAppName?
We cannot be consistent without removing SparkContext.setAppName, or without overriding it with --name when set.
Do you think that --name should override the name set by SparkContext.setAppName?
No, but if you want to document that, this is the wrong place. I'm saying that we should discourage the use of SparkContext.setAppName.
@vanzin Okay. The problem is with yarn-client mode. When you set an application name with SparkContext.setAppName and you also pass --name to the spark-submit shell script, the name from SparkContext.setAppName is used; --name should take precedence there. Would it be okay if we changed the code to always use --name when supplied, and encouraged users in the SparkContext.setAppName docs to use --name where possible?
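The precedence discrepancy described above can be sketched as a toy model. This is not Spark's actual code; the function and parameter names (resolve_app_name, cli_name, code_name) are invented purely to illustrate the behaviour the thread describes.

```python
# Toy model of how the application name is resolved in the two YARN
# deploy modes, as described in this thread. NOT Spark's real logic.

def resolve_app_name(mode, cli_name=None, code_name=None,
                     main_class="com.example.Main"):
    """Return the app name YARN ends up seeing in each deploy mode."""
    if mode == "yarn-cluster":
        # The client (org.apache.spark.deploy.yarn.Client) submits the app
        # before any user code runs, so --name (falling back to the main
        # class) wins; SparkContext.setAppName in user code comes too late.
        return cli_name or main_class
    elif mode == "yarn-client":
        # User code runs first, so a name set via SparkContext.setAppName
        # takes effect even when --name was also passed to spark-submit.
        return code_name or cli_name or main_class
    raise ValueError(f"unknown mode: {mode}")

# The divergence this PR is about: same flags, different winners.
print(resolve_app_name("yarn-cluster", cli_name="MyApp", code_name="FromCode"))
print(resolve_app_name("yarn-client",  cli_name="MyApp", code_name="FromCode"))
```

In this model the first call yields "MyApp" (the command-line name) and the second "FromCode" (the programmatic name), which is exactly the inconsistency under discussion.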
Hi @ehnalis,
I don't really have a strong opinion about whether setAppName should override the command line or not, other than it's a change in semantics of how it currently works.
I can see how it may lead to slight confusion since cluster-mode apps will behave slightly differently, but the user can always achieve the same thing by calling SparkConf.set("spark.app.name", "foo"), unless you add logic to SparkConf to prevent that key from being overridden once it's set.
@vanzin SparkConf.set("spark.app.name", "foo") will not work in cluster-mode applications, since that line of code runs when the AM launches the main user class. The setting will not take effect, because by then the client has already set the application name during submission, and YARN does not allow changing an application's name afterwards.
I suggest explicitly emphasizing this behaviour and limitation in the setAppName docs, and I would encourage YARN users to pass spark-submit --name when deploying. Maybe we should deprecate setAppName, because it can have no effect in cluster mode.
SparkConf.set("spark.app.name", "foo") will not work in cluster-mode applications
I know, and that's not what I meant. I meant that even if we remove SparkConf.setAppName, people can still set the app name by setting the config directly in client mode and cause exactly the same discrepancy. So the only option, to really have the command-line version take over, is to have the concept of a "final" config that cannot be overridden once it's set.
I don't know whether having something like that is worth it to fix this small issue. It's better just to discourage people from setting the app name programmatically.
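The "final config" idea mentioned above can be sketched in a few lines. The class below (FinalizableConf) is hypothetical; nothing like it exists in Spark, and it is shown only to make the concept concrete.

```python
# Sketch of a config map where a key, once marked final, can no longer
# be overridden. Hypothetical; not part of Spark's SparkConf API.

class FinalizableConf:
    def __init__(self):
        self._values = {}
        self._final = set()

    def set(self, key, value, final=False):
        if key in self._final:
            # Silently ignore attempts to override a final key
            # (raising an error would be an equally valid design).
            return self
        self._values[key] = value
        if final:
            self._final.add(key)
        return self

    def get(self, key, default=None):
        return self._values.get(key, default)

# spark-submit could set the name as final from --name...
conf = FinalizableConf().set("spark.app.name", "FromCommandLine", final=True)
# ...so a later programmatic set would have no effect:
conf.set("spark.app.name", "FromUserCode")
print(conf.get("spark.app.name"))
```

Under this scheme the printed value remains "FromCommandLine", which is how a command-line --name could be guaranteed to win over programmatic settings in both deploy modes.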
I see, because in cluster mode the app name could be set in the jar itself. I'm inclined to close this issue as a "Won't Fix" then. I agree with @vanzin that this doesn't look like the best place to document the differences in app-name behavior. Maybe it's not even that big a deal that it is divergent.
It would be nice to have at least some clarification on this matter in the docs. Or we might consider deprecating SparkContext.setAppName, since it cannot provide a solution for every deployment scenario.
[SPARK-8016] YARN cluster / client modes have different app names for python
Cause: Currently in YARN, the only point where the name of the application can be set is when it is submitted by the client. In yarn-cluster mode this is done by spark-submit, which uses org.apache.spark.deploy.yarn.Client; the application name is taken from --name, or by default from --class. In yarn-client mode, the application name is set by SparkContext.setAppName.
Solution: It is not feasible to read the name set with SparkContext.setAppName in yarn-cluster mode, so this patch adds additional notes to the arguments list of spark-submit.
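The approach the thread converges on, setting the name on the command line so it behaves the same in both deploy modes, looks roughly like this (the master, class name, and jar path below are placeholders):

```shell
# --name sets the application name at submission time, before any user
# code runs, so it works identically in client and cluster modes.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name "MyApp" \
  --class com.example.Main \
  my-app.jar
```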