Conversation

@zzvara (Contributor) commented Jun 5, 2015

[SPARK-8016] YARN cluster / client modes have different app names for python

Cause: Currently in YARN, the only point where the application name can be set is when the application is submitted by the client. In yarn-cluster mode this is done by spark-submit, which uses org.apache.spark.deploy.yarn.Client; the application name is taken from --name or, by default, from --class. In yarn-client mode, the application name is set by SparkContext.setAppName.

Solution: It is not feasible to read the name set with SparkContext.setAppName in yarn-cluster mode, so this patch adds additional notes to the spark-submit arguments list.
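
To make the difference concrete, here is a minimal Scala sketch of the client-side behaviour (the application name and variable names are illustrative, not part of this patch):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// yarn-client mode: the driver runs on the submitting machine, so this call
// happens before the application is submitted to YARN and the name takes effect.
// yarn-cluster mode: spark-submit has already submitted the application
// (named from --name, or from --class by default) before this user code runs
// inside the AM, so the call below cannot change the YARN application name.
val conf = new SparkConf().setAppName("my-app")
val sc = new SparkContext(conf)
```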

@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14 (Contributor)

@ehnalis what happens if we set spark.app.name, which is passed to the AM process in cluster mode? There was some discussion on this before on #3557. cc @vanzin @tgravescs I wonder if we can fix this for both python and scala using the same approach.

@zzvara (Contributor Author) commented Jun 7, 2015

@andrewor14 Currently the only place where you can set the name of your application is when you prepare an ApplicationSubmissionContext (YARN API) and set the application name before submitting with YarnClient. This is what org.apache.spark.deploy.yarn.Client does. The problem is that the name of a Spark application is usually set with SparkContext.setAppName in the user code (JAR) you wish to deploy to the cluster. By the time SparkContext wakes up on YARN as the AM process in container 0, your application already has an ID and a name, which you cannot change as the AM, nor as the client who submitted the application.

Check https://issues.apache.org/jira/browse/YARN-3772
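
For reference, a minimal sketch of the submission path described above, using the Hadoop YARN client API directly (heavily simplified; org.apache.spark.deploy.yarn.Client does much more than this):

```scala
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

val yarnClient = YarnClient.createYarnClient()
yarnClient.init(new YarnConfiguration())
yarnClient.start()

// The application name can only be set here, on the submission context,
// before submitApplication() is called; YARN fixes the name after that.
val app = yarnClient.createApplication()
val appContext = app.getApplicationSubmissionContext
appContext.setApplicationName("name-chosen-by-the-client")  // e.g. from --name
// ... set queue, resources, and the ContainerLaunchContext for the AM ...
val appId = yarnClient.submitApplication(appContext)
```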

@vanzin (Contributor)

I'm not a big fan of mentioning YARN here, since this parameter is for all modes. Also, I don't think the documentation should encourage the use of SparkContext.setAppName, exactly because it has different behaviour in client and cluster modes.

In summary, I think this comment should be left alone. If --name for some reason doesn't work in yarn-client mode, that should be fixed. If there are other documents that talk about SparkContext.setAppName, they should probably be changed too.

@zzvara (Contributor Author)

@vanzin There is currently no way to bring yarn-cluster and yarn-client modes under the same hood, for the reasons I mentioned in the description of the pull request. Do you think that --name should override the name set by SparkContext.setAppName?

We cannot be consistent without removing SparkContext.setAppName or without overriding it with --name when it is set.

@vanzin (Contributor)

> Do you think that --name should override the name set by SparkContext.setAppName?

No, but if you want to document that, this is the wrong place. I'm saying that we should discourage the use of SparkContext.setAppName.

@zzvara (Contributor Author)

@vanzin Okay. The problem is with yarn-client mode: when you set an application name with SparkContext.setAppName and also pass --name to the spark-submit shell script, the SparkContext.setAppName value is used, even though --name should take precedence. Would it be okay if we changed the code to always use --name when it is supplied, and encouraged users in the SparkContext.setAppName documentation to use --name where possible? A sketch of the problem follows below.
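
To illustrate the yarn-client precedence problem (a simplified sketch of what effectively happens, not Spark's actual submission code; the names are examples):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark-submit --name "from-command-line" seeds spark.app.name for the driver.
val conf = new SparkConf()        // picks up spark.app.name passed by spark-submit
  .setAppName("from-user-code")   // user code runs later and overwrites it
// In yarn-client mode the YARN submission then uses "from-user-code",
// i.e. the programmatic name silently wins over --name.
val sc = new SparkContext(conf)
```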

@vanzin (Contributor)

Hi @ehnalis,

I don't really have a strong opinion about whether setAppName should override the command line or not, other than that it's a change to how it currently works.

I can see how it may lead to slight confusion since cluster-mode apps will have slightly different behaviour, but the user can always do the same thing by just doing SparkConf.set("spark.app.name", "foo"), unless you add logic to SparkConf to not allow that key to be overridden once it's set.
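
For context, SparkConf.setAppName is just a convenience around the same config key, so removing it alone would not close that loophole (a small illustrative snippet; "foo" is only an example value):

```scala
import org.apache.spark.SparkConf

// Two equivalent ways of setting the application name programmatically;
// forbidding one without the other changes nothing.
val a = new SparkConf().setAppName("foo")
val b = new SparkConf().set("spark.app.name", "foo")
assert(a.get("spark.app.name") == b.get("spark.app.name"))
```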

@zzvara (Contributor Author)

@vanzin SparkConf.set("spark.app.name", "foo") will not work in cluster-mode applications, since that line of code only runs once the AM launches the main user class. The setting will not take effect, because by then the client has already set the application name at submission time, and YARN does not allow changing an application's name afterwards.

I suggest explicitly emphasizing this behavior and limitation in the setAppName documentation, and I would encourage YARN users to use spark-submit --name when deploying. Maybe we should even deprecate setAppName, because it can have no effect in cluster mode.

@vanzin (Contributor)

> SparkConf.set("spark.app.name", "foo") will not work in cluster-mode applications

I know, and that's not what I meant. I meant that even if we remove SparkConf.setAppName, people can still set the app name by setting the conf directly in client mode, and cause exactly the same discrepancy. So the only option to really have the command-line version take precedence is to have the concept of a "final" config that cannot be overridden once it's set.

I don't know whether having something like that is worth it to fix this small issue. It's better to just discourage people from setting the app name programmatically.
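
Purely as a hypothetical illustration of that "final" config idea (nothing like this exists in SparkConf; the class and behaviour below are invented for the sketch):

```scala
import scala.collection.mutable

// Hypothetical: keys in `finalKeys` keep their first value (e.g. the one set
// from the spark-submit command line) and ignore later programmatic overrides.
class FinalizableConf(finalKeys: Set[String]) {
  private val values = mutable.Map.empty[String, String]

  def set(key: String, value: String): this.type = {
    if (!(finalKeys.contains(key) && values.contains(key))) {
      values(key) = value
    }
    this
  }

  def get(key: String): Option[String] = values.get(key)
}

// Usage: the first (command-line) value wins over a later setAppName-style call.
val conf = new FinalizableConf(Set("spark.app.name"))
conf.set("spark.app.name", "from-command-line")
conf.set("spark.app.name", "from-user-code")   // ignored: key is final once set
assert(conf.get("spark.app.name") == Some("from-command-line"))
```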

@andrewor14 (Contributor)

I see, because in cluster mode the app name could be set in the jar itself. I'm inclined to close this issue as a "Won't Fix" then. I agree with @vanzin that this doesn't look like the best place to document the differences in app name behavior. Maybe it's not even that big of a deal that it is divergent.

@zzvara (Contributor Author) commented Jul 2, 2015

It would be nice to have at least some clarification on this matter in the docs. Or we might consider deprecating SparkContext.setAppName, since it cannot provide a solution for every deployment scenario.

@zzvara closed this Jul 6, 2015