[SPARK-1631] Correctly set the Yarn app name when launching the AM. #539

vanzin · 2014-04-24T21:35:21Z

No description provided.

AmplabJenkins · 2014-04-24T21:37:56Z

Can one of the admins verify this patch?

vanzin · 2014-04-24T22:23:56Z

Question: if I understand the code correctly, SPARK_YARN_APP_NAME will never be used (since I don't think it's possible to instantiate a SparkContext without spark.app.name being set in the corresponding SparkConf).

If that's true, is there any point in keeping SPARK_YARN_APP_NAME around? If there is, I'd have to switch the conditions around (so that it overrides the value in SparkConf).

andrewor14 · 2014-04-25T00:56:24Z

Yeah, it looks like even if we set SPARK_YARN_APP_NAME it will never get propagated to the YARN client. I agree that switching around the two conditions will correct this, but how will the other configs be affected?

More generally, what semantics should we provide here? Should env vars always take precedence over the corresponding value in spark.* configuration options? @pwendell

vanzin · 2014-04-25T01:26:12Z

Personal opinion: the less cluster-specific options we have the better, so unless there's a very compelling reason to keep SPARK_YARN_APP_NAME supported, I'd just remove it and use the configured app name always. It seems to me, from the code, that it was just a workaround for the real bug anyway (the app name not being read from the config, but from the JVM system properties).

andrewor14 · 2014-04-25T03:54:44Z

I agree, we have 10 million env variables and spark config options, many of which are conflicting. It would be nice to cut down on the ones we don't actually use. (Though by that scale of exaggeration hadoop probably has order 1 billion configs)

vanzin · 2014-04-25T16:37:43Z

Pinging @tgravescs, since he seems to play around with Yarn a lot.

sryza · 2014-04-25T17:04:19Z

Have you verified that it doesn't work correctly? My reading of the code is that setting SPARK_YARN_APP_NAME causes the spark.app.name system property to be set, which gets picked up in the SparkConf.

vanzin · 2014-04-25T17:10:26Z

Well, the current patch does not work correctly because with it SPARK_YARN_APP_NAME becomes useless.

But before the patch my app name (set using the conf, not SPARK_YARN_APP_NAME) was never set in yarn, it was always "Spark". To set my app name in Yarn that way I'd have to use "-Dspark.app.name" in the JVM options, and not SparkConf.setAppName().

So the discussion is whether to keep SPARK_YARN_APP_NAME or get rid of it; since both are user-provided, I don't see a reason for keeping both, unless there's some other benefit that I'm not seeing.

tgravescs · 2014-04-25T17:16:48Z

There were a bunch of things that kind of broke since the changes went in with pr299, especially around env variables with yarn. I thought @pwendell was going to file a jira to fix them all up at once.

Note we are keeping the env variables for backwards compatibility but we should be moving towards using the spark-submit script with config options.

vanzin · 2014-04-25T17:31:13Z

I just inverted the condition in the latest patch, so that SPARK_YARN_APP_NAME overrides SparkConf if set. Tested both cases.

tgravescs · 2014-04-25T17:54:55Z

what are you using to launch your job? can you please file a jira for this issue to link to this pr.

tgravescs · 2014-04-29T16:24:15Z

I just tried setting SPARK_YARN_APP_NAME and its working fine for me without this patch with the yarn-client mode. I was just running the examples. Note that the intent of this env is to set the app name in the ResourceManager, not in spark itself. it was not originally equivalent to spark.app.name. I do agree that we should make them match up though.

vanzin · 2014-04-29T16:28:00Z

Hi Tom.

SPARK_YARN_APP_NAME works fine. What doesn't work is when you do not set that env variable; in that case, the app name in the RM will always be "Spark", instead of the app name set in your app's SparkConf instance. Unless you explicitly set the app name using a JVM system property ("-Dspark.app.name=foo").

So:

set SPARK_YARN_APP_NAME: works fine
pass "-Dspark.app.name=foo" in command line: works fine
do "new SparkConf().setAppName("foo")": does not work (RM app name is "Spark")

tgravescs · 2014-04-29T16:31:39Z

Sorry somehow my browser wasn't showing a jira number and I missed the explanation there, thanks for adding an explanation here. It makes sense now.

tgravescs · 2014-04-29T17:03:47Z

This case you describe wasn't originally supported on yarn. Perhaps it should have been fixed up when sparkConf was added but it looks like it wasn't so I would consider this more of an improvement rather then a bug.. minor nit. This patch only makes it work for the yarn-client mode, I would like to see it work in both if we are changing it. Would you be able to look into yarn-cluster mode also?

The thing that I noticed while looking at this that I do consider a bug (but not what you reported in this pr) is that if I specify --name with spark-submit script with yarn-client mode it doesn't properly set the name on the RM or in spark. the yarn-cluster mode does work fine though. I will file a separate jira for that.

vanzin · 2014-04-29T17:06:20Z

I thought I looked at yarn-cluster mode and it was doing the right thing, but I'll look again.

vanzin · 2014-04-30T23:42:49Z

Ah, I see what you mean. I don't see a clean way to make "--name" work in client mode; SparkSubmit could call System.setProperty("spark.app.name"), but that (i) looks hacky and (ii) might be overriden by the application when it sets the app name in SparkConf.

Maybe stash that value somewhere else and check it in the code my patch is touching? It's less hacky in that it would allow overriding the SparkConf value, but still kinda ugly.

andrewor14 · 2014-05-08T04:40:13Z

I dug through the code a bunch. For those following this PR, here's a quick summary. Looks like the way we set Spark app name on YARN is actually quite convoluted because there are two types of app names:

The one observed by the RM
The one observed by SparkContext

This PR is mainly concerned with (1) and only on the YARN client mode. Before the changes, there are two ways of affecting (1), in the order of decreasing priority.

set spark.app.name in spark-defaults
set SPARK_YARN_APP_NAME in spark-env

After the changes, this becomes

set SPARK_YARN_APP_NAME in spark-env
set spark.app.name in SparkConf (through conf.setAppName)
set spark.app.name in spark-defaults

This guarantees that (1) == (2) unless an environment variable is explicitly specified by the user. Further, the app name set in spark-defaults has in fact become the default. To me, these semantics are pretty clear.

Notice however that in either before or after, there is now way for SparkSubmit's --name parameter to affect either (1) or (2). This is a separate bug with SparkSubmit that I have summarized in SPARK-1755.

andrewor14 · 2014-05-08T04:52:00Z

@vanzin Your proposal about having SparkSubmit calling System.setProperty(spark.app.name) can be made clean if we just always convert --name to spark.app.name, which is already what the SparkSubmit usage currently suggests but does not fulfill. I will make this change in a separate PR that also addresses SPARK-1755.

As for this PR, the changes LGTM.

pwendell · 2014-05-09T03:46:34Z

I've merged this, thanks.

Author: Marcelo Vanzin <[email protected]> Closes #539 from vanzin/yarn-app-name and squashes the following commits: 7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM. (cherry picked from commit 3f779d8) Signed-off-by: Patrick Wendell <[email protected]>

Author: Marcelo Vanzin <[email protected]> Closes apache#539 from vanzin/yarn-app-name and squashes the following commits: 7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.

* [SPARK-27267][CORE] Update snappy to avoid error when decompressing empty serialized data (apache#531) * [SPARK-27514][SQL] Skip collapsing windows with empty window expressions (apache#538) * Bump hadoop to 2.9.2-palantir.5 (apache#537)

Fix kind base image entrypoint access rule

… functions (apache#539) ### What changes were proposed in this pull request? This is a regression between Spark 3.5.0 and Spark 4. The following queries work on Spark 3.5.0 while fails on latest master branch: ``` CREATE TABLE test_current_user(i int, s string) USING parquet; ALTER TABLE test_current_user ALTER COLUMN s SET DEFAULT current_user() ``` ``` CREATE TABLE test_current_user(i int, s string default current_user()) USING parquet INSERT INTO test_current_user (i) VALUES ((0)); ``` This PR is to complete fixing this by eagerly executing finish-analysis and constant-folding rules before checking whether the expression is foldable and resolved. ### Why are the changes needed? Bug fix ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New UTs ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47538 from gengliangwang/pickFinishAnlysis. Authored-by: Gengliang Wang <[email protected]> Signed-off-by: Gengliang Wang <[email protected]> Co-authored-by: Gengliang Wang <[email protected]>

[SPARK-1631] Correctly set the Yarn app name when launching the AM.

7d1ca4f

vanzin changed the title ~~Correctly set the Yarn app name when launching the AM.~~ [SPARK-1631] Correctly set the Yarn app name when launching the AM. Apr 25, 2014

vanzin mentioned this pull request May 8, 2014

[SPARK-1755] Respect SparkSubmit --name on YARN #699

Closed

asfgit closed this in 3f779d8 May 9, 2014

tgravescs mentioned this pull request Aug 21, 2014

[SPARK-2872] Fix conflict between code and doc in YarnClientSchedulerBackend.scala #1684

Closed

andrewor14 mentioned this pull request Sep 11, 2014

[SPARK-3477] Clean up code in Yarn Client / ClientBase #2350

Closed

andrewor14 mentioned this pull request Dec 3, 2014

[SPARK-4697][YARN]System properties should override environment variables #3557

Closed

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Fix kind base image entrypoint access rule (apache#539)

195a3e0

Fix kind base image entrypoint access rule

[SPARK-1631] Correctly set the Yarn app name when launching the AM. #539

[SPARK-1631] Correctly set the Yarn app name when launching the AM. #539

Uh oh!

Conversation

vanzin commented Apr 24, 2014

Uh oh!

AmplabJenkins commented Apr 24, 2014

Uh oh!

vanzin commented Apr 24, 2014

Uh oh!

andrewor14 commented Apr 25, 2014

Uh oh!

vanzin commented Apr 25, 2014

Uh oh!

andrewor14 commented Apr 25, 2014

Uh oh!

vanzin commented Apr 25, 2014

Uh oh!

sryza commented Apr 25, 2014

Uh oh!

vanzin commented Apr 25, 2014

Uh oh!

tgravescs commented Apr 25, 2014

Uh oh!

vanzin commented Apr 25, 2014

Uh oh!

tgravescs commented Apr 25, 2014

Uh oh!

tgravescs commented Apr 29, 2014

Uh oh!

vanzin commented Apr 29, 2014

Uh oh!

tgravescs commented Apr 29, 2014

Uh oh!

tgravescs commented Apr 29, 2014

Uh oh!

vanzin commented Apr 29, 2014

Uh oh!

vanzin commented Apr 30, 2014

Uh oh!

andrewor14 commented May 8, 2014

Uh oh!

andrewor14 commented May 8, 2014

Uh oh!

pwendell commented May 9, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants