[SPARK-1631] Correctly set the Yarn app name when launching the AM. #539
Conversation
Can one of the admins verify this patch?
Question: if I understand the code correctly, SPARK_YARN_APP_NAME will never be used (since I don't think it's possible to instantiate a SparkContext without spark.app.name being set in the corresponding SparkConf). If that's true, is there any point in keeping SPARK_YARN_APP_NAME around? If there is, I'd have to switch the conditions around (so that it overrides the value in SparkConf).
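For readers following along, a minimal sketch of why spark.app.name is effectively always present once a SparkContext exists: SparkContext validates the name at construction time. (The exact exception message may vary between Spark versions.)

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A conf with no app name: constructing a SparkContext from it fails with
// a SparkException ("An application name must be set in your configuration"),
// so by the time a context exists, spark.app.name is guaranteed to be set.
val conf = new SparkConf(loadDefaults = false).setMaster("local")
val sc = new SparkContext(conf) // throws: spark.app.name is missing
```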
Yeah, it looks like even if we set SPARK_YARN_APP_NAME, it will never be used. More generally, what semantics should we provide here? Should env vars always take precedence over the corresponding value in SparkConf?
Personal opinion: the fewer cluster-specific options we have the better, so unless there's a very compelling reason to keep SPARK_YARN_APP_NAME supported, I'd just remove it and always use the configured app name. It seems to me, from the code, that it was just a workaround for the real bug anyway (the app name not being read from the config, but from the JVM system properties).
I agree; we have 10 million env variables and Spark config options, many of which conflict. It would be nice to cut down on the ones we don't actually use. (Though by that scale of exaggeration, Hadoop probably has on the order of 1 billion configs.)
Pinging @tgravescs, since he seems to play around with YARN a lot.
Have you verified that it doesn't work correctly? My reading of the code is that setting SPARK_YARN_APP_NAME causes the spark.app.name system property to be set, which then gets picked up by the SparkConf.
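A hedged sketch of the flow described in the previous comment (the variable handling here is illustrative, not the actual launcher code): copying the env var into a JVM system property is enough for a default SparkConf to see it, because SparkConf loads every system property whose name starts with "spark.".

```scala
import org.apache.spark.SparkConf

// Copy the env var into a system property, as the launcher is said to do.
sys.env.get("SPARK_YARN_APP_NAME").foreach { name =>
  System.setProperty("spark.app.name", name)
}

// A default SparkConf (loadDefaults = true) picks up spark.* system properties.
val conf = new SparkConf()
println(conf.getOption("spark.app.name")) // Some(<env var value>) if it was set
```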
Well, the current patch does not work correctly, because with it SPARK_YARN_APP_NAME becomes useless. But before the patch, my app name (set using the conf, not SPARK_YARN_APP_NAME) was never set in YARN; it was always "Spark". To set my app name in YARN that way, I'd have to use "-Dspark.app.name" in the JVM options, not SparkConf.setAppName(). So the discussion is whether to keep SPARK_YARN_APP_NAME or get rid of it; since both are user-provided, I don't see a reason for keeping both, unless there's some other benefit that I'm not seeing.
There were a bunch of things that kind of broke after the changes from PR #299 went in, especially around env variables with YARN. I thought @pwendell was going to file a JIRA to fix them all up at once. Note we are keeping the env variables for backwards compatibility, but we should be moving towards using the spark-submit script with config options.
I just inverted the condition in the latest patch, so that SPARK_YARN_APP_NAME overrides SparkConf if set. Tested both cases.
What are you using to launch your job? Can you please file a JIRA for this issue and link it to this PR?
I just tried setting SPARK_YARN_APP_NAME and it's working fine for me without this patch in yarn-client mode. I was just running the examples. Note that the intent of this env variable is to set the app name in the ResourceManager, not in Spark itself; it was not originally equivalent to spark.app.name. I do agree that we should make them match up, though.
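For context on the distinction Tom draws, a hedged sketch of where the ResourceManager-visible name lives in the YARN client API (this is the generic YARN API, not Spark's exact launcher code): it is a field on the ApplicationSubmissionContext, entirely separate from Spark's spark.app.name.

```scala
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Create and start a YARN client, then ask the RM for a new application slot.
val yarnClient = YarnClient.createYarnClient()
yarnClient.init(new YarnConfiguration())
yarnClient.start()

// The name set here is what the ResourceManager UI displays; Spark's own
// spark.app.name is a separate setting that nothing forces to match it.
val appContext = yarnClient.createApplication().getApplicationSubmissionContext
appContext.setApplicationName("name-shown-in-the-RM")
```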
Hi Tom. SPARK_YARN_APP_NAME works fine. What doesn't work is when you do not set that env variable; in that case, the app name in the RM will always be "Spark" instead of the app name set in your app's SparkConf instance, unless you explicitly set the app name using a JVM system property ("-Dspark.app.name=foo").
Sorry, somehow my browser wasn't showing a JIRA number and I missed the explanation there; thanks for adding an explanation here. It makes sense now.
The case you describe wasn't originally supported on YARN. Perhaps it should have been fixed up when SparkConf was added, but it looks like it wasn't, so I would consider this more of an improvement than a bug. Minor nit: this patch only makes it work for yarn-client mode; I would like to see it work in both if we are changing it. Would you be able to look into yarn-cluster mode also? The thing I noticed while looking at this that I do consider a bug (though not what you reported in this PR) is that if I specify --name with the spark-submit script in yarn-client mode, it doesn't properly set the name on the RM or in Spark. yarn-cluster mode does work fine, though. I will file a separate JIRA for that.
I thought I looked at yarn-cluster mode and it was doing the right thing, but I'll look again.
Ah, I see what you mean. I don't see a clean way to make "--name" work in client mode; SparkSubmit could call System.setProperty("spark.app.name"), but that (i) looks hacky and (ii) might be overridden by the application when it sets the app name in SparkConf. Maybe stash that value somewhere else and check it in the code my patch is touching? It's less hacky in that it would allow overriding the SparkConf value, but still kinda ugly.
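To make the trade-off concrete, here is a sketch of the "hacky" option being weighed (illustrative only, not the actual SparkSubmit code; `nameArg` is a hypothetical variable holding the parsed --name value):

```scala
// Copy --name into a system property so that a later `new SparkConf()` in the
// driver sees it. The drawback described above is visible in the ordering:
// user code that calls conf.setAppName(...) afterwards still wins.
val nameArg: Option[String] = Some("name-from-spark-submit") // hypothetical
nameArg.foreach(System.setProperty("spark.app.name", _))
```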
I dug through the code a bunch. For those following this PR, here's a quick summary. The way we set the Spark app name on YARN is actually quite convoluted, because there are two types of app names: (1) the app name displayed by the ResourceManager, and (2) the app name within Spark itself (spark.app.name).
This PR is mainly concerned with (1), and only in yarn-client mode. Before the changes, there were two ways of affecting (1), in order of decreasing priority:
- the SPARK_YARN_APP_NAME environment variable
- spark.app.name set as a JVM system property (a value set through SparkConf was ignored)

After the changes, this becomes:
- the SPARK_YARN_APP_NAME environment variable
- spark.app.name as read from SparkConf (set by spark-defaults, a system property, or setAppName in the application)
This guarantees that (1) == (2) unless an environment variable is explicitly specified by the user. Further, the app name set in spark-defaults has in fact become the default. To me, these semantics are pretty clear. Notice, however, that both before and after, there is no way for SparkSubmit's --name option to take effect in yarn-client mode.
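A minimal sketch (not the patch itself) of the post-patch resolution order just described, with the hardcoded fallback included:

```scala
import org.apache.spark.SparkConf

// Env var first, then SparkConf (which spark-defaults, system properties, and
// setAppName all feed), then the old hardcoded default.
def resolveYarnAppName(sparkConf: SparkConf): String =
  sys.env.get("SPARK_YARN_APP_NAME")
    .orElse(sparkConf.getOption("spark.app.name"))
    .getOrElse("Spark")
```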
@vanzin Your proposal about having SparkSubmit call System.setProperty("spark.app.name") is worth exploring in a follow-up. As for this PR, the changes LGTM.
I've merged this, thanks. |
Author: Marcelo Vanzin <[email protected]>

Closes #539 from vanzin/yarn-app-name and squashes the following commits:

7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.

(cherry picked from commit 3f779d8)
Signed-off-by: Patrick Wendell <[email protected]>