Conversation

@vanzin
Contributor

@vanzin vanzin commented Apr 24, 2014

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

@vanzin
Contributor Author

vanzin commented Apr 24, 2014

Question: if I understand the code correctly, SPARK_YARN_APP_NAME will never be used (since I don't think it's possible to instantiate a SparkContext without spark.app.name being set in the corresponding SparkConf).

If that's true, is there any point in keeping SPARK_YARN_APP_NAME around? If there is, I'd have to switch the conditions around (so that it overrides the value in SparkConf).

@andrewor14
Contributor

Yeah, it looks like even if we set SPARK_YARN_APP_NAME it will never get propagated to the YARN client. I agree that switching around the two conditions will correct this, but how will the other configs be affected?

More generally, what semantics should we provide here? Should env vars always take precedence over the corresponding value in spark.* configuration options? @pwendell

@vanzin
Contributor Author

vanzin commented Apr 25, 2014

Personal opinion: the fewer cluster-specific options we have, the better. So unless there's a very compelling reason to keep SPARK_YARN_APP_NAME supported, I'd just remove it and always use the configured app name. From the code, it seems it was just a workaround for the real bug anyway (the app name being read from JVM system properties instead of the config).

@andrewor14
Contributor

I agree, we have 10 million env variables and Spark config options, many of which conflict. It would be nice to cut down on the ones we don't actually use. (Though by that scale of exaggeration, Hadoop probably has on the order of 1 billion configs.)

@vanzin
Contributor Author

vanzin commented Apr 25, 2014

Pinging @tgravescs, since he seems to play around with Yarn a lot.

@sryza
Contributor

sryza commented Apr 25, 2014

Have you verified that it doesn't work correctly? My reading of the code is that setting SPARK_YARN_APP_NAME causes the spark.app.name system property to be set, which gets picked up in the SparkConf.

@vanzin
Contributor Author

vanzin commented Apr 25, 2014

Well, the current patch does not work correctly because with it SPARK_YARN_APP_NAME becomes useless.

But before the patch, my app name (set using the conf, not SPARK_YARN_APP_NAME) was never set in YARN; it was always "Spark". To set my app name in YARN that way, I'd have to use "-Dspark.app.name" in the JVM options rather than SparkConf.setAppName().

So the discussion is whether to keep SPARK_YARN_APP_NAME or get rid of it; since both are user-provided, I don't see a reason for keeping both, unless there's some other benefit that I'm not seeing.

@tgravescs
Contributor

There were a bunch of things that kind of broke since the changes went in with PR 299, especially around env variables with YARN. I thought @pwendell was going to file a JIRA to fix them all up at once.

Note we are keeping the env variables for backwards compatibility but we should be moving towards using the spark-submit script with config options.

@vanzin
Contributor Author

vanzin commented Apr 25, 2014

I just inverted the condition in the latest patch, so that SPARK_YARN_APP_NAME overrides SparkConf if set. Tested both cases.

@tgravescs
Contributor

What are you using to launch your job? Can you please file a JIRA for this issue to link to this PR?

@vanzin vanzin changed the title Correctly set the Yarn app name when launching the AM. [SPARK-1631] Correctly set the Yarn app name when launching the AM. Apr 25, 2014
@tgravescs
Contributor

I just tried setting SPARK_YARN_APP_NAME and it's working fine for me without this patch in yarn-client mode. I was just running the examples. Note that the intent of this env variable is to set the app name in the ResourceManager, not in Spark itself; it was not originally equivalent to spark.app.name. I do agree that we should make them match up, though.

@vanzin
Contributor Author

vanzin commented Apr 29, 2014

Hi Tom.

SPARK_YARN_APP_NAME works fine. What doesn't work is when you do not set that env variable; in that case, the app name in the RM will always be "Spark" instead of the app name set in your app's SparkConf instance, unless you explicitly set the app name using a JVM system property ("-Dspark.app.name=foo").

So:

  • set SPARK_YARN_APP_NAME: works fine
  • pass "-Dspark.app.name=foo" in command line: works fine
  • do "new SparkConf().setAppName("foo")": does not work (RM app name is "Spark")
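The three cases above can be sketched as a tiny precedence function. This is a hypothetical simulation (the class and method names are mine, not Spark's): the pre-patch YARN client read the RM app name from JVM system properties (which spark-defaults and `-D` flags populate) and from the env var, but never from a SparkConf built with setAppName().

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical simulation of the pre-patch lookup: JVM system
// properties first, then SPARK_YARN_APP_NAME, then the hard-coded
// "Spark" fallback.
class PrePatchRmName {
    static String resolve(Map<String, String> sysProps, Map<String, String> env) {
        return Optional.ofNullable(sysProps.get("spark.app.name"))
                .or(() -> Optional.ofNullable(env.get("SPARK_YARN_APP_NAME")))
                .orElse("Spark");
    }
}
```

With this model, setAppName() alone touches neither map, which is why the RM name falls back to "Spark" in the third case.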

@tgravescs
Contributor

Sorry, somehow my browser wasn't showing a JIRA number and I missed the explanation there; thanks for adding an explanation here. It makes sense now.

@tgravescs
Contributor

The case you describe wasn't originally supported on YARN. Perhaps it should have been fixed up when SparkConf was added, but it looks like it wasn't, so I would consider this more of an improvement rather than a bug. Minor nit: this patch only makes it work for yarn-client mode; I would like to see it work in both if we are changing it. Would you be able to look into yarn-cluster mode also?

The thing I noticed while looking at this that I do consider a bug (though not what you reported in this PR) is that if I specify --name with the spark-submit script in yarn-client mode, it doesn't properly set the name in the RM or in Spark. The yarn-cluster mode does work fine, though. I will file a separate JIRA for that.

@vanzin
Contributor Author

vanzin commented Apr 29, 2014

I thought I looked at yarn-cluster mode and it was doing the right thing, but I'll look again.

@vanzin
Contributor Author

vanzin commented Apr 30, 2014

Ah, I see what you mean. I don't see a clean way to make "--name" work in client mode; SparkSubmit could call System.setProperty("spark.app.name"), but that (i) looks hacky and (ii) might be overridden by the application when it sets the app name in SparkConf.

Maybe stash that value somewhere else and check it in the code my patch is touching? That's less hacky in that it would allow overriding the SparkConf value, but still kinda ugly.
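The System.setProperty idea floated here can be sketched in a few lines. This is a hypothetical helper, not actual SparkSubmit code: --name would be mapped onto the spark.app.name system property before user code runs, so a later SparkConf would pick it up as a default, though application code calling setAppName() would still win.

```java
// Hypothetical sketch of the --name handling discussed here (not the
// actual SparkSubmit code): map the flag onto a system property that
// a subsequently created SparkConf reads as a default value.
class NameFromSubmit {
    static void applyName(String nameArg) {
        if (nameArg != null) {
            System.setProperty("spark.app.name", nameArg);
        }
    }
}
```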

@andrewor14
Contributor

I dug through the code a bunch. For those following this PR, here's a quick summary. It looks like the way we set the Spark app name on YARN is actually quite convoluted, because there are two types of app names:

  1. The one observed by the RM
  2. The one observed by SparkContext

This PR is mainly concerned with (1), and only in yarn-client mode. Before the changes, there were two ways of affecting (1), in order of decreasing priority:

  • set spark.app.name in spark-defaults
  • set SPARK_YARN_APP_NAME in spark-env

After the changes, this becomes

  • set SPARK_YARN_APP_NAME in spark-env
  • set spark.app.name in SparkConf (through conf.setAppName)
  • set spark.app.name in spark-defaults

This guarantees that (1) == (2) unless an environment variable is explicitly specified by the user. Further, the app name set in spark-defaults has in fact become the default. To me, these semantics are pretty clear.

Notice, however, that neither before nor after is there any way for SparkSubmit's --name parameter to affect either (1) or (2). This is a separate bug with SparkSubmit that I have summarized in SPARK-1755.
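The post-patch precedence described in this summary can be sketched as follows (illustrative names, not the actual patch code); spark-defaults is modeled as already merged into the SparkConf map as the lowest-priority source.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch of the post-patch order: SPARK_YARN_APP_NAME
// overrides the SparkConf value (which itself defaults to whatever
// spark-defaults provided); "Spark" remains the last resort.
class PostPatchAppName {
    static String resolve(Map<String, String> env, Map<String, String> conf) {
        return Optional.ofNullable(env.get("SPARK_YARN_APP_NAME"))
                .or(() -> Optional.ofNullable(conf.get("spark.app.name")))
                .orElse("Spark");
    }
}
```

Under this model, (1) == (2) whenever the env var is unset, matching the semantics described above.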

@andrewor14
Contributor

@vanzin Your proposal about having SparkSubmit call System.setProperty("spark.app.name") can be made clean if we just always convert --name to spark.app.name, which is what the SparkSubmit usage already suggests but does not fulfill. I will make this change in a separate PR that also addresses SPARK-1755.

As for this PR, the changes LGTM.

@pwendell
Contributor

pwendell commented May 9, 2014

I've merged this, thanks.

asfgit pushed a commit that referenced this pull request May 9, 2014
Author: Marcelo Vanzin <[email protected]>

Closes #539 from vanzin/yarn-app-name and squashes the following commits:

7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.
(cherry picked from commit 3f779d8)

Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in 3f779d8 May 9, 2014