
Conversation

@piaozhexiu

When non-spark properties (i.e. names that don't start with spark.) are specified on the command line or in the config file, spark-submit and spark-shell behave differently, which confuses users.
Here is a summary:

  • spark-submit
    • --conf k=v => silently ignored
    • spark-defaults.conf => applied
  • spark-shell
    • --conf k=v => a warning is shown and the property is ignored
    • spark-defaults.conf => a warning is shown and the property is ignored

I assume that ignoring non-spark properties is intentional. If so, they should be ignored with a warning message in all cases.

@srowen
Member

Bleh, can this be done without creating a subclass of HashMap? Just a function that controls putting values into this map? CC @sryza on this one. I agree with the goal of making this consistent.

@sryza
Contributor

Agree with Sean. The behavior you're introducing that always ignores non-spark properties seems right to me, but extending HashMap here seems a little weird.

@piaozhexiu
Author

Thank you for your comments, @srowen and @sryza.

I incorporated your suggestion. Does this look better?

@srowen
Member

Do you need this logic in two places if the props are later removed? Maybe there's a reason, just checking if it can be avoided. This looks better to me. @sryza will know the logic better than I.

@piaozhexiu
Author

I moved the logic of removing non-spark properties out of this loop because properties loaded from the command line (--conf) cannot be filtered here. So non-spark properties are removed after the properties from the config file and the command line have been merged.
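
Roughly what I have in mind (a minimal sketch only, not the exact patch; the field and method names below are illustrative):

    import scala.collection.mutable

    // Sketch: the filter runs once, after both property sources are merged, so
    // --conf values get the same treatment as spark-defaults.conf values.
    val sparkProperties = mutable.HashMap[String, String]()

    def mergeDefaultSparkProperties(defaults: Map[String, String]): Unit = {
      // entries from spark-defaults.conf only fill in keys not already set via --conf
      defaults.foreach { case (k, v) =>
        if (!sparkProperties.contains(k)) sparkProperties(k) = v
      }
    }

    def ignoreNonSparkProperties(): Unit = {
      sparkProperties.keys.toList.foreach { k =>
        if (!k.startsWith("spark.")) {
          sparkProperties.remove(k).foreach { v =>
            System.err.println(s"Warning: Ignoring non-spark config property: $k=$v")
          }
        }
      }
    }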

Please let me know if you have a better suggestion.

@srowen
Member

srowen commented Apr 22, 2015

Got it, that makes sense. So this will in all cases ignore, but warn, about non-Spark properties, for spark-submit and spark-shell alike. OK, if that's the intent, this LGTM

@srowen
Member

srowen commented Apr 23, 2015

ok to test

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30877 has finished for PR 5617 at commit 8957950.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@asfgit closed this in 336f7f5 on Apr 24, 2015
@piaozhexiu
Author

Thank you for merging it!

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 14, 2015
…ties in spark-shell and spark-submit

Author: Cheolsoo Park <[email protected]>

Closes apache#5617 from piaozhexiu/SPARK-7037 and squashes the following commits:

8957950 [Cheolsoo Park] Add IgnoreNonSparkProperties method
fedd01c [Cheolsoo Park] Ignore non-spark properties with a warning message in all cases
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
@srowen
Member

srowen commented Dec 14, 2019 via email

@johnnydepup

johnnydepup commented Jan 6, 2020

I'm not entirely sure whether this is related to this PR, but I am getting warnings about non-spark configuration.
I'm using pyspark 2.4.3 with Spark 2.4.3, and I'm trying to add configuration for a file system scheme.

I'm adding the configuration as follows:

SparkSession.builder \
    .appName(..) \
    .master(..) \
    .config('fs.oci.client.hostname', '..') \
    .config('fs.oci.client.auth.pemfilepath', '..') \
    ..

And with those, I get warnings as follows:

Warning: Ignoring non-spark config property: fs.oci.client.hostname=....
Warning: Ignoring non-spark config property: fs.oci.client.auth.fingerprint=....
Warning: Ignoring non-spark config property: fs.oci.client.auth.tenantId=....
Warning: Ignoring non-spark config property: fs.oci.client.auth.pemfilepath=....

How do I set non-spark config in pyspark?

I get the same warning when I try setting the configuration from the command line with spark-shell:

spark-shell --conf ... --conf ...

@srowen
Member

srowen commented Jan 6, 2020

(This is a very old pull request, so the logic has been around a long time; the mailing list is a better place to ask a general question.) These look like Hadoop configs? I think you want to set them either in your Hadoop config XML files or by using the spark.hadoop. prefix.
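
For example, something like this from spark-shell (an untested sketch; the fs.oci.* key names are just the ones from your snippet, and the values are placeholders):

    import org.apache.spark.sql.SparkSession

    // spark.hadoop.* properties are copied into the Hadoop Configuration that Spark
    // hands to the FileSystem implementation, so they start with "spark." and pass
    // the non-spark-property filter that produces the warning you're seeing.
    val spark = SparkSession.builder()
      .appName("oci-example")
      .config("spark.hadoop.fs.oci.client.hostname", "<hostname>")
      .config("spark.hadoop.fs.oci.client.auth.pemfilepath", "<path-to-pem>")
      .getOrCreate()

The same spark.hadoop.-prefixed keys should also work from pyspark's SparkSession.builder.config(...) or as --conf arguments to spark-shell and spark-submit.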

@johnnydepup

These aren't Hadoop configurations. It's Oracle's OCI object store file system. I'm trying to set up the Spark connector for that file system scheme.

@srowen
Member

srowen commented Jan 7, 2020

Yeah, that sounds like something to set in the Hadoop conf, though. See above.

@johnnydepup

No, I don't think this is related to Hadoop. The same configuration works with Scala Spark; it seems to be a problem only with pyspark and spark-shell.
I'll test and see if adding the spark.hadoop. prefix does anything.

@srowen
Member

srowen commented Jan 7, 2020

If it is an FS you are accessing via Spark, you are using Hadoop APIs.
