[SPARK-7037] [CORE] Inconsistent behavior for non-spark config properties in spark-shell and spark-submit #5617
Conversation
Bleh, can this be done without creating a subclass of HashMap? just a function that controls put-ting values in this map? CC @sryza on this one. I agree with the goal of making this consistent.
Agree with Sean. The behavior you're introducing that always ignores non-spark properties seems right to me, but extending HashMap here seems a little weird.
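For illustration only, a minimal sketch of the alternative being suggested: a plain helper that guards insertion into an ordinary mutable map instead of subclassing HashMap. All names here are hypothetical, not the PR's actual code:

```scala
import scala.collection.mutable

// Hypothetical guard: admit only spark.* keys into the properties map,
// warning about (and dropping) everything else.
def putSparkProperty(props: mutable.HashMap[String, String],
                     key: String, value: String): Unit = {
  if (key.startsWith("spark.")) {
    props(key) = value
  } else {
    System.err.println(s"Warning: Ignoring non-spark config property: $key")
  }
}
```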
Do you need this logic in two places if the props are later removed? Maybe there's a reason, just checking if it can be avoided. This looks better to me. @sryza will know the logic better than I.
The reason I moved the logic of removing non-spark properties out of this loop is that properties loaded from the command line (--conf) cannot be filtered here. So I decided to remove non-spark properties after the properties from the config file and the command line are merged; see the sketch below.
Please let me know if you have a better suggestion.
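A rough sketch of that ordering, with hypothetical names (merge both sources first, then filter the combined map in one pass), not the PR's exact code:

```scala
import scala.collection.mutable

// Hypothetical: properties from spark-defaults.conf and from --conf are
// merged first; non-spark keys are then removed in a single pass.
def mergeAndFilter(fromDefaultsFile: Map[String, String],
                   fromCommandLine: Map[String, String]): mutable.HashMap[String, String] = {
  val merged = mutable.HashMap[String, String]()
  merged ++= fromDefaultsFile
  merged ++= fromCommandLine // --conf takes precedence over the defaults file
  for (k <- merged.keys.toSeq if !k.startsWith("spark.")) {
    System.err.println(s"Warning: Ignoring non-spark config property: $k")
    merged -= k
  }
  merged
}
```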
Got it, that makes sense. So this will in all cases ignore, but warn about, non-Spark properties, for spark-submit and spark-shell alike. OK, if that's the intent, this LGTM.
ok to test
Test build #30877 has finished for PR 5617 at commit
Thank you for merging it!
…ties in spark-shell and spark-submit

When specifying non-spark properties (i.e. names that don't start with spark.) on the command line and in the config file, spark-submit and spark-shell behave differently, causing confusion to users. Here is the summary:

* spark-submit
  * --conf k=v => silently ignored
  * spark-defaults.conf => applied
* spark-shell
  * --conf k=v => shows a warning message and is ignored
  * spark-defaults.conf => shows a warning message and is ignored

I assume that ignoring non-spark properties is intentional. If so, they should always be ignored with a warning message in all cases.

Author: Cheolsoo Park <[email protected]>

Closes apache#5617 from piaozhexiu/SPARK-7037 and squashes the following commits:

8957950 [Cheolsoo Park] Add IgnoreNonSparkProperties method
fedd01c [Cheolsoo Park] Ignore non-spark properties with a warning message in all cases
Yes, I think that's a good idea. We can redact them too, but there's no big reason to log the value anyway. I'll raise a PR.
On Fri, Dec 13, 2019 at 11:32 PM Aaron Steers wrote:
Is it expected that this error would print aws security keys to log files?
Seems like a serious security concern.
Warning: Ignoring non-spark config property: fs.s3a.access.key={full-access-key}
Warning: Ignoring non-spark config property: fs.s3a.secret.key={full-secret-key}
Could we not accomplish the same thing by printing the name of the key
*without* the key's value?
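A minimal sketch of the redaction being asked for, keeping the warning but never echoing the value (illustrative only; the actual change landed through a separate PR):

```scala
// Hypothetical: log only the key, so secrets such as fs.s3a.secret.key
// never reach driver logs; optionally keep the k=v shape with the value masked.
def warnIgnoredProperty(key: String): Unit =
  System.err.println(s"Warning: Ignoring non-spark config property: $key")

def warnIgnoredPropertyRedacted(key: String): Unit =
  System.err.println(s"Warning: Ignoring non-spark config property: $key=*********")
```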
I'm not entirely sure if this is related to this PR, but I am getting warnings with non-spark configuration. I'm adding the configuration as … and with those, I get warnings as follows: … How do I set a non-spark config in pyspark? I get the same warning when I try setting the configuration from the command line with spark-shell …
(This is a very old pull request, so the logic has been around a long time; the mailing list is a better place to ask a general question.) These look like Hadoop configs? I think you want to set them either in your Hadoop config XML files, or using the spark.hadoop.* prefix on Spark configs.
Not Hadoop configurations. It's Oracle's OCI object store file system. I'm trying to set up the Spark connector for that file system scheme.
Yeah, that sounds like something to set in the Hadoop conf though. See above.
No, I don't think this is related to Hadoop. The same configuration works with Scala Spark. It seems to be a problem only with pyspark and spark-shell.
If it is an FS you are accessing via Spark, you are using Hadoop APIs. |
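To make that concrete, a sketch of two standard ways to route filesystem options through Spark into the Hadoop Configuration. The s3a keys are the ones from the warning above; an OCI connector's fs.* keys would follow the same pattern:

```scala
import org.apache.spark.sql.SparkSession

// Option 1: the spark.hadoop.* prefix. Spark strips the prefix and copies
// the remainder into the Hadoop Configuration, so no "non-spark" warning
// fires. The same works as --conf spark.hadoop.fs.s3a.access.key=... on
// spark-shell, pyspark, or spark-submit.
val spark = SparkSession.builder()
  .appName("hadoop-conf-example")
  .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
  .getOrCreate()

// Option 2: set the Hadoop Configuration directly at runtime.
spark.sparkContext.hadoopConfiguration
  .set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
```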
When specifying non-spark properties (i.e. names that don't start with spark.) on the command line and in the config file, spark-submit and spark-shell behave differently, causing confusion to users.

Here is the summary:

* spark-submit
  * --conf k=v => silently ignored
  * spark-defaults.conf => applied
* spark-shell
  * --conf k=v => shows a warning message and is ignored
  * spark-defaults.conf => shows a warning message and is ignored

I assume that ignoring non-spark properties is intentional. If so, they should always be ignored with a warning message in all cases.