
Conversation

@piaozhexiu

When non-spark properties (i.e. names that don't start with spark.) are specified on the command line or in the config file, spark-submit and spark-shell behave differently, which confuses users.
Here is a summary:

  • spark-submit
    • --conf k=v => silently ignored
    • spark-defaults.conf => applied
  • spark-shell
    • --conf k=v => a warning is shown and the property is ignored
    • spark-defaults.conf => a warning is shown and the property is ignored

I assume that ignoring non-spark properties is intentional. If so, they should be ignored with a warning message in all cases.

@srowen
Member

Bleh, can this be done without creating a subclass of HashMap? Just a function that controls putting values into this map? CC @sryza on this one. I agree with the goal of making this consistent.

@sryza
Contributor

Agree with Sean. The behavior you're introducing that always ignores non-spark properties seems right to me, but extending HashMap here seems a little weird.

@piaozhexiu
Author

Thank you for your comments, @srowen and @sryza.

I incorporated your suggestion. Does this look better?

@srowen
Member

Do you need this logic in two places if the props are later removed? Maybe there's a reason, just checking if it can be avoided. This looks better to me. @sryza will know the logic better than I.

@piaozhexiu
Author

I moved the logic of removing non-spark properties out of this loop because properties loaded from the command line (--conf) cannot be filtered here. So non-spark properties are removed after the properties from the config file and the command line have been merged.
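
Roughly what I have in mind (a minimal sketch only, not the exact patch; the field and method names below are illustrative):

    import scala.collection.mutable

    // Sketch: the filter runs once, after both property sources are merged, so
    // --conf values get the same treatment as spark-defaults.conf values.
    val sparkProperties = mutable.HashMap[String, String]()

    def mergeDefaultSparkProperties(defaults: Map[String, String]): Unit = {
      // entries from spark-defaults.conf only fill in keys not already set via --conf
      defaults.foreach { case (k, v) =>
        if (!sparkProperties.contains(k)) sparkProperties(k) = v
      }
    }

    def ignoreNonSparkProperties(): Unit = {
      sparkProperties.keys.toList.foreach { k =>
        if (!k.startsWith("spark.")) {
          sparkProperties.remove(k).foreach { v =>
            System.err.println(s"Warning: Ignoring non-spark config property: $k=$v")
          }
        }
      }
    }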

Please let me know if you have a better suggestion.

@srowen
Member

srowen commented Apr 22, 2015

Got it, that makes sense. So this will in all cases ignore, but warn, about non-Spark properties, for spark-submit and spark-shell alike. OK, if that's the intent, this LGTM

@srowen
Member

srowen commented Apr 23, 2015

ok to test

@SparkQA

SparkQA commented Apr 23, 2015

Test build #30877 has finished for PR 5617 at commit 8957950.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@asfgit closed this in 336f7f5 on Apr 24, 2015
@piaozhexiu
Author

Thank you for merging it!

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 14, 2015
…ties in spark-shell and spark-submit

Author: Cheolsoo Park <[email protected]>

Closes apache#5617 from piaozhexiu/SPARK-7037 and squashes the following commits:

8957950 [Cheolsoo Park] Add IgnoreNonSparkProperties method
fedd01c [Cheolsoo Park] Ignore non-spark properties with a warning message in all cases
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
@srowen
Member

srowen commented Dec 14, 2019 via email

@johnnydepup

johnnydepup commented Jan 6, 2020

I'm not entirely sure whether this is related to this PR, but I am getting warnings about non-spark configuration.
I'm using pyspark 2.4.3 with Spark 2.4.3, and I'm trying to add configuration for a file system scheme.

I'm adding the configuration as follows:

SparkSession.builder \
    .appName(..) \
    .master(..) \
    .config('fs.oci.client.hostname', '..') \
    .config('fs.oci.client.auth.pemfilepath', '..') \
    ..

And with those, I get warnings as follows:

Warning: Ignoring non-spark config property: fs.oci.client.hostname=....
Warning: Ignoring non-spark config property: fs.oci.client.auth.fingerprint=....
Warning: Ignoring non-spark config property: fs.oci.client.auth.tenantId=....
Warning: Ignoring non-spark config property: fs.oci.client.auth.pemfilepath=....

How do I set non-spark config in pyspark?

I get the same warning when I try setting the configuration from the command line with spark-shell:

spark-shell --conf ... --conf ...

@srowen
Member

srowen commented Jan 6, 2020

(This is a very old pull request, so the logic has been around a long time; the mailing list is a better place to ask a general question.) These look like Hadoop configs? I think you want to set them either in your Hadoop config XML files or by using the spark.hadoop. prefix.
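
For example, something like this from spark-shell (an untested sketch; the fs.oci.* key names are just the ones from your snippet, and the values are placeholders):

    import org.apache.spark.sql.SparkSession

    // spark.hadoop.* properties are copied into the Hadoop Configuration that Spark
    // hands to the FileSystem implementation, so they start with "spark." and pass
    // the non-spark-property filter that produces the warning you're seeing.
    val spark = SparkSession.builder()
      .appName("oci-example")
      .config("spark.hadoop.fs.oci.client.hostname", "<hostname>")
      .config("spark.hadoop.fs.oci.client.auth.pemfilepath", "<path-to-pem>")
      .getOrCreate()

The same spark.hadoop.-prefixed keys should also work from pyspark's SparkSession.builder.config(...) or as --conf arguments to spark-shell and spark-submit.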

@johnnydepup

These aren't Hadoop configurations. It's Oracle's OCI object store file system. I'm trying to set up the Spark connector for that file system scheme.

@srowen
Member

srowen commented Jan 7, 2020

Yeah, that sounds like something to set in the Hadoop conf, though. See above.

@johnnydepup

No, I don't think this is related to Hadoop. The same configuration works with Scala Spark; it seems to be a problem only with pyspark and spark-shell.
I'll test and see if adding the spark.hadoop. prefix does anything.

@srowen
Member

srowen commented Jan 7, 2020

If it is an FS you are accessing via Spark, you are using Hadoop APIs.
