[SPARK-18817][SparkR] set default spark-warehouse path to tempdir() #16247
Closed
Commits (6)

- c855c2c set default location of spark.sql.warehouse.dir (bdwyer2)
- 79caf1c fixed R style (bdwyer2)
- cee3944 added test case (bdwyer2)
- 17b7bff added SQL query to test (bdwyer2)
- 84a110e attempt to fix failing unit test (bdwyer2)
- 9142397 remove hive support from unit test (bdwyer2)
Conversations
As I mentioned in the JIRA, I think this change is not enough. If the user has a hive-site.xml with the warehouse dir set inside it, this change will override that [1]. We might need to change the Scala code and/or add a new option to specify a new default for SparkR.
[1] spark/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala, line 49 at cf33a86
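A minimal sketch of the precedence concern, in Python with hypothetical names (this is a model of the behavior being discussed, not Spark's actual resolution code): an explicit spark.sql.warehouse.dir setting wins over hive-site.xml, so a value injected unconditionally by the SparkR client looks like a user choice and silently shadows the Hive configuration.

```python
# Toy model of warehouse-dir resolution (hypothetical, for illustration):
# an explicit spark.sql.warehouse.dir beats hive.metastore.warehouse.dir
# from hive-site.xml, which beats the built-in default.

def resolve_warehouse_dir(spark_conf, hive_site, default="spark-warehouse"):
    if "spark.sql.warehouse.dir" in spark_conf:
        return spark_conf["spark.sql.warehouse.dir"]
    if "hive.metastore.warehouse.dir" in hive_site:
        return hive_site["hive.metastore.warehouse.dir"]
    return default

# User relies on hive-site.xml only:
hive_site = {"hive.metastore.warehouse.dir": "/user/hive/warehouse"}
assert resolve_warehouse_dir({}, hive_site) == "/user/hive/warehouse"

# If the client unconditionally injects tempdir() into the conf,
# it overrides the hive-site.xml setting:
patched = {"spark.sql.warehouse.dir": "/tmp/RtmpXYZ"}
assert resolve_warehouse_dir(patched, hive_site) == "/tmp/RtmpXYZ"
```

The point of the model is that the resolver cannot distinguish a client-injected default from a deliberate user setting.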
Plus, this property spark.sql.warehouse.dir could also be set in spark-defaults.conf, which isn't known by this method at this point. By setting it here we would override any other possible values from the Spark config or hive-site.xml.
How about an argument named sparkWorkingDirectory that defaults to tempdir()?
I think Shivaram is talking about a Spark property, not a parameter (perhaps your camel casing is an indication).
Basically, we don't want to change spark.sql.warehouse.dir here because it could already have been set at an earlier point (just not accessible here).
There is the problem of a circular dependency for getting the Hive warehouse dir.
Currently in SparkR, to create a SparkSession we need all those parameters, and only once the session exists can we read its conf. But the logic above requires a SparkSession, and there is no session yet at this line of code. So there must be a way to get the hive-site.xml directory from the environment.
We have to decide whether it is OK to read that from the system environment, something like the following. It would make SparkR depend on an XML parser. Just some thoughts :)
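The original snippet was lost in this thread; as a stand-in, here is a hedged sketch (in Python rather than R, with hypothetical environment-variable choices) of what "reading hive-site.xml from the environment, before any SparkSession exists" could look like. The hive-site.xml format is a list of property name/value pairs under a configuration root.

```python
# Hypothetical sketch: locate hive-site.xml via environment variables and
# read hive.metastore.warehouse.dir from it, with no SparkSession needed.
import os
import xml.etree.ElementTree as ET

def hive_warehouse_dir(conf_dir=None):
    """Return hive.metastore.warehouse.dir from hive-site.xml, or None."""
    conf_dir = (conf_dir
                or os.environ.get("HIVE_CONF_DIR")
                or os.environ.get("SPARK_CONF_DIR"))
    if not conf_dir:
        return None
    path = os.path.join(conf_dir, "hive-site.xml")
    if not os.path.exists(path):
        return None
    # hive-site.xml is <configuration><property><name>..</name>
    # <value>..</value></property>...</configuration>
    root = ET.parse(path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") == "hive.metastore.warehouse.dir":
            return prop.findtext("value")
    return None
```

This illustrates the dependency cost being raised: the client must parse XML itself instead of asking the JVM for the resolved conf.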
@felixcheung I'm confused. By "spark property" do you mean something passed to sparkR.session() via the sparkConfig argument?