[SPARK-1508][SQL] Add SQLConf to SQLContext. #956
Closed
Commits (35, all authored by concretevitamin):

- 4ebf362 First cut at SQLConf inside SQLContext.
- cb722c1 Finish up SQLConf patch.
- ce22d80 Fix parsing issues.
- 0ecea46 Changes for HiveQl and HiveContext.
- d0c4578 Tmux typo.
- 3b0c71b Remove Parser for set commands. A few other fixes.
- 2276929 Fix default hive result for set commands in HiveComparisonTest.
- 41acd75 Add a test for hql() in HiveQuerySuite.
- 0f00d86 Add a test for singleton set command in SQL.
- c1017c2 WIP in changing SetCommand to take two Options (for different semanti…
- efd82db Clean up semantics of several cases of SET.
- c651797 Add commands.scala.
- 5b67985 New line at EOF.
- 6983180 Move a SET test to SQLQuerySuite and make it complete.
- b14b83e In a HiveContext, make SQLConf a subset of HiveConf.
- 13279e6 Refactor the logic of eagerly processing SET commands.
- 41d7f09 Fix imports.
- 2ea8cdc Wrap long line.
- c2067e8 Mark SQLContext transient and put it in a second param list.
- 555599c Bullet-proof (relatively) parsing SET per review comment.
- d52e1bd De-hardcode number of shuffle partitions for BasicOperators (read fro…
- b766af9 Remove a test.
- 1ce8a5e Invoke runSqlHive() in SQLConf#get for the HiveContext case.
- f8983d1 Minor changes per review comments.
- 271f0b1 Minor change.
- 88dd0c8 Remove redundant SET Keyword.
- e9856c4 Use java.util.Collections.synchronizedMap on a Java HashMap.
- 22d9ed7 Fix output() of Set physical. Add SQLConf param accessor method.
- 5f7e6d8 Add default num partitions.
- baa5d29 Remove default param for shuffle partitions accessor.
- dd19666 Update a comment.
- 26c40eb Make SQLConf a trait and have SQLContext mix it in.
- c129b86 Merge remote-tracking branch 'upstream/master' into sqlconf
- d74dde5 Remove the redundant mkQueryExecution() method.
- 4968c11 Very minor cleanup.
concretevitamin File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala (78 additions, 0 deletions)

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql

import java.util.Properties

import scala.collection.JavaConverters._

/**
 * SQLConf holds mutable config parameters and hints. These can be set and
 * queried either by passing SET commands into Spark SQL's DSL
 * functions (sql(), hql(), etc.), or by programmatically using setters and
 * getters of this class. This class is thread-safe.
 */
trait SQLConf {

  /** Number of partitions to use for shuffle operators. */
  private[spark] def numShufflePartitions: Int = get("spark.sql.shuffle.partitions", "200").toInt

  @transient
  private val settings = java.util.Collections.synchronizedMap(
    new java.util.HashMap[String, String]())

  def set(props: Properties): Unit = {
    props.asScala.foreach { case (k, v) => this.settings.put(k, v) }
  }

  def set(key: String, value: String): Unit = {
    require(key != null, "key cannot be null")
    require(value != null, s"value cannot be null for ${key}")
    settings.put(key, value)
  }

  def get(key: String): String = {
    if (!settings.containsKey(key)) {
      throw new NoSuchElementException(key)
    }
    settings.get(key)
  }

  def get(key: String, defaultValue: String): String = {
    if (!settings.containsKey(key)) defaultValue else settings.get(key)
  }

  def getAll: Array[(String, String)] = settings.asScala.toArray

  def getOption(key: String): Option[String] = {
    if (!settings.containsKey(key)) None else Some(settings.get(key))
  }

  def contains(key: String): Boolean = settings.containsKey(key)

  def toDebugString: String = {
    settings.synchronized {
      settings.asScala.toArray.sorted.map { case (k, v) => s"$k=$v" }.mkString("\n")
    }
  }

  private[spark] def clear() {
    settings.clear()
  }
}
```
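To illustrate the get/set semantics of the trait above, here is a minimal self-contained sketch of the same synchronizedMap-backed pattern. `MiniConf` and `Demo` are illustrative names invented for this example, not part of Spark; the real trait is mixed into `SQLContext`.

```scala
// Illustrative stand-in for the SQLConf pattern: a thread-safe,
// string-keyed settings map with defaulted reads and a typed accessor.
trait MiniConf {
  private val settings = java.util.Collections.synchronizedMap(
    new java.util.HashMap[String, String]())

  def set(key: String, value: String): Unit = {
    require(key != null, "key cannot be null")
    require(value != null, s"value cannot be null for $key")
    settings.put(key, value)
  }

  def get(key: String, defaultValue: String): String =
    if (!settings.containsKey(key)) defaultValue else settings.get(key)

  def getOption(key: String): Option[String] =
    if (!settings.containsKey(key)) None else Some(settings.get(key))

  // Mirrors numShufflePartitions: a typed view over the string map.
  def numShufflePartitions: Int = get("spark.sql.shuffle.partitions", "200").toInt
}

object Demo extends App {
  val conf = new MiniConf {}
  println(conf.numShufflePartitions)             // default applies: 200
  conf.set("spark.sql.shuffle.partitions", "10")
  println(conf.numShufflePartitions)             // override visible: 10
  println(conf.getOption("missing"))             // absent key: None
}
```

Note that `synchronizedMap` only makes individual operations atomic; that is why `toDebugString` in the real trait wraps its iteration in `settings.synchronized`.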
I'm a bit confused that the actual logic for the `set` command is not handled in `SetCommandPhysical` but here as a special case. I think we can remove `eagerlyProcess` and `processCmd` in this file with the help of the changes made in PR #948, so that we can move the actual processing logic of the `set` command back to `SetCommandPhysical`. @rxin @marmbrus What do you think?
Hey @liancheng -- so this function `eagerlyProcess` is supposed to be triggered as soon as `hql("set ...")` is typed in / is being evaluated. The actual physical operator `SetCommandPhysical` handles evaluating the actual contents of the SchemaRDD. These contents are not (and need not be, in the spirit of RDDs) eagerly processed at this particular place.
Hi @concretevitamin, previously I thought the only important thing about a `SchemaRDD` made from an eagerly executed statement (a Hive native command, a set command, some insertion command, etc.) is its side effect, not the content of the RDD itself. But now I do agree that when users call `.collect()` on such an RDD, we should present some meaningful result.

As for the eager evaluation of the RDD content: IMHO, for command `SchemaRDD`s, the RDD contents are either empty or usually generated at the same time the command is executed (at least I can't find a counterexample in the current code base), thus this shouldn't be an issue.

So I think the constraints presented here are:

1. the side effect of a command `SchemaRDD` should take place eagerly;
2. the side effect of a command `SchemaRDD` should take place once and only once;
3. when the `.collect()` method is called, something meaningful, usually the output message lines of the command, should be presented.

Then how about adding a lazy field inside all the physical command nodes to wrap up the side effect and hold the command output? Take `SetCommandPhysical` as an example.

In this way, all the constraints are met:

1. the side effect takes place eagerly via the `toRdd` call in `SchemaRDDLike` (PR #948, "[SPARK-1852] prevents queries with sorts submitting jobs prematurely");
2. it takes place once and only once, thanks to the lazy `commandOutput` field;
3. the command output is held in `commandOutput` and returned in `execute()`.

An additional benefit is that the side effect logic of all the commands can be implemented within their own physical command nodes, instead of adding special cases inside `SQLContext.toRdd` and/or `HiveContext.toRdd`. Did I miss something here?

On the other hand, although I think this solution can be more concise and clean, it may involve some non-trivial changes, so I think we'd better merge this PR as is and make those changes in another PR when appropriate.
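The lazy-field pattern described above could be sketched roughly as follows. This is a hypothetical, self-contained illustration: the class and field names (`SetCommandSketch`, `commandOutput`) and the plain mutable map standing in for the configuration are invented for this example, and the surrounding SparkPlan machinery is elided.

```scala
import scala.collection.mutable

// Hypothetical sketch: the command's side effect is wrapped in a lazy val,
// so it runs at most once, and execute() simply returns the memoized output.
class SetCommandSketch(key: Option[String], value: Option[String],
                       conf: mutable.Map[String, String]) {

  // Evaluated on first access only: performs the SET side effect and
  // captures the lines a .collect() should present.
  lazy val commandOutput: Seq[String] = (key, value) match {
    case (Some(k), Some(v)) =>  // SET key=value: mutate the conf, echo the pair
      conf(k) = v
      Seq(s"$k=$v")
    case (Some(k), None) =>     // SET key: query a single key
      Seq(s"$k=${conf.getOrElse(k, "<undefined>")}")
    case (None, None) =>        // SET: dump all pairs
      conf.map { case (k, v) => s"$k=$v" }.toSeq
    case _ => Seq.empty
  }

  // No special casing in the context's toRdd: the effect lives here.
  def execute(): Seq[String] = commandOutput
}

object SketchDemo extends App {
  val conf = mutable.Map[String, String]()
  val cmd = new SetCommandSketch(Some("spark.sql.shuffle.partitions"), Some("10"), conf)
  println(cmd.execute())  // side effect happens here, exactly once
  println(cmd.execute())  // memoized; the conf is not mutated again
}
```

Under this sketch, repeated `execute()` calls return the same memoized lines, which matches constraints of eager, exactly-once side effects plus meaningful `.collect()` output.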
Yeah, I merged this, but that sounds like a good plan to me.

Follow-up JIRA: https://issues.apache.org/jira/browse/SPARK-2094
@liancheng That sounds like a very clear & good solution!