[SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf #30299
Conversation
    /**
     * Trait for shared SQLConf.
     */
    trait HasConf {
This name is weird. How about SQLConfHelper?
I agree, SQLConfHelper is better.
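For reference, a minimal sketch of what such a SQLConfHelper trait could look like, assuming it simply reads the thread-local active conf via SQLConf.get (a sketch, not necessarily the exact code merged in this PR):

    import org.apache.spark.sql.internal.SQLConf

    /**
     * Trait for getting the active SQLConf.
     */
    trait SQLConfHelper {
      // Read the active conf on every access instead of capturing it once,
      // so classes mixing this in always see the current session's settings.
      def conf: SQLConf = SQLConf.get
    }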
    import org.apache.spark.sql.internal.SQLConf

    /**
     * Trait for shared SQLConf.
Trait for getting the active SQLConf.
done
    import org.apache.spark.sql.connector.catalog._
    import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
    import org.apache.spark.sql.connector.catalog.TableChange.{AddColumn, After, ColumnChange, ColumnPosition, DeleteColumn, RenameColumn, UpdateColumnComment, UpdateColumnNullability, UpdateColumnPosition, UpdateColumnType}
    import org.apache.spark.sql.connector.catalog.TableChange.{First => _, _}
unnecessary change?
reverted
      externalCatalog: ExternalCatalog,
      functionRegistry: FunctionRegistry,
      conf: SQLConf) = {
      staticConf: SQLConf) = {
The previous name conf is OK. SQLConf contains both static and runtime configs.
reverted
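On the static vs. runtime distinction mentioned above, a hedged illustration (assuming an existing SparkSession named spark; config keys shown are common examples, not taken from this PR): runtime configs can be changed on a live session, while static configs are fixed at session creation.

    // Runtime SQL configs can be changed per session at any time.
    spark.conf.set("spark.sql.shuffle.partitions", "10")

    // Static SQL configs (e.g. spark.sql.warehouse.dir) are fixed when the
    // session is created; setting them afterwards raises an error.
    // spark.conf.set("spark.sql.warehouse.dir", "/tmp/warehouse")  // would fail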
    val expected = caseInsensitiveAnalyzer.execute(
      testRelation.where('a > 2 && ('b > 3 || 'b < 5)))
    comparePlans(actual, expected)
    withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
This is the default value. Seems we can remove withSQLConf?
reverted
    val expected = caseInsensitiveAnalyzer.execute(
      testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
    comparePlans(actual, expected)
    withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
ditto
reverted
    select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
    select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
    select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
    select from_json('{"timestamp":"26/October/2015"}', 'timestamp Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
why do we change this test?
After this PR, dynamically setting "spark.sql.ansi.enabled" actually takes effect in the parsing phase. This query will fail to parse because time is a reserved keyword in the SQL standard.
timestamp is also reserved in ANSI standard. How about ts?
done
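To make the failure mode concrete, a hypothetical repro based on the discussion above (assuming an existing SparkSession named spark with ANSI mode enabled):

    // With ANSI mode on, the schema-string parser used by from_json rejects
    // reserved words such as `time` and `timestamp` as unquoted field names.
    spark.conf.set("spark.sql.ansi.enabled", "true")

    // Fails to parse the schema string, since `time` is reserved:
    // spark.sql("""select from_json('{"time":"26/October/2015"}', 'time Timestamp',
    //   map('timestampFormat', 'dd/MMMMM/yyyy'))""")

    // Gets past schema-string parsing, since `ts` is not reserved (the query may
    // still fail later for unrelated reasons, e.g. the datetime pattern):
    spark.sql("""select from_json('{"ts":"26/October/2015"}', 'ts Timestamp',
      map('timestampFormat', 'dd/MMMMM/yyyy'))""")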
    require(partitionPruningPred.isEmpty || relation.isPartitioned,
      "Partition pruning predicates only supported for partitioned tables.")

    override def conf: SQLConf = sparkSession.sessionState.conf
This actually makes sense, to make sure the conf matches the spark session. How about we move this override to SparkPlan?
reverted
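A rough sketch of what moving that override might look like, written as a standalone trait with hypothetical names rather than the actual SparkPlan code:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.internal.SQLConf

    // Hypothetical: let every physical operator read configs from the session
    // captured at planning time rather than from the thread-local active conf.
    trait PlanConfFromSession {
      // Assumed: the session that produced this plan.
      def sparkSession: SparkSession

      def conf: SQLConf = sparkSession.sessionState.conf
    }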
    select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
    select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
    select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/MMMMM/yyyy'));
    select from_json('{"ts":"26/October/2015"}', 'ts Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
which change caused this?
It was a bug before: the parser used by from_json to parse the schema string stuck to the configs of the first session created in the current thread. Now the parser always uses the active conf, and the ANSI test fails here because time is a reserved keyword.
We probably need a separate PR for this bug.
I opened a new PR for this issue #30357
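A small hypothetical sketch of the difference being described (illustrative class names, not the actual parser code):

    import org.apache.spark.sql.internal.SQLConf

    // Capturing the conf at construction time freezes the settings of the
    // session that created the object (the old, buggy behavior described above).
    class CapturedConfHolder(capturedConf: SQLConf) {
      def ansiEnabled: Boolean = capturedConf.ansiEnabled
    }

    // Reading SQLConf.get on every access picks up the active session's conf
    // (the behavior after this change / #30357).
    class ActiveConfHolder {
      def ansiEnabled: Boolean = SQLConf.get.ansiEnabled
    }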
        prunePartitions(hivePartitions)
      }
    } else {
      if (sparkSession.sessionState.conf.metastorePartitionPruning &&
let's keep this unchanged for now. We may override def conf in SparkPlan later, to always get conf from the captured spark session.
Reverted
GA passed, merging to master, thanks!
What changes were proposed in this pull request?
This PR makes internal classes of SparkSession always use the active SQLConf. We should remove all conf: SQLConf constructor parameters from these classes (Analyzer, SparkPlanner, SessionCatalog, CatalogManager, SparkSqlParser, etc.) and use SQLConf.get instead, as sketched below.
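A minimal sketch of the pattern (hypothetical class names; the real classes are Analyzer, SparkPlanner, SessionCatalog, and so on):

    import org.apache.spark.sql.internal.SQLConf

    // Before: the conf is a constructor parameter, captured once and possibly
    // belonging to a different session than the one currently active.
    class SomeInternalComponentBefore(conf: SQLConf) {
      def caseSensitive: Boolean = conf.caseSensitiveAnalysis
    }

    // After: no conf parameter; always read the active conf via SQLConf.get
    // (or by mixing in the new SQLConfHelper trait).
    class SomeInternalComponentAfter {
      private def conf: SQLConf = SQLConf.get
      def caseSensitive: Boolean = conf.caseSensitiveAnalysis
    }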
Why are the changes needed?
Code refinement.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.