[SPARK-33141][SQL] Capture SQL configs when creating permanent views #30289
Conversation
Kubernetes integration test starting

Kubernetes integration test status failure

Test build #130757 has finished for PR 30289 at commit
```diff
 class Analyzer(
     override val catalogManager: CatalogManager,
-    conf: SQLConf)
+    deprecatedConf: SQLConf)
```
Is this change related to what this PR proposes?
This change is related to another sub-task of SPARK-33138, which makes internal classes of SparkSession always use the active SQLConf. I opened a separate PR (#30299) and removed this change here.
```diff
 assert(message.startsWith(s"Max iterations ($maxIterations) reached for batch Resolution, " +
   s"please set '${SQLConf.ANALYZER_MAX_ITERATIONS.key}' to a larger value."))
-withSQLConf(SQLConf.ANALYZER_MAX_ITERATIONS.key -> maxIterations.toString) {
+val conf = new SQLConf().copy(SQLConf.ANALYZER_MAX_ITERATIONS -> maxIterations)
```
ditto
```diff
   }
 }

+  test("SPARK-33141 view should be parsed and analyzed with configs set when creating") {
```
nit: SPARK-33141 -> SPARK-33141:
done
Could you describe more about the …
BTW, capturing configs into views may hit a config migration issue. We might add/change/remove configs across Spark versions.
Which configs should we capture? As @maropu said, a view is basically a logical query, so is it important to capture SQL configs for it? From the description, it seems this is for keeping semantic consistency. For a logical query, would we change its semantics by using different config values? If so, that config is actually behavior-breaking, and capturing it with the view does not seem to make sense either. Can you describe the use case?
Kubernetes integration test starting

Kubernetes integration test status success
can we copy-paste more context from JIRA tickets to the PR description? @luluorta
Test build #130765 has finished for PR 30289 at commit
```diff
  */
+private def generateQuerySQLConfigs(conf: SQLConf): Map[String, String] = {
+  val modifiedConfs = conf.getAllConfs.filter { case (k, _) =>
+    conf.isModifiable(k) && k != SQLConf.MAX_NESTED_VIEW_DEPTH.key
```
Why not just save all configs (or all modifiable configs) and postpone the "ignore specific configs" decision until the SQL text is parsed/analyzed? That's easier to maintain.
IMHO, we should keep the set of captured SQL configs as small as possible. It's hard to precisely capture only the configs that affect the parser and analyzer; an alternative is to filter out the configs that definitely can NOT affect parsing/analyzing.
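The deny-list approach discussed here can be sketched in isolation. The prefixes and helper names below are illustrative, not the exact Spark internals; the real PR filters `SQLConf` entries inside `ViewHelper`:

```scala
// Minimal, self-contained sketch of deny-list config capture: keep every
// modifiable config except those whose prefix marks them as affecting only
// optimization/execution, never parsing or analysis.
object ConfigCapture {
  // Illustrative prefix list, modeled on the one discussed in this PR.
  private val configPrefixDenyList = Seq(
    "spark.sql.optimizer.",
    "spark.sql.codegen.",
    "spark.sql.execution.",
    "spark.sql.shuffle.",
    "spark.sql.adaptive.")

  def shouldCaptureConfig(key: String): Boolean =
    !configPrefixDenyList.exists(prefix => key.startsWith(prefix))

  // Keep only the configs worth storing with the view definition.
  def capture(allConfs: Map[String, String]): Map[String, String] =
    allConfs.filter { case (k, _) => shouldCaptureConfig(k) }
}
```

A config such as `spark.sql.ansi.enabled` passes the filter, while anything under `spark.sql.adaptive.` is dropped.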
Kubernetes integration test starting

Kubernetes integration test status failure

Test build #131534 has finished for PR 30289 at commit
```diff
   val VIEW_QUERY_OUTPUT_NUM_COLUMNS = VIEW_QUERY_OUTPUT_PREFIX + "numCols"
   val VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX = VIEW_QUERY_OUTPUT_PREFIX + "col."

+  val VIEW_QUERY_SQL_CONFIGS = VIEW_PREFIX + "query.sqlConfigs"
```
This is similar to the current catalog/namespace, which is about the context, not the query. Can we define it close to `VIEW_CATALOG_AND_NAMESPACE` and follow its property key naming?
done
```diff
     .createWithDefault(100)

+  val APPLY_VIEW_SQL_CONFIGS =
+    buildConf("spark.sql.legacy.view.applySQLConfigs")
```
legacy config name should describe the legacy behavior. How about spark.sql.legacy.useCurrentConfigsForView?
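As a sketch, the renamed legacy config might be declared along these lines, using Spark's internal `buildConf` builder; the doc string and default shown here are assumptions, not the merged code:

```scala
val USE_CURRENT_SQL_CONFIGS_FOR_VIEW =
  buildConf("spark.sql.legacy.useCurrentConfigsForView")
    .internal()
    .doc("When true, a view's SQL text is parsed and analyzed with the " +
      "current session's configs instead of the configs captured when the " +
      "view was created.")
    .booleanConf
    .createWithDefault(false)
```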
done
```diff
 object ViewHelper {

+  private val configPrefixBlacklist = Seq(
```
nit: configPrefixDenyList
done
```diff
+    "spark.sql.shuffle.",
+    "spark.sql.adaptive.")

+  private def isConfigBlacklisted(key: String): Boolean = {
```
nit:

```scala
def shouldCaptureConfig(key: String): Boolean = {
  !configPrefixDenyList.exists(prefix => key.startsWith(prefix))
}
```
done
```diff
   // for createViewCommand queryOutput may be different from fieldNames
   val queryOutput = analyzedPlan.schema.fieldNames

+  val conf = SQLConf.get
```
since we have `session` passed in, it seems better to use `session.sessionState.conf`
done
```diff
   /**
    * Convert the view query SQL configs in `properties`.
    */
+  private def generateQuerySQLConfigs(conf: SQLConf): Map[String, String] = {
```
`sqlConfigsToProps`, following `catalogAndNamespaceToProps`
done
```diff
   }

+  test("SPARK-33141: view should be parsed and analyzed with configs set when creating") {
+    withTable("t33141") {
```
it's hard to read, let's just use t, v1, v2, etc.
done
```diff
+    val props = new mutable.HashMap[String, String]
+    if (modifiedConfs.nonEmpty) {
+      val confJson = compact(render(JsonProtocol.mapToJson(modifiedConfs)))
+      props.put(VIEW_QUERY_SQL_CONFIGS, confJson)
```
have you stress-tested this? The hive metastore has a limitation about property value length. You can take a look at HiveExternalCatalog.tableMetaToTableProps.
Another idea is to put one config per table property entry.
Thanks for pointing this out. Splitting a large value string into small chunks seems like a Hive-specific solution, so I changed it to store one config per table property entry, each with a "view.sqlConfig." prefix.
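The per-entry encoding described here can be sketched as follows. The prefix and helper names are illustrative rather than the exact Spark code; the point is that each captured config becomes its own table property, so no single property value can exceed the Hive metastore's per-value length limit:

```scala
// Round-trippable encoding of captured configs as individual table properties.
object ViewConfigProps {
  val SqlConfigPrefix = "view.sqlConfig."

  // Each config key/value pair becomes one property entry under the prefix.
  def toProps(confs: Map[String, String]): Map[String, String] =
    confs.map { case (k, v) => (SqlConfigPrefix + k) -> v }

  // Recover the captured configs, ignoring unrelated table properties.
  def fromProps(props: Map[String, String]): Map[String, String] =
    props.collect {
      case (k, v) if k.startsWith(SqlConfigPrefix) =>
        k.stripPrefix(SqlConfigPrefix) -> v
    }
}
```

Decoding simply filters on the prefix, so the captured configs coexist with other view properties such as the catalog/namespace entries.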
Test build #131603 has finished for PR 30289 at commit
I added more background information to the PR description.
Test build #131746 has finished for PR 30289 at commit
My two cents:

Test build #131749 has finished for PR 30289 at commit
thanks, merging to master!
…ysisContext

### What changes were proposed in this pull request?
This is a followup of #30289. It removes the hack in `View.effectiveSQLConf` by putting the max nested view depth in `AnalysisContext`. Then we don't get the max nested view depth from the active SQLConf, which keeps changing during nested view resolution.

### Why are the changes needed?
Remove hacks.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
If I just remove the hack, `SimpleSQLViewSuite.restrict the nested level of a view` fails. With this fix, it passes again.

Closes #30575 from cloud-fan/view.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
This PR makes `CreateViewCommand`/`AlterViewAsCommand` capture runtime SQL configs and store them as view properties. These configs are applied during the parsing and analysis phases of view resolution. Users can set `spark.sql.legacy.useCurrentConfigsForView` to `true` to restore the previous behavior.

### Why are the changes needed?
This PR is a sub-task of SPARK-33138, which proposes to unify temp view and permanent view behaviors. It makes permanent views mimic the temp view behavior, which "fixes" view semantics by directly storing the resolved `LogicalPlan`. For example, if a user uses Spark 2.4 to create a view that contains null values from division-by-zero expressions, she may not want other users' queries that reference her view to throw exceptions when running on Spark 3.x with ANSI mode on.
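The intended precedence can be illustrated with a small, hypothetical sketch (the function and map names are mine, not Spark's): during parse/analyze of a view, a config captured at creation time should win over the session's current value, while configs the view did not capture fall back to the session:

```scala
// Captured view configs override the current session's configs;
// everything not captured keeps its current session value.
def effectiveViewConfs(
    captured: Map[String, String],
    session: Map[String, String]): Map[String, String] =
  session ++ captured // right-hand side wins on key collisions
```

So a view created with `spark.sql.ansi.enabled=false` keeps evaluating division-by-zero to null even when the querying session runs with ANSI mode on.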
### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Added a unit test, plus existing unit tests (improved).