Skip to content

Conversation

@luluorta
Copy link
Contributor

@luluorta luluorta commented Nov 8, 2020

What changes were proposed in this pull request?

This PR makes CreateViewCommand/AlterViewAsCommand capturing runtime SQL configs and store them as view properties. These configs will be applied during the parsing and analysis phases of the view resolution. Users can set spark.sql.legacy.useCurrentConfigsForView to true to restore the behavior before.

Why are the changes needed?

This PR is a sub-task of SPARK-33138 that proposes to unify temp view and permanent view behaviors. This PR makes permanent views mimicking the temp view behavior that "fixes" view semantic by directly storing resolved LogicalPlan. For example, if a user uses spark 2.4 to create a view that contains null values from division-by-zero expressions, she may not want that other users' queries which reference her view throw exceptions when running on spark 3.x with ansi mode on.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

added UT + existing UTs (improved)

@github-actions github-actions bot added the SQL label Nov 8, 2020
@SparkQA
Copy link

SparkQA commented Nov 8, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35366/

@SparkQA
Copy link

SparkQA commented Nov 8, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35366/

@SparkQA
Copy link

SparkQA commented Nov 8, 2020

Test build #130757 has finished for PR 30289 at commit b58397c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging
  • abstract class AbstractSqlParser extends ParserInterface with Logging
  • class CatalystSqlParser(deprecatedConf: SQLConf) extends AbstractSqlParser
  • class SparkSqlParser(deprecatedConf: SQLConf) extends AbstractSqlParser
  • class SparkSqlAstBuilder extends AstBuilder
  • class VariableSubstitution(deprecatedConf: SQLConf)

class Analyzer(
override val catalogManager: CatalogManager,
conf: SQLConf)
deprecatedConf: SQLConf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to what this PR proposes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is related to another sub-task of SPARK-33138 which makes internal classes of SparkSession always using active SQLConf. I opened a seperated PR (#30299) and removed this change here.

assert(message.startsWith(s"Max iterations ($maxIterations) reached for batch Resolution, " +
s"please set '${SQLConf.ANALYZER_MAX_ITERATIONS.key}' to a larger value."))
withSQLConf(SQLConf.ANALYZER_MAX_ITERATIONS.key -> maxIterations.toString) {
val conf = new SQLConf().copy(SQLConf.ANALYZER_MAX_ITERATIONS -> maxIterations)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}
}

test("SPARK-33141 view should be parsed and analyzed with configs set when creating") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SPARK-33141 -> SPARK-33141:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@maropu
Copy link
Member

maropu commented Nov 9, 2020

Could you describe more about the Why are the changes needed? section? In my first impression, since the view is a logical one, the current behaivour (the behaivour of logical views depends on current runtime configs) looks more natural than the proposed one. cc: @cloud-fan @viirya

@viirya
Copy link
Member

viirya commented Nov 9, 2020

BTW, capturing configs into views may hit config migration issue. We might add/change/remove configs across Spark versions.

@viirya
Copy link
Member

viirya commented Nov 9, 2020

Could you describe more about the Why are the changes needed? section? In my first impression, since the view is a logical one, the current behaivour (the behaivour of logical views depends on current runtime configs) looks more natural than the proposed one. cc: @cloud-fan @viirya

What configs we should capture? As @maropu said, a view is basically logical query, is it important to capture SQL configs for it? From the description, seems it is for keeping semantically consistency. For a logical query, will we change its semantic by using different config values? If there is, it is behavior breaking config actually. For such configs, capturing a view's configs seems not making sense too.

Can you describe the use case?

@SparkQA
Copy link

SparkQA commented Nov 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35374/

@SparkQA
Copy link

SparkQA commented Nov 9, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35374/

@cloud-fan
Copy link
Contributor

can we copy-paste more context from JIRA tickets to the PR description? @luluorta

@SparkQA
Copy link

SparkQA commented Nov 9, 2020

Test build #130765 has finished for PR 30289 at commit e3f36cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
private def generateQuerySQLConfigs(conf: SQLConf): Map[String, String] = {
val modifiedConfs = conf.getAllConfs.filter { case (k, _) =>
conf.isModifiable(k) && k != SQLConf.MAX_NESTED_VIEW_DEPTH.key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just save all configs (or all modifiable configs) and postpone the "ignore specific configs" until parse/analyze SQL text? It's easy to maintain.

Copy link
Contributor Author

@luluorta luluorta Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, we should keep the captured SQL configs as less as possible. It's hard to precisely captures the configs which only affect the parser and analyzer, an alternative way is just filtering out the configs that definitely can NOT affect parsing/analyzing.

@SparkQA
Copy link

SparkQA commented Nov 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36137/

@SparkQA
Copy link

SparkQA commented Nov 23, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36137/

@SparkQA
Copy link

SparkQA commented Nov 23, 2020

Test build #131534 has finished for PR 30289 at commit 6a41103.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

val VIEW_QUERY_OUTPUT_NUM_COLUMNS = VIEW_QUERY_OUTPUT_PREFIX + "numCols"
val VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX = VIEW_QUERY_OUTPUT_PREFIX + "col."

val VIEW_QUERY_SQL_CONFIGS = VIEW_PREFIX + "query.sqlConfigs"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to current catalog/namespace, which is about the context, not query. Can we define it close to VIEW_CATALOG_AND_NAMESPACE and follow it's property key naming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

.createWithDefault(100)

val APPLY_VIEW_SQL_CONFIGS =
buildConf("spark.sql.legacy.view.applySQLConfigs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacy config name should describe the legacy behavior. How about spark.sql.legacy.useCurrentConfigsForView?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


object ViewHelper {

private val configPrefixBlacklist = Seq(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: configPrefixDenyList

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"spark.sql.shuffle.",
"spark.sql.adaptive.")

private def isConfigBlacklisted(key: String): Boolean = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

def shouldCaptureConfig(key: String): Boolean = {
  !configPrefixDenyList.exists(prefix => key.startsWith(prefix))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// for createViewCommand queryOutput may be different from fieldNames
val queryOutput = analyzedPlan.schema.fieldNames

val conf = SQLConf.get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have session passed in, seems better to use session.sessionState.conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* Convert the view query SQL configs in `properties`.
*/
private def generateQuerySQLConfigs(conf: SQLConf): Map[String, String] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqlConfigsToProps, following catalogAndNamespaceToProps

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

test("SPARK-33141: view should be parsed and analyzed with configs set when creating") {
withTable("t33141") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's hard to read, let's just use t, v1, v2, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

val props = new mutable.HashMap[String, String]
if (modifiedConfs.nonEmpty) {
val confJson = compact(render(JsonProtocol.mapToJson(modifiedConfs)))
props.put(VIEW_QUERY_SQL_CONFIGS, confJson)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you stress-tested this? The hive metastore has a limitation about property value length. You can take a look at HiveExternalCatalog.tableMetaToTableProps.

Another idea is to put one config per table property entry.

Copy link
Contributor Author

@luluorta luluorta Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. Splitting a large value string into small chunks seems a hive specific solution, so I changed to store one config per table property entry, each with a "view.sqlConfig." prefix.

@SparkQA
Copy link

SparkQA commented Nov 24, 2020

Test build #131603 has finished for PR 30289 at commit 6a41103.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@luluorta
Copy link
Contributor Author

can we copy-paste more context from JIRA tickets to the PR description?

I added more background information to the PR description.

@SparkQA
Copy link

SparkQA commented Nov 25, 2020

Test build #131746 has finished for PR 30289 at commit 22b7dde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

My two cents:

  1. It's unfortunate that Spark has "semantic" configs which can change query result. This is bad for a SQL system, but we can't get rid of it now.
  2. For a view, it's more reasonable to "fix" its semantic after created since the semantic can change by configs. If a user turns on ANSI mode, he should still be able to read existing views that may not be valid under ANSI mode.

@SparkQA
Copy link

SparkQA commented Nov 25, 2020

Test build #131749 has finished for PR 30289 at commit f44e71b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 35ded12 Nov 27, 2020
cloud-fan added a commit that referenced this pull request Dec 4, 2020
…ysisContext

### What changes were proposed in this pull request?

This is a followup of #30289. It removes the hack in `View.effectiveSQLConf`, by putting the max nested view depth in `AnalysisContext`. Then we don't get the max nested view depth from the active SQLConf, which keeps changing during nested view resolution.

### Why are the changes needed?

remove hacks.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

If I just remove the hack, `SimpleSQLViewSuite.restrict the nested level of a view` fails. With this fix, it passes again.

Closes #30575 from cloud-fan/view.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan added a commit that referenced this pull request Dec 4, 2020
…ysisContext

### What changes were proposed in this pull request?

This is a followup of #30289. It removes the hack in `View.effectiveSQLConf`, by putting the max nested view depth in `AnalysisContext`. Then we don't get the max nested view depth from the active SQLConf, which keeps changing during nested view resolution.

### Why are the changes needed?

remove hacks.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

If I just remove the hack, `SimpleSQLViewSuite.restrict the nested level of a view` fails. With this fix, it passes again.

Closes #30575 from cloud-fan/view.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit acc211d)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants