Commit 79e55b4

[SPARK-35028][SQL] ANSI mode: disallow group by aliases
### What changes were proposed in this pull request?

Disallow group by aliases under ANSI mode.

### Why are the changes needed?

As per the ANSI SQL standard, section 7.12 `<group by clause>`:

> Each `grouping column reference` shall unambiguously reference a column of the table resulting from the `from clause`. A column referenced in a `group by clause` is a grouping column.

By forbidding it, we can avoid ambiguous SQL queries like:

```
SELECT col + 1 as col FROM t GROUP BY col
```

### Does this PR introduce _any_ user-facing change?

Yes, group by aliases are not allowed under ANSI mode.

### How was this patch tested?

Unit tests

Closes #32129 from gengliangwang/disallowGroupByAlias.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
1 parent 278203d commit 79e55b4
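To see the user-facing change described above in action, here is a minimal spark-shell sketch. It is illustrative only: the temp view `t` and its data are assumptions, not part of this commit, and the error path simply shows that analysis is expected to reject the alias reference under ANSI mode.

```scala
// Hypothetical data for illustration; not part of this commit.
spark.sql("CREATE OR REPLACE TEMP VIEW t AS SELECT 1 AS a, 2 AS b")

// Legacy behavior (ANSI mode off, spark.sql.groupByAliases=true by default):
// the select-list alias `c` can be used in GROUP BY.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT a + b AS c, count(*) FROM t GROUP BY c").show()

// ANSI mode: aliases are no longer consulted, so `c` does not resolve to a
// column of the FROM-clause output and analysis should fail.
spark.conf.set("spark.sql.ansi.enabled", "true")
try {
  spark.sql("SELECT a + b AS c, count(*) FROM t GROUP BY c").show()
} catch {
  case e: org.apache.spark.sql.AnalysisException => println(s"rejected: ${e.getMessage}")
}
```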

File tree

5 files changed: +1089, -14 lines


docs/sql-ref-ansi-compliance.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -183,6 +183,7 @@ The behavior of some SQL functions can be different under ANSI mode (`spark.sql.
 The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
 - `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
 - `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
+- `GROUP BY`: aliases in a select list can not be used in GROUP BY clauses. Each column referenced in a GROUP BY clause shall unambiguously reference a column of the table resulting from the FROM clause.
 
 ### SQL Keywords
 
```
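A brief illustration of the documented rule (again assuming a hypothetical view `t(a, b)`): under ANSI mode a GROUP BY item must be resolvable from the FROM clause, so grouping by the underlying expression still works while grouping by the alias does not.

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")
// OK: `a + b` is an expression over FROM-clause columns.
spark.sql("SELECT a + b AS c, count(*) FROM t GROUP BY a + b").show()
// Expected to fail analysis: `c` is only a select-list alias.
// spark.sql("SELECT a + b AS c, count(*) FROM t GROUP BY c").show()
```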

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Lines changed: 4 additions & 1 deletion
```diff
@@ -1847,9 +1847,12 @@ class Analyzer(override val catalogManager: CatalogManager)
       }}
     }
 
+    // Group by alias is not allowed in ANSI mode.
+    private def allowGroupByAlias: Boolean = conf.groupByAliases && !conf.ansiEnabled
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
       case agg @ Aggregate(groups, aggs, child)
-          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+          if allowGroupByAlias && child.resolved && aggs.forall(_.resolved) &&
             groups.exists(!_.resolved) =>
         agg.copy(groupingExpressions = mayResolveAttrByAggregateExprs(groups, aggs, child))
     }
```
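For readers outside the Spark codebase, the gating idea can be sketched in plain Scala. This is a toy model under stated assumptions, not Spark's analyzer: `Conf`, `SelectItem`, and `resolveGroupBy` are hypothetical names, and real resolution operates on expression trees rather than strings.

```scala
final case class Conf(groupByAliases: Boolean, ansiEnabled: Boolean)
final case class SelectItem(expression: String, alias: Option[String])

object GroupByAliasSketch {
  // Mirrors the new predicate: aliases may be consulted only outside ANSI mode.
  def allowGroupByAlias(conf: Conf): Boolean = conf.groupByAliases && !conf.ansiEnabled

  // Resolve GROUP BY names against FROM-clause columns first and, when
  // permitted, against select-list aliases; anything else is an error.
  def resolveGroupBy(
      groupBy: Seq[String],
      fromColumns: Set[String],
      selectList: Seq[SelectItem],
      conf: Conf): Either[String, Seq[String]] = {
    val aliases = selectList.flatMap(i => i.alias.map(_ -> i.expression)).toMap
    val results = groupBy.map {
      case name if fromColumns.contains(name) => Right(name)
      case name if allowGroupByAlias(conf) && aliases.contains(name) => Right(aliases(name))
      case name => Left(s"cannot resolve '$name' from columns ${fromColumns.mkString(", ")}")
    }
    val errors = results.collect { case Left(e) => e }
    if (errors.nonEmpty) Left(errors.head)
    else Right(results.collect { case Right(r) => r })
  }
}

object GroupByAliasDemo extends App {
  val select = Seq(SelectItem("a + b", Some("c")))
  // Legacy: Right(List(a + b)), the alias resolves to its expression.
  println(GroupByAliasSketch.resolveGroupBy(
    Seq("c"), Set("a", "b"), select, Conf(groupByAliases = true, ansiEnabled = false)))
  // ANSI mode: Left(...), the alias is ignored and `c` is an error.
  println(GroupByAliasSketch.resolveGroupBy(
    Seq("c"), Set("a", "b"), select, Conf(groupByAliases = true, ansiEnabled = true)))
}
```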

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 14 additions & 13 deletions
```diff
@@ -206,6 +206,17 @@ object SQLConf {
     .intConf
     .createWithDefault(100)
 
+  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
+    .doc("When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. " +
+      "For example, Spark will throw an exception at runtime instead of returning null results " +
+      "when the inputs to a SQL operator/function are invalid." +
+      "For full details of this dialect, you can find them in the section \"ANSI Compliance\" of " +
+      "Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL " +
+      "standard directly, but their behaviors align with ANSI SQL's style")
+    .version("3.0.0")
+    .booleanConf
+    .createWithDefault(false)
+
   val OPTIMIZER_EXCLUDED_RULES = buildConf("spark.sql.optimizer.excludedRules")
     .doc("Configures a list of rules to be disabled in the optimizer, in which the rules are " +
       "specified by their rule names and separated by comma. It is not guaranteed that all the " +
@@ -1092,8 +1103,9 @@ object SQLConf {
     .createWithDefault(true)
 
   val GROUP_BY_ALIASES = buildConf("spark.sql.groupByAliases")
-    .doc("When true, aliases in a select list can be used in group by clauses. When false, " +
-      "an analysis exception is thrown in the case.")
+    .doc("This configuration is only effective when ANSI mode is disabled. When it is true and " +
+      s"${ANSI_ENABLED.key} is false, aliases in a select list can be used in group by clauses. " +
+      "Otherwise, an analysis exception is thrown in the case.")
     .version("2.2.0")
     .booleanConf
     .createWithDefault(true)
@@ -2348,17 +2360,6 @@ object SQLConf {
     .checkValues(StoreAssignmentPolicy.values.map(_.toString))
     .createWithDefault(StoreAssignmentPolicy.ANSI.toString)
 
-  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
-    .doc("When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. " +
-      "For example, Spark will throw an exception at runtime instead of returning null results " +
-      "when the inputs to a SQL operator/function are invalid." +
-      "For full details of this dialect, you can find them in the section \"ANSI Compliance\" of " +
-      "Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL " +
-      "standard directly, but their behaviors align with ANSI SQL's style")
-    .version("3.0.0")
-    .booleanConf
-    .createWithDefault(false)
-
   val SORT_BEFORE_REPARTITION =
     buildConf("spark.sql.execution.sortBeforeRepartition")
       .internal()
```
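As a quick usage note (a spark-shell sketch; `spark` is the usual shell-provided SparkSession), the two flags now interact as the updated doc string describes: `spark.sql.groupByAliases` only matters while ANSI mode is off.

```scala
// Both settings are runtime configs and can be flipped on a live session.
spark.conf.set("spark.sql.groupByAliases", "true")  // the default
spark.conf.set("spark.sql.ansi.enabled", "true")

// With ANSI mode on, the alias-resolution path is skipped regardless of
// spark.sql.groupByAliases, so GROUP BY must reference FROM-clause columns.
println(spark.conf.get("spark.sql.ansi.enabled"))    // true
println(spark.conf.get("spark.sql.groupByAliases"))  // true, but not consulted
```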
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+--IMPORT group-analytics.sql
```
