[SPARK-35028][SQL] ANSI mode: disallow group by aliases #32129
Moving ANSI_ENABLED to the front so that other configurations can refer to it without compilation errors.
There has been some discussion under the PR that supports group by alias: #17191. Grouping by aliases is convenient, but it can be ambiguous and is incompatible with the SQL standard.
The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
- `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
- `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
- `GROUP BY`: aliases in a select list can not be used in GROUP BY clauses. Each column referenced in a GROUP BY clause shall unambiguously reference a column of the table resulting from the FROM clause.
nit: in a GROUP BY clause -> by a GROUP BY clause?
Both should work. The second sentence is from the ANSI SQL standard.
Nice!
Merging to master
### What changes were proposed in this pull request?

Revert [[SPARK-35028][SQL] ANSI mode: disallow group by aliases](#32129)

### Why are the changes needed?

It turns out that many users are using the group by alias feature. Spark has its precedence rule when alias names conflict with column names in Group by clause: always use the table column. This should be reasonable and acceptable.
Also, external DBMS such as PostgreSQL and MySQL allow grouping by alias, too.

As we are going to announce ANSI mode GA in Spark 3.2, I suggest allowing the group by alias in ANSI mode.

### Does this PR introduce _any_ user-facing change?

No, the feature is not released yet.

### How was this patch tested?

Unit tests

Closes #33758 from gengliangwang/revertGroupByAlias.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 8bfb4f1)
Signed-off-by: Gengliang Wang <[email protected]>
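The precedence rule described in the revert message (a table column wins over a select-list alias of the same name) can be sketched as a small name resolver. This is only an illustration under stated assumptions: `resolve_group_by_name`, its arguments, and its return shape are hypothetical and are not Spark's analyzer API.

```python
def resolve_group_by_name(name, table_columns, select_aliases):
    """Resolve a bare name used in GROUP BY (hypothetical helper, not Spark code).

    table_columns:  set of column names produced by the FROM clause
    select_aliases: dict mapping alias name -> aliased expression text
    """
    # Table columns are checked first, so they win on a name conflict.
    if name in table_columns:
        return ("column", name)
    # Only when no table column matches do we fall back to a select-list alias.
    if name in select_aliases:
        return ("alias", select_aliases[name])
    raise LookupError(f"cannot resolve '{name}' in GROUP BY")


# SELECT col + 1 AS col FROM t GROUP BY col  -> the name conflicts, column wins
conflict = resolve_group_by_name("col", {"col"}, {"col": "col + 1"})

# SELECT col + 1 AS c FROM t GROUP BY c      -> no conflict, alias is used
no_conflict = resolve_group_by_name("c", {"col"}, {"c": "col + 1"})
```

Under this rule a query only groups by an alias when the alias name does not shadow a real column, which is why the revert message calls the behavior reasonable and acceptable.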
### What changes were proposed in this pull request?

Disallow group by aliases under ANSI mode.

### Why are the changes needed?

As per the ANSI SQL standard section 7.12 `<group by clause>`:

> Each `grouping column reference` shall unambiguously reference a column of the table resulting from the `from clause`. A column referenced in a `group by clause` is a grouping column.

By forbidding it, we can avoid ambiguous SQL queries like:

```
SELECT col + 1 as col FROM t GROUP BY col
```

### Does this PR introduce _any_ user-facing change?

Yes, group by aliases are not allowed under ANSI mode.

### How was this patch tested?

Unit tests

Closes apache#32129 from gengliangwang/disallowGroupByAlias.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
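To make the ambiguity concrete, here is a rough pure-Python sketch of the two possible readings of such a query. It does not run Spark. It also swaps the PR's `col + 1` for `abs(col)`, a hypothetical variant chosen because a non-injective expression makes the two readings return visibly different rows; the table contents and the added `count(*)` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical table t with a single integer column `col`.
rows = [{"col": -1}, {"col": 1}, {"col": 1}]

# Query under discussion (variant of the PR's example):
#   SELECT abs(col) AS col, count(*) FROM t GROUP BY col


def group_by_table_column(rows):
    """Reading 1: `GROUP BY col` refers to the table column t.col."""
    counts = Counter(r["col"] for r in rows)
    # abs() is applied to each group key after grouping.
    return sorted((abs(key), n) for key, n in counts.items())


def group_by_alias(rows):
    """Reading 2: `GROUP BY col` refers to the select-list alias abs(col)."""
    counts = Counter(abs(r["col"]) for r in rows)
    return sorted(counts.items())


by_column = group_by_table_column(rows)  # two groups: -1 and 1
by_alias = group_by_alias(rows)          # one group: abs value 1
```

With this data the first reading yields two output rows and the second yields one, so the same SQL text can mean two different things depending on how the name is resolved.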