[SPARK-31022][SQL] group by alias should fail if there are name conflicts #27775
Test build #119241 has finished for PR 27775 at commit
- The decimal string representation can be different between Hive 1.2 and Hive 2.3 when using `TRANSFORM` operator in SQL for script transformation, which depends on hive's behavior. In Hive 1.2, the string representation omits trailing zeroes. But in Hive 2.3, it is always padded to 18 digits with trailing zeroes if necessary.
- Since Spark 3.0, group by alias fails if there are name conflicts like `SELECT col + 1 as col FROM t GROUP BY col`. In Spark version 2.4 and earlier, it works and the column will be resolved using child output. To restore the previous behaviour, set `spark.sql.legacy.allowAmbiguousGroupByAlias` to `true`.
nit: like -> such as
val LEGACY_ALLOW_AMBIGUOUS_GROUP_BY_ALIAS =
  buildConf("spark.sql.legacy.allowAmbiguousGroupByAlias")
    .doc(s"When ${GROUP_BY_ALIASES.key} is enabled and this conf is true, Spark will resolve " +
nit: conf -> configuration
  buildConf("spark.sql.legacy.allowAmbiguousGroupByAlias")
    .doc(s"When ${GROUP_BY_ALIASES.key} is enabled and this conf is true, Spark will resolve " +
      "the GROUP BY column using child's output, even though there is an ambiguous alias in " +
      "the SELECT clause. Id false, Spark fails the query.")
nit: Id -> If
Hi, @cloud-fan. Sorry, but I'm -1 because this is a regression from 2.x. Confusion is too subjective a reason to deprecate this behavior behind a new legacy config. Do you have any other reason?
PostgreSQL and MySQL work like Apache Spark 2.4.x.
cc @dbtsai
Yea, it seems SQL Server, Oracle, and Presto accept this alias, so I'm worried that this change will confuse users a bit.
It turns out the TPC-DS queries also have name conflicts. I think users can only accept it and be careful when writing GROUP BY columns.
What changes were proposed in this pull request?
Make group by alias fail if there are name conflicts like `SELECT col + 1 as col FROM t GROUP BY col`.
Why are the changes needed?
It's super confusing that `SELECT col + 1 as new_col FROM t GROUP BY new_col` and `SELECT col + 1 as col FROM t GROUP BY col` work differently.
Does this PR introduce any user-facing change?
Yes, group by alias now fails if there are name conflicts.
How was this patch tested?
new tests
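For readers following the discussion, here is a minimal sketch of the two queries from the description, side by side. The temporary view `t` and its sample values are made up for illustration only. The comments restate what the description and the migration note say about resolution, and the `SET` line uses the legacy flag as proposed in this PR, so treat that part as an assumption rather than a released setting.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TEMPORARY VIEW t AS SELECT * FROM VALUES (1), (1), (2) AS v(col);

-- No name conflict: new_col exists only as a SELECT alias,
-- so GROUP BY new_col resolves to the aliased expression col + 1.
SELECT col + 1 AS new_col FROM t GROUP BY new_col;

-- Name conflict: col is both a column of t and a SELECT alias.
-- Per the migration note, Spark 2.4 and earlier resolve GROUP BY col
-- using the child output (the column of t); with this PR the query fails.
SELECT col + 1 AS col FROM t GROUP BY col;

-- Legacy flag as proposed in this PR, intended to restore the 2.4 resolution.
SET spark.sql.legacy.allowAmbiguousGroupByAlias=true;
SELECT col + 1 AS col FROM t GROUP BY col;
```

The two `SELECT` statements differ only in the alias name, which is the asymmetry the description calls confusing.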