-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-31171][SQL] size(null) should return null under ansi mode #27936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| def legacySizeOfNull: Boolean = getConf(SQLConf.LEGACY_SIZE_OF_NULL) | ||
| def legacySizeOfNull: Boolean = { | ||
| // size(null) should return null under ansi mode. | ||
| getConf(SQLConf.LEGACY_SIZE_OF_NULL) && !getConf(ANSI_ENABLED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, ansi conf overrides legacy conf ? Do we have another legacy conf like this? Is it consistent with the other legacy? Could you update the migration guide and conf document to be consistent, too?
cc @gatorsmile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first one, and I don't think we need a migration guide as the behavior doesn't change by default.
I'll update the config doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
|
Test build #119932 has finished for PR 27936 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you so much, @cloud-fan .
Merged to master/3.0.
This already passed the Jenkins test and the last commit is only Config doc change and I verified like the following.
scala> org.apache.spark.sql.internal.SQLConf.LEGACY_SIZE_OF_NULL.doc
res16: String = If it is set to false, or spark.sql.ansi.enabled is true, then size of null returns null. Otherwise, it returns -1, which was inherited from Hive.Make `size(null)` return null under ANSI mode, regardless of the `spark.sql.legacy.sizeOfNull` config. In #27834, we change the result of `size(null)` to be -1 to match the 2.4 behavior and avoid breaking changes. However, it's true that the "return -1" behavior is error-prone when being used with aggregate functions. The current ANSI mode controls a bunch of "better behaviors" like failing on overflow. We don't enable these "better behaviors" by default because they are too breaking. The "return null" behavior of `size(null)` is a good fit of the ANSI mode. No as ANSI mode is off by default. new tests Closes #27936 from cloud-fan/null. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit dc5ebc2) Signed-off-by: Dongjoon Hyun <[email protected]>
|
Late LGTM. Btw, would it be better to update function javadoc as well? |
|
Test build #119941 has finished for PR 27936 at commit
|
|
+1 for @HeartSaVioR 's comment. |
|
Maybe worth to mention here as well - https://github.com/apache/spark/blob/master/docs/sql-ref-ansi-compliance.md |
|
Yea, as @HeartSaVioR suggested above, I also think we need to update the doc. |
|
Will open a followup soon, thanks for taking a look! |
### What changes were proposed in this pull request? A followup of #27936 to update document. ### Why are the changes needed? correct document ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27950 from cloud-fan/null. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
### What changes were proposed in this pull request? A followup of #27936 to update document. ### Why are the changes needed? correct document ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes #27950 from cloud-fan/null. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]> (cherry picked from commit 8643e5d) Signed-off-by: Takeshi Yamamuro <[email protected]>
### What changes were proposed in this pull request? Make `size(null)` return null under ANSI mode, regardless of the `spark.sql.legacy.sizeOfNull` config. ### Why are the changes needed? In apache#27834, we change the result of `size(null)` to be -1 to match the 2.4 behavior and avoid breaking changes. However, it's true that the "return -1" behavior is error-prone when being used with aggregate functions. The current ANSI mode controls a bunch of "better behaviors" like failing on overflow. We don't enable these "better behaviors" by default because they are too breaking. The "return null" behavior of `size(null)` is a good fit of the ANSI mode. ### Does this PR introduce any user-facing change? No as ANSI mode is off by default. ### How was this patch tested? new tests Closes apache#27936 from cloud-fan/null. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? A followup of apache#27936 to update document. ### Why are the changes needed? correct document ### Does this PR introduce any user-facing change? no ### How was this patch tested? N/A Closes apache#27950 from cloud-fan/null. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
What changes were proposed in this pull request?
Make
size(null)return null under ANSI mode, regardless of thespark.sql.legacy.sizeOfNullconfig.Why are the changes needed?
In #27834, we change the result of
size(null)to be -1 to match the 2.4 behavior and avoid breaking changes.However, it's true that the "return -1" behavior is error-prone when being used with aggregate functions. The current ANSI mode controls a bunch of "better behaviors" like failing on overflow. We don't enable these "better behaviors" by default because they are too breaking. The "return null" behavior of
size(null)is a good fit of the ANSI mode.Does this PR introduce any user-facing change?
No as ANSI mode is off by default.
How was this patch tested?
new tests