Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

Make size(null) return null under ANSI mode, regardless of the spark.sql.legacy.sizeOfNull config.

Why are the changes needed?

In #27834, we change the result of size(null) to be -1 to match the 2.4 behavior and avoid breaking changes.

However, it's true that the "return -1" behavior is error-prone when being used with aggregate functions. The current ANSI mode controls a bunch of "better behaviors" like failing on overflow. We don't enable these "better behaviors" by default because they are too breaking. The "return null" behavior of size(null) is a good fit of the ANSI mode.

Does this PR introduce any user-facing change?

No as ANSI mode is off by default.

How was this patch tested?

new tests

@cloud-fan
Copy link
Contributor Author

def legacySizeOfNull: Boolean = getConf(SQLConf.LEGACY_SIZE_OF_NULL)
def legacySizeOfNull: Boolean = {
// size(null) should return null under ansi mode.
getConf(SQLConf.LEGACY_SIZE_OF_NULL) && !getConf(ANSI_ENABLED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, ansi conf overrides legacy conf ? Do we have another legacy conf like this? Is it consistent with the other legacy? Could you update the migration guide and conf document to be consistent, too?

cc @gatorsmile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first one, and I don't think we need a migration guide as the behavior doesn't change by default.

I'll update the config doc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@SparkQA
Copy link

SparkQA commented Mar 17, 2020

Test build #119932 has finished for PR 27936 at commit a1b68cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you so much, @cloud-fan .
Merged to master/3.0.

This already passed the Jenkins test and the last commit is only Config doc change and I verified like the following.

scala> org.apache.spark.sql.internal.SQLConf.LEGACY_SIZE_OF_NULL.doc
res16: String = If it is set to false, or spark.sql.ansi.enabled is true, then size of null returns null. Otherwise, it returns -1, which was inherited from Hive.

dongjoon-hyun pushed a commit that referenced this pull request Mar 17, 2020
Make `size(null)` return null under ANSI mode, regardless of the `spark.sql.legacy.sizeOfNull` config.

In #27834, we change the result of `size(null)` to be -1 to match the 2.4 behavior and avoid breaking changes.

However, it's true that the "return -1" behavior is error-prone when being used with aggregate functions. The current ANSI mode controls a bunch of "better behaviors" like failing on overflow. We don't enable these "better behaviors" by default because they are too breaking. The "return null" behavior of `size(null)` is a good fit of the ANSI mode.

No as ANSI mode is off by default.

new tests

Closes #27936 from cloud-fan/null.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit dc5ebc2)
Signed-off-by: Dongjoon Hyun <[email protected]>
@HeartSaVioR
Copy link
Contributor

Late LGTM. Btw, would it be better to update function javadoc as well?

@SparkQA
Copy link

SparkQA commented Mar 17, 2020

Test build #119941 has finished for PR 27936 at commit 8459d63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

+1 for @HeartSaVioR 's comment.

@HeartSaVioR
Copy link
Contributor

Maybe worth to mention here as well - https://github.com/apache/spark/blob/master/docs/sql-ref-ansi-compliance.md

@maropu
Copy link
Member

maropu commented Mar 18, 2020

Yea, as @HeartSaVioR suggested above, I also think we need to update the doc.

@cloud-fan
Copy link
Contributor Author

Will open a followup soon, thanks for taking a look!

maropu pushed a commit that referenced this pull request Mar 18, 2020
### What changes were proposed in this pull request?

A followup of #27936 to update document.

### Why are the changes needed?

correct document

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27950 from cloud-fan/null.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
maropu pushed a commit that referenced this pull request Mar 18, 2020
### What changes were proposed in this pull request?

A followup of #27936 to update document.

### Why are the changes needed?

correct document

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes #27950 from cloud-fan/null.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 8643e5d)
Signed-off-by: Takeshi Yamamuro <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

Make `size(null)` return null under ANSI mode, regardless of the `spark.sql.legacy.sizeOfNull` config.

### Why are the changes needed?

In apache#27834, we change the result of `size(null)` to be -1 to match the 2.4 behavior and avoid breaking changes.

However, it's true that the "return -1" behavior is error-prone when being used with aggregate functions. The current ANSI mode controls a bunch of "better behaviors" like failing on overflow. We don't enable these "better behaviors" by default because they are too breaking. The "return null" behavior of `size(null)` is a good fit of the ANSI mode.

### Does this PR introduce any user-facing change?

No as ANSI mode is off by default.

### How was this patch tested?

new tests

Closes apache#27936 from cloud-fan/null.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

A followup of apache#27936 to update document.

### Why are the changes needed?

correct document

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes apache#27950 from cloud-fan/null.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants