-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-12371][SQL] Runtime nullability check for NewInstance #10331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
77a0649 to
7c1f57d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used to show nullability in the query plan / expression tree. I found it useful while debugging nullability issues.
|
Test build #47821 has finished for PR 10331 at commit
|
|
Hm, similar as #10296, this PR also caught several existing nullability inconsistency bugs, which caused the last build failure. Trying to fix them. |
|
@liancheng Regarding the scope of this jira, my understanding is that when we create JVM objects, if there is any null values and we are trying to set them to primitive fields of the class, we should throw a Runtime exception to ask them to use Option or non-primitive JVM type (e.g. java.lang.Integer instead of Int). This is a runtime check. For example, we want to encode a row to a case class |
7c1f57d to
1c32728
Compare
|
Test build #48084 has finished for PR 10331 at commit
|
|
retest this please |
|
Test build #48086 has finished for PR 10331 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, thanks!
|
Should we just keep the runtime part of changes? |
|
@yhuai Do you think we should move analysis phase checking into another PR or just drop that part? This check does find other nullability bugs (revealed by the Jenkins build failure). And I think Dataset nullability of Dataset schema should conforms to the underlying logical plan. |
|
Yeah, let's use this PR for the runtime check. |
NewInstance
|
@yhuai Narrowed down the scope of this PR. As we discussed offline, will open another one for the analysis phase check. |
NewInstance|
Test build #48103 has finished for PR 10331 at commit
|
|
Test build #48114 has finished for PR 10331 at commit
|
|
cc @cloud-fan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should revert the changes of this file?
|
Test build #48117 has finished for PR 10331 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do?
|
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is: if the parent is null, we should shortcut the execution to return null directly, instead of going into the field and trigger the null check. However, looks like we only do this shortcut for product type by If(IsNull...), we may also need to handle array type and map type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Verified that the current mechanism doesn't play well with primitive arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually what the test case I constructed reflected is another separate bug, which has been fixed in PR #10401.
Discussed with @cloud-fan offline. What he meant was, we should also add AssertNotNull for array types and map types, with which I totally agree, but I think it would be nice to be added in a separate PR.
9f42052 to
759c20d
Compare
|
Test build #48155 has finished for PR 10331 at commit
|
|
retest this please |
|
The last build failure was irrelevant. |
|
Test build #48178 has finished for PR 10331 at commit
|
|
Merging to master. |
This PR adds a new expression `AssertNotNull` to ensure non-nullable fields of products and case classes don't receive null values at runtime. Author: Cheng Lian <[email protected]> Closes apache#10331 from liancheng/dataset-nullability-check.
This PR adds a new expression
AssertNotNullto ensure non-nullable fields of products and case classes don't receive null values at runtime.