@liancheng
Contributor

This PR adds a new expression AssertNotNull to ensure non-nullable fields of products and case classes don't receive null values at runtime.

@liancheng liancheng force-pushed the dataset-nullability-check branch from 77a0649 to 7c1f57d Compare December 16, 2015 16:59
Contributor Author

This is used to show nullability in the query plan / expression tree. I found it useful while debugging nullability issues.

@liancheng liancheng changed the title [SPARK-12323][SQL] Checks Dataset nullability during resolution [SPARK-12371][SQL] Checks Dataset nullability during resolution Dec 16, 2015
@SparkQA

SparkQA commented Dec 16, 2015

Test build #47821 has finished for PR 10331 at commit 7c1f57d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

Hm, similar to #10296, this PR also caught several existing nullability inconsistency bugs, which caused the last build failure. Trying to fix them.

@yhuai
Contributor

yhuai commented Dec 17, 2015

@liancheng Regarding the scope of this JIRA, my understanding is that when we create JVM objects, if there are any null values and we are trying to set them to primitive fields of the class, we should throw a runtime exception asking users to use Option or a non-primitive JVM type (e.g. `java.lang.Integer` instead of `Int`). This is a runtime check. For example, say we want to encode a row to the case class `case class TestData(a: Int)`. If the first value of a row is null, we should throw the runtime exception when we try to create a `TestData` instance, because `a` is a primitive type field.
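
A minimal sketch of the runtime check described in this comment, written in plain Scala with no Spark dependency. Everything here other than `TestData(a: Int)` is an assumption made for illustration: the row is modeled as a `Seq[Any]`, and `NullCheckSketch`, `assertNotNull`, and `decode` are hypothetical names, not Spark's `AssertNotNull` expression or its error message.

```scala
// Hypothetical stand-ins for illustration; this is not Spark's implementation.
final case class TestData(a: Int)

object NullCheckSketch {
  // Fails fast with a descriptive error instead of letting a null reach a primitive field.
  def assertNotNull(value: Any, className: String, fieldName: String): Any = {
    if (value == null) {
      throw new RuntimeException(
        s"Null value appeared in non-nullable field $className.$fieldName. " +
          "Consider using Option[_] or a boxed type such as java.lang.Integer.")
    }
    value
  }

  // Builds a TestData from a one-column "row", modeled here as a Seq[Any].
  def decode(row: Seq[Any]): TestData =
    TestData(assertNotNull(row(0), "TestData", "a").asInstanceOf[Int])
}
```

Without such a check, unboxing a null into an `Int` in Scala quietly produces `0`, which is why an explicit assertion before the constructor call is useful.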

@liancheng liancheng force-pushed the dataset-nullability-check branch from 7c1f57d to 1c32728 Compare December 20, 2015 18:15
@liancheng
Contributor Author

@yhuai Thanks a lot for the explanation, I misunderstood the scope of the JIRA ticket. Updated this PR according to @marmbrus's comment in #10296. A new expression `AssertNotNull` is added to assert that non-nullable constructor arguments are indeed non-null.

@SparkQA

SparkQA commented Dec 20, 2015

Test build #48084 has finished for PR 10331 at commit f32fc73.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class AssertNotNull(

@liancheng
Contributor Author

retest this please

@liancheng liancheng changed the title [SPARK-12371][SQL] Checks Dataset nullability during resolution [SPARK-12371][SQL] Dataset nullability check Dec 20, 2015
@SparkQA

SparkQA commented Dec 20, 2015

Test build #48086 has finished for PR 10331 at commit f32fc73.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class AssertNotNull(

Contributor

revert these changes?

Contributor Author

Oh yeah, thanks!

@yhuai
Contributor

yhuai commented Dec 21, 2015

Should we just keep the runtime part of the changes?

@liancheng
Contributor Author

@yhuai Do you think we should move the analysis phase check into another PR, or just drop that part? This check does find other nullability bugs (revealed by the Jenkins build failure). And I think the nullability of a Dataset schema should conform to that of the underlying logical plan.

@yhuai
Contributor

yhuai commented Dec 21, 2015

Yeah, let's use this PR for the runtime check.

@liancheng liancheng changed the title [SPARK-12371][SQL] Dataset nullability check [SPARK-12371][SQL] Runtime nullability check for NewInstance Dec 21, 2015
@liancheng
Contributor Author

@yhuai Narrowed down the scope of this PR. As we discussed offline, will open another one for the analysis phase check.

@liancheng liancheng changed the title [SPARK-12371][SQL] Runtime nullability check for NewInstance [SPARK-12371][SQL] Runtime nullability check for NewInstance Dec 21, 2015
@SparkQA

SparkQA commented Dec 21, 2015

Test build #48103 has finished for PR 10331 at commit 94be50e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class AssertNotNull(

@SparkQA

SparkQA commented Dec 21, 2015

Test build #48114 has finished for PR 10331 at commit 2c59a19.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class AssertNotNull(

@liancheng
Contributor Author

cc @cloud-fan

Contributor

Maybe we should revert the changes of this file?

@SparkQA

SparkQA commented Dec 21, 2015

Test build #48117 has finished for PR 10331 at commit 9f42052.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class AssertNotNull(

Contributor

What does this do?

@marmbrus
Contributor

LGTM

Contributor

My concern is: if the parent is null, we should short-circuit the execution and return null directly, instead of going into the field and triggering the null check. However, it looks like we only do this shortcut for product types, via If(IsNull...); we may also need to handle array and map types.
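
The short-circuit described in this comment can be sketched in plain Scala as follows. This is only an illustration under stated assumptions: nested rows are modeled as `Seq[Any]`, and `Inner`, `Outer`, and `ParentShortcutSketch` are hypothetical names, not Spark's expression classes. The point is that a null parent returns null before any of its fields are inspected, so the field-level null check fires only when the parent itself is present.

```scala
// Hypothetical types for illustration only; not Spark's expression classes.
final case class Inner(x: Int)
final case class Outer(inner: Inner) // `inner` itself is allowed to be null

object ParentShortcutSketch {
  // A nested "row" is modeled as a Seq[Any]; a null Seq means the parent struct is null.
  def decodeInner(row: Seq[Any]): Inner =
    if (row == null) {
      null // parent is null: short-circuit and never look at its fields
    } else {
      val x = row(0)
      if (x == null) {
        throw new RuntimeException("Null value appeared in non-nullable field Inner.x")
      }
      Inner(x.asInstanceOf[Int])
    }

  // The outer row holds the nested row in its first column.
  def decodeOuter(row: Seq[Any]): Outer =
    Outer(decodeInner(row(0).asInstanceOf[Seq[Any]]))
}
```

With this shape, `decodeOuter(Seq(null))` yields an `Outer` whose `inner` is null, while `decodeOuter(Seq(Seq(null)))` throws because the parent exists but its primitive field is null.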

Contributor Author

Good catch! Verified that the current mechanism doesn't play well with primitive arrays.

Contributor Author

Actually, the test case I constructed reflects a separate bug, which has been fixed in PR #10401.

Discussed with @cloud-fan offline. What he meant was that we should also add `AssertNotNull` for array types and map types. I totally agree, but I think it would be better to add that in a separate PR.

@liancheng liancheng force-pushed the dataset-nullability-check branch from 9f42052 to 759c20d Compare December 22, 2015 03:03
@SparkQA

SparkQA commented Dec 22, 2015

Test build #48155 has finished for PR 10331 at commit 759c20d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class AssertNotNull(

@liancheng
Contributor Author

retest this please

@liancheng
Contributor Author

The last build failure was irrelevant.

@SparkQA

SparkQA commented Dec 22, 2015

Test build #48178 has finished for PR 10331 at commit 759c20d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class AssertNotNull(

@liancheng
Contributor Author

Merging to master.

@asfgit asfgit closed this in 42bfde2 Dec 22, 2015
marmbrus pushed a commit to marmbrus/spark that referenced this pull request Jan 7, 2016
This PR adds a new expression `AssertNotNull` to ensure non-nullable fields of products and case classes don't receive null values at runtime.

Author: Cheng Lian

Closes apache#10331 from liancheng/dataset-nullability-check.
@liancheng liancheng deleted the dataset-nullability-check branch February 1, 2016 18:45

5 participants