Skip to content

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Apr 13, 2016

What changes were proposed in this pull request?

The problem is: In RowEncoder, we use Invoke to get the field of an external row, which lose the nullability information. This PR creates a GetExternalRowField expression, so that we can preserve the nullability info.

TODO: simplify the null handling logic in RowEncoder, to remove so many if branches, in follow-up PR.

How was this patch tested?

new tests in RowEncoderSuite

Note that, This PR takes over #11980, with a little simplification, so all credits should go to @koertkuipers

@SparkQA
Copy link

SparkQA commented Apr 13, 2016

Test build #55729 has finished for PR 12364 at commit f8c0bfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 13, 2016

Test build #55730 has finished for PR 12364 at commit 2976d26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val encoder = RowEncoder(schema)
assert(encoder.serializer.length == 1)
assert(encoder.serializer.head.dataType == IntegerType)
assert(encoder.serializer.head.nullable == false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we throw an exception if there is a null in the data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, actually we will, we should add the runtime null check like we did for product encoder

@cloud-fan cloud-fan closed this Apr 14, 2016
@cloud-fan cloud-fan reopened this Apr 14, 2016
@SparkQA
Copy link

SparkQA commented Apr 14, 2016

Test build #55831 has finished for PR 12364 at commit ad8c9ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 14, 2016

Test build #55828 has finished for PR 12364 at commit 2976d26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Apr 16, 2016

Is the TODO in the pr description still valid?

@cloud-fan
Copy link
Contributor Author

yea, it's still valid, but I'd like to do it in follow-ups, as this PR is taken over from other people, I don't want to enlarge the scope too much.

@rxin
Copy link
Contributor

rxin commented Apr 18, 2016

OK please label it as a todo for future pr; otherwise it is difficult to tell if it is meant to be done as the current one.

@SparkQA
Copy link

SparkQA commented Apr 18, 2016

Test build #56083 has finished for PR 12364 at commit 4ee75e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2016

Test build #56165 has finished for PR 12364 at commit 645f0a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val inputObject = BoundReference(0, ObjectType(cls), nullable = true)
// We use an If expression to wrap extractorsFor result of StructType
val serializer = serializerFor(inputObject, schema).asInstanceOf[If].falseValue
val inputObject = BoundReference(0, ObjectType(cls), nullable = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason of this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input object should never be null, we also use this assumption in ExpressionEncoder

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually ExpressionEncoder allows null input now. But I agree that for RowEncoder this assumption is reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably add a comment here. It's not super intuitive.

@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57607 has finished for PR 12364 at commit b1e12b5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57633 has finished for PR 12364 at commit 7c4c91c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57716 has finished for PR 12364 at commit 8870650.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

cc @liancheng

// We use an If expression to wrap extractorsFor result of StructType
val serializer = serializerFor(inputObject, schema).asInstanceOf[If].falseValue
val inputObject = BoundReference(0, ObjectType(cls), nullable = false)
val serializer = serializerFor(inputObject, schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is also because we don't allow null input Rows now, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, there is no if anymore for the top row object.

@liancheng
Copy link
Contributor

Mostly LGTM except a few minor comments.

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57879 has finished for PR 12364 at commit 98e1463.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57894 has finished for PR 12364 at commit 18aa126.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

LGTM, merging to master and branch-2.0.

asfgit pushed a commit that referenced this pull request May 5, 2016
## What changes were proposed in this pull request?

The problem is: In `RowEncoder`, we use `Invoke` to get the field of an external row, which lose the nullability information. This PR creates a `GetExternalRowField` expression, so that we can preserve the nullability info.

TODO: simplify the null handling logic in `RowEncoder`, to remove so many if branches, in follow-up PR.

## How was this patch tested?

new tests in `RowEncoderSuite`

Note that, This PR takes over #11980, with a little simplification, so all credits should go to koertkuipers

Author: Wenchen Fan <[email protected]>
Author: Koert Kuipers <[email protected]>

Closes #12364 from cloud-fan/nullable.

(cherry picked from commit 55cc1c9)
Signed-off-by: Cheng Lian <[email protected]>
@asfgit asfgit closed this in 55cc1c9 May 5, 2016
asfgit pushed a commit that referenced this pull request May 11, 2016
…Encoder

## What changes were proposed in this pull request?

SPARK-15241: We now support java decimal and catalyst decimal in external row, it makes sense to also support scala decimal.

SPARK-15242: This is a long-standing bug, and is exposed after #12364, which eliminate the `If` expression if the field is not nullable:
```
val fieldValue = serializerFor(
  GetExternalRowField(inputObject, i, externalDataTypeForInput(f.dataType)),
  f.dataType)
if (f.nullable) {
  If(
    Invoke(inputObject, "isNullAt", BooleanType, Literal(i) :: Nil),
    Literal.create(null, f.dataType),
    fieldValue)
} else {
  fieldValue
}
```

Previously, we always use `DecimalType.SYSTEM_DEFAULT` as the output type of converted decimal field, which is wrong as it doesn't match the real decimal type. However, it works well because we always put converted field into `If` expression to do the null check, and `If` use its `trueValue`'s data type as its output type.
Now if we have a not nullable decimal field, then the converted field's output type will be `DecimalType.SYSTEM_DEFAULT`, and we will write wrong data into unsafe row.

The fix is simple, just use the given decimal type as the output type of converted decimal field.

These 2 issues was found at #13008

## How was this patch tested?

new tests in RowEncoderSuite

Author: Wenchen Fan <[email protected]>

Closes #13019 from cloud-fan/encoder-decimal.

(cherry picked from commit d8935db)
Signed-off-by: Davies Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants