@nikhilsheoran-db
Contributor

@nikhilsheoran-db nikhilsheoran-db commented Oct 3, 2024

What changes were proposed in this pull request?

  • Fixes a bug in NormalizeFloatingNumbers so that it respects the nullable attribute of nested expressions when normalizing.

Why are the changes needed?

  • Without the fix, normalization degrades nullability: a nested expression that is non-nullable before normalization is reported as nullable afterwards.
  • For example, take an expression such as namedStruct("struct", namedStruct("double", <DoubleType-field>)) with the following data type:
StructType(StructField("struct", StructType(StructField("double", DoubleType, true, {})), false, {}))

after normalizing we would have ended up with the dataType:

StructType(StructField("struct", StructType(StructField("double", DoubleType, true, {})), true, {})) 

Note the change in the nullable attribute of the "struct" StructField from false to true; the nested "double" StructField is unchanged.
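To make the before/after types above concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the `StructField`, `StructType`, and `DoubleType` classes and the `normalize` function with its `preserve_nullability` flag are hypothetical stand-ins. It assumes the rule rebuilds each struct value behind a null guard, and that an unconditional guard is what makes the result nullable.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class DoubleType:
    """Toy stand-in for a double-precision leaf type."""


@dataclass(frozen=True)
class StructField:
    name: str
    data_type: "DataType"
    nullable: bool


@dataclass(frozen=True)
class StructType:
    fields: Tuple[StructField, ...]


DataType = Union[DoubleType, StructType]


def normalize(field: StructField, preserve_nullability: bool) -> StructField:
    """Return the field's type as it looks after a toy normalization pass."""
    if isinstance(field.data_type, StructType):
        # Rebuild the struct field by field, recursing into nested structs.
        rebuilt = StructType(
            tuple(normalize(f, preserve_nullability) for f in field.data_type.fields)
        )
        if preserve_nullability:
            # Fixed behaviour (assumed): keep the input's nullability.
            nullable = field.nullable
        else:
            # Buggy behaviour (assumed): the null guard is always applied,
            # so the rebuilt struct is always reported as nullable.
            nullable = True
        return StructField(field.name, rebuilt, nullable)
    # Leaf double: only the value would be normalized, the type is unchanged.
    return field


before = StructField(
    "struct",
    StructType((StructField("double", DoubleType(), True),)),
    False,
)
buggy = normalize(before, preserve_nullability=False)
fixed = normalize(before, preserve_nullability=True)
```

In this model `buggy` differs from `before` only in the nullable flag of the "struct" field, while `fixed` is identical to `before`, which mirrors the two data types shown above.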

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Added unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 3, 2024
@HyukjinKwon HyukjinKwon changed the title from "[SPARK-49863] Fix NormalizeFloatingNumbers to preserve nullability of nested structs" to "[SPARK-49863][SQL] Fix NormalizeFloatingNumbers to preserve nullability of nested structs" Oct 4, 2024
@nikhilsheoran-db
Contributor Author

cc: @cloud-fan to take a look.

Contributor

@cloud-fan cloud-fan left a comment

good catch!

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 5e27eec Oct 9, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…ty of nested structs

Closes apache#48331 from nikhilsheoran-db/SPARK-49863-fix.

Authored-by: Nikhil Sheoran <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>