Commit c7ef560
[SPARK-40963][SQL] Set nullable correctly in project created by ExtractGenerator
### What changes were proposed in this pull request?
When creating the project list for the new projection in `ExtractGenerator`, take into account whether the generator is outer when setting nullable on generator-related output attributes.
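The idea can be illustrated with a minimal, self-contained sketch. It does not use the real Catalyst classes: `Attr`, `Gen`, `nullableOutput`, and `ProjectListSketch` are simplified, hypothetical stand-ins, and the actual change in this commit may be shaped differently.

```scala
// Simplified stand-ins for Catalyst classes; all names here are hypothetical.
final case class Attr(name: String, nullable: Boolean) {
  def withNullability(n: Boolean): Attr = copy(nullable = n)
}

final case class Gen(outer: Boolean, generatorOutput: Seq[Attr]) {
  // An outer generator can emit nulls for its generated columns,
  // so every generated attribute has to be treated as nullable.
  def nullableOutput: Seq[Attr] =
    generatorOutput.map(a => a.withNullability(outer || a.nullable))
}

object ProjectListSketch {
  // Build the project list from the outer-aware attributes
  // rather than from the raw generatorOutput.
  def projectList(passThrough: Seq[Attr], g: Gen): Seq[Attr] =
    passThrough ++ g.nullableOutput
}
```

In this sketch, a non-nullable `c3` produced by an outer generator appears in the project list as nullable, while a non-outer generator leaves it unchanged.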
### Why are the changes needed?
This PR fixes an issue that can produce either incorrect results or a `NullPointerException`. It's a bit of an obscure issue in that I am hard-pressed to reproduce it without using a subquery that has an inline table.
Example:
```
select c1, explode(c4) as c5 from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(1, 2)),
    (2, array(2, 3)),
    (3, null)
    as data(c1, c2)
  )
);

+---+---+
|c1 |c5 |
+---+---+
|1  |1  |
|1  |2  |
|2  |2  |
|2  |3  |
|3  |0  |
+---+---+
```
In the last row, `c5` is 0, but should be `NULL`.
Another example:
```
select c1, exists(c4, x -> x is null) as c5 from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(1, 2)),
    (2, array(2, 3)),
    (3, array())
    as data(c1, c2)
  )
);

+---+-----+
|c1 |c5   |
+---+-----+
|1  |false|
|1  |false|
|2  |false|
|2  |false|
|3  |false|
+---+-----+
```
In the last row, `false` should be `true`.
In both cases, at the time `CreateArray(c3)` is instantiated, `c3`'s nullability is incorrect because the new projection created by `ExtractGenerator` uses `generatorOutput` from `explode_outer(c2)` as a projection list. `generatorOutput` doesn't take into account that `explode_outer(c2)` is an _outer_ explode, so the nullability setting is lost.
`UpdateAttributeNullability` will eventually fix the nullable setting for attributes referring to `c3`, but it doesn't fix the `containsNull` setting for `c4` in `explode(c4)` (from the first example) or `exists(c4, x -> x is null)` (from the second example).
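The following self-contained sketch models why a later fix to the attribute does not help. It is only an illustration: `Column`, `ArrType`, `arrayOf`, and `readInt` are hypothetical names, not Catalyst APIs, and the "read a default of 0 from a null slot" behavior is an assumption chosen to mirror the symptom in the first example.

```scala
// Hypothetical, simplified model; none of these names are Catalyst APIs.
final case class Column(name: String, nullable: Boolean)

// Array type that records whether its elements may be null.
final case class ArrType(containsNull: Boolean)

object ContainsNullSketch {
  // An array built from a column takes that column's nullability
  // as its containsNull flag at the moment it is created.
  def arrayOf(c: Column): ArrType = ArrType(containsNull = c.nullable)

  // A consumer that trusts the type: when containsNull is false it skips
  // the null check, so a null slot surfaces as a default value (assumed 0).
  def readInt(t: ArrType, slot: Option[Int]): Option[Int] =
    if (t.containsNull) slot else Some(slot.getOrElse(0))
}
```

If `arrayOf` is called while `c3` is still marked non-nullable, the resulting type keeps `containsNull = false` even after a corrected copy of `c3` exists, and `readInt` on a null slot yields `Some(0)` instead of `None`. Building the type from the corrected column gives `None`, as expected.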
This example fails with a `NullPointerException`:
```
select c1, inline_outer(c4) from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(named_struct('a', 1, 'b', 2))),
    (2, array(named_struct('a', 3, 'b', 4), named_struct('a', 5, 'b', 6))),
    (3, array())
    as data(c1, c2)
  )
);

22/10/30 17:34:42 ERROR Executor: Exception in task 1.0 in stage 8.0 (TID 14)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_1$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit test.
Closes apache#38440 from bersprockets/SPARK-40963.
Authored-by: Bruce Robbins <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 90d3154)
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 0f234d9 commit c7ef560
File tree
3 files changed, +29 -9 lines changed
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst
    - analysis
    - plans/logical
  - core/src/test/scala/org/apache/spark/sql
Lines changed: 1 addition & 1 deletion
Lines changed: 9 additions & 8 deletions
Lines changed: 19 additions & 0 deletions