Commit f3baf08
[SPARK-43393][SQL][3.5] Address sequence expression overflow bug
### What changes were proposed in this pull request?
Spark has a (long-standing) overflow bug in the `sequence` expression.
Consider the following operations:
```
spark.sql("CREATE TABLE foo (l LONG);")
spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")
spark.sql("SELECT sequence(0, l) FROM foo;").collect()
```
The result of these operations will be:
```
Array[org.apache.spark.sql.Row] = Array([WrappedArray()])
```
This empty array is an unintended consequence of overflow.
Here `sequence` is applied to `0` and `Long.MaxValue` with a step size of `1`, and the array length is computed [here](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451). In this calculation, with `start = 0`, `stop = Long.MaxValue`, and `step = 1`, the calculated `len` overflows to `Long.MinValue`. In binary, the computation looks like:
```
  0111111111111111111111111111111111111111111111111111111111111111
- 0000000000000000000000000000000000000000000000000000000000000000
------------------------------------------------------------------
  0111111111111111111111111111111111111111111111111111111111111111
/ 0000000000000000000000000000000000000000000000000000000000000001
------------------------------------------------------------------
  0111111111111111111111111111111111111111111111111111111111111111
+ 0000000000000000000000000000000000000000000000000000000000000001
------------------------------------------------------------------
  1000000000000000000000000000000000000000000000000000000000000000
```
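The binary walk-through above can be reproduced with plain 64-bit arithmetic. The following is a minimal sketch, not the Spark source: the object name `SequenceLengthOverflow` and the helper `uncheckedLength` are hypothetical, and the formula is assumed to be subtract, divide, then add one, as the steps above show.

```scala
object SequenceLengthOverflow {
  // Hypothetical helper mirroring the unchecked computation described above:
  // len = (stop - start) / step + 1, all in signed 64-bit arithmetic.
  def uncheckedLength(start: Long, stop: Long, step: Long): Long =
    (stop - start) / step + 1L

  def main(args: Array[String]): Unit = {
    val len = uncheckedLength(0L, Long.MaxValue, 1L)
    // Adding 1 to Long.MaxValue wraps around to Long.MinValue.
    println(len == Long.MinValue)
  }
}
```

The final `+ 1` is the step that wraps here; the subtraction and division are exact for these inputs.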
The following [check](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454) passes because `Long.MinValue`, being negative, is still `<= MAX_ROUNDED_ARRAY_LENGTH`. The subsequent `toInt` conversion [truncates the upper bits](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457) of this representation, resulting in a length of `0`.
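To illustrate why an upper-bound-only check does not catch the wrapped value, here is a small sketch. The object `TruncationDemo` and its members are hypothetical, and the constant value `Int.MaxValue - 15` is an assumption about `MAX_ROUNDED_ARRAY_LENGTH` rather than something stated in this description.

```scala
object TruncationDemo {
  // Assumed value for illustration; the real constant is defined in Spark.
  val MaxRoundedArrayLength: Int = Int.MaxValue - 15

  // Upper-bound-only check: any negative length passes trivially.
  def passesUpperBoundOnly(len: Long): Boolean = len <= MaxRoundedArrayLength

  // Narrowing conversion keeps only the low 32 bits of the Long.
  def truncatedLength(len: Long): Int = len.toInt

  def main(args: Array[String]): Unit = {
    println(passesUpperBoundOnly(Long.MinValue)) // the wrapped length is accepted
    println(truncatedLength(Long.MinValue))      // low 32 bits are all zero
  }
}
```

Because the low 32 bits of `Long.MinValue` are all zero, the narrowed length is `0`, which matches the empty array in the reproduction.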
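Other overflows in the same computation, such as in the `stop - start` subtraction when the operands have opposite signs and large magnitudes, are similarly problematic.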
This PR addresses the issue by checking numeric operations in the length computation for overflow.
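As a rough sketch of what overflow-checked length computation can look like (not the code in this patch), the example below uses `java.lang.Math.subtractExact` and `java.lang.Math.addExact`, which throw `ArithmeticException` on overflow. The object `CheckedSequenceLength`, the `Either`-based result, the error strings, and the assumed constant value are all hypothetical.

```scala
object CheckedSequenceLength {
  // Assumed value for illustration; the real constant is defined in Spark.
  val MaxRoundedArrayLength: Int = Int.MaxValue - 15

  // Returns Right(length) when the length is computable and in range,
  // or Left(message) when any intermediate step overflows or is out of range.
  def checkedLength(start: Long, stop: Long, step: Long): Either[String, Int] = {
    require(step != 0L, "step must be non-zero")
    try {
      val delta = java.lang.Math.subtractExact(stop, start)
      // Long.MinValue / -1 is the only long division that overflows.
      if (delta == Long.MinValue && step == -1L) {
        throw new ArithmeticException("long overflow")
      }
      val len = java.lang.Math.addExact(delta / step, 1L)
      if (len < 0L || len > MaxRoundedArrayLength) {
        Left(s"sequence length $len is out of range")
      } else {
        Right(len.toInt)
      }
    } catch {
      case _: ArithmeticException =>
        Left("overflow while computing sequence length")
    }
  }
}
```

With this sketch, `checkedLength(0L, Long.MaxValue, 1L)` yields a `Left` instead of silently producing a length of `0`, while small inputs such as `checkedLength(0L, 4L, 1L)` yield `Right(5)`.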
### Why are the changes needed?
There is a correctness bug from overflow in the `sequence` expression.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tests added in `CollectionExpressionsSuite.scala`.
Closes #43820 from thepinetree/spark-sequence-overflow-3.5.
Authored-by: Deepayan Patra <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

1 parent 9e492b7 commit f3baf08

File tree: 2 files changed, +71 -20 lines changed
- sql/catalyst/src
  - main/scala/org/apache/spark/sql/catalyst/expressions
  - test/scala/org/apache/spark/sql/catalyst/expressions

Lines changed: 32 additions & 15 deletions
Lines changed: 39 additions & 5 deletions