Commit 0093257
[SPARK-14393][SQL] values generated by non-deterministic functions shouldn't change after coalesce or union
## What changes were proposed in this pull request?
When a user appended a column using a "nondeterministic" function to a DataFrame, e.g., `rand`, `randn`, and `monotonically_increasing_id`, the expected semantic is the following:
- The value in each row should remain unchanged, as if we materialize the column immediately, regardless of later DataFrame operations.
However, since we use `TaskContext.getPartitionId` to get the partition index from the current thread, the values from nondeterministic columns might change if we call `union` or `coalesce` after. `TaskContext.getPartitionId` returns the partition index of the current Spark task, which might not be the corresponding partition index of the DataFrame where we defined the column.
See the unit tests below or JIRA for examples.
This PR uses the partition index from `RDD.mapPartitionWithIndex` instead of `TaskContext` and fixes the partition initialization logic in whole-stage codegen, normal codegen, and codegen fallback. `initializeStatesForPartition(partitionIndex: Int)` was added to `Projection`, `Nondeterministic`, and `Predicate` (codegen) and initialized right after object creation in `mapPartitionWithIndex`. `newPredicate` now returns a `Predicate` instance rather than a function for proper initialization.
## How was this patch tested?
Unit tests. (Actually I'm not very confident that this PR fixed all issues without introducing new ones ...)
cc: rxin davies
Author: Xiangrui Meng <[email protected]>
Closes #15567 from mengxr/SPARK-14393.
(cherry picked from commit 02f2031)
Signed-off-by: Reynold Xin <[email protected]>1 parent a885d5b commit 0093257
File tree
32 files changed
+231
-78
lines changed- core/src/main/scala/org/apache/spark/rdd
- sql
- catalyst/src
- main/scala/org/apache/spark/sql/catalyst
- expressions
- codegen
- optimizer
- test/scala/org/apache/spark/sql/catalyst/expressions
- codegen
- core/src
- main/scala/org/apache/spark/sql/execution
- columnar
- joins
- test/scala/org/apache/spark/sql
- hive/src/main/scala/org/apache/spark/sql/hive/execution
32 files changed
+231
-78
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
788 | 788 | | |
789 | 789 | | |
790 | 790 | | |
791 | | - | |
792 | | - | |
| 791 | + | |
| 792 | + | |
793 | 793 | | |
794 | 794 | | |
795 | 795 | | |
796 | 796 | | |
797 | 797 | | |
798 | 798 | | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
| 806 | + | |
| 807 | + | |
| 808 | + | |
| 809 | + | |
| 810 | + | |
799 | 811 | | |
800 | 812 | | |
801 | 813 | | |
| |||
Lines changed: 15 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
272 | 272 | | |
273 | 273 | | |
274 | 274 | | |
| 275 | + | |
275 | 276 | | |
276 | 277 | | |
277 | | - | |
278 | | - | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
279 | 284 | | |
280 | 285 | | |
281 | 286 | | |
282 | | - | |
| 287 | + | |
283 | 288 | | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
284 | 294 | | |
285 | | - | |
| 295 | + | |
| 296 | + | |
286 | 297 | | |
287 | 298 | | |
288 | 299 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| |||
Lines changed: 6 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
50 | 50 | | |
51 | 51 | | |
52 | 52 | | |
53 | | - | |
| 53 | + | |
54 | 54 | | |
55 | | - | |
| 55 | + | |
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
| |||
68 | 68 | | |
69 | 69 | | |
70 | 70 | | |
71 | | - | |
72 | | - | |
73 | | - | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
74 | 75 | | |
75 | 76 | | |
76 | 77 | | |
| |||
Lines changed: 14 additions & 8 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| 26 | + | |
26 | 27 | | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
32 | 33 | | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
37 | 40 | | |
38 | 41 | | |
39 | 42 | | |
| |||
54 | 57 | | |
55 | 58 | | |
56 | 59 | | |
| 60 | + | |
57 | 61 | | |
58 | 62 | | |
59 | 63 | | |
| |||
63 | 67 | | |
64 | 68 | | |
65 | 69 | | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
70 | 76 | | |
71 | 77 | | |
72 | 78 | | |
| |||
Lines changed: 6 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
21 | 20 | | |
22 | 21 | | |
23 | 22 | | |
24 | 23 | | |
25 | 24 | | |
26 | | - | |
| 25 | + | |
27 | 26 | | |
28 | 27 | | |
29 | | - | |
| 28 | + | |
30 | 29 | | |
31 | 30 | | |
32 | 31 | | |
| |||
38 | 37 | | |
39 | 38 | | |
40 | 39 | | |
41 | | - | |
42 | | - | |
| 40 | + | |
| 41 | + | |
43 | 42 | | |
44 | 43 | | |
45 | 44 | | |
46 | 45 | | |
47 | 46 | | |
48 | 47 | | |
49 | | - | |
50 | | - | |
| 48 | + | |
| 49 | + | |
51 | 50 | | |
52 | 51 | | |
53 | 52 | | |
Lines changed: 14 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
184 | 184 | | |
185 | 185 | | |
186 | 186 | | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
187 | 201 | | |
188 | 202 | | |
189 | 203 | | |
| |||
Lines changed: 13 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
28 | | - | |
29 | | - | |
30 | | - | |
31 | | - | |
32 | | - | |
33 | 28 | | |
34 | 29 | | |
35 | 30 | | |
36 | 31 | | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
37 | 45 | | |
38 | 46 | | |
39 | 47 | | |
| |||
Lines changed: 4 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
111 | 111 | | |
112 | 112 | | |
113 | 113 | | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
114 | 118 | | |
115 | 119 | | |
116 | 120 | | |
| |||
Lines changed: 14 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
28 | 35 | | |
29 | 36 | | |
30 | 37 | | |
31 | 38 | | |
32 | 39 | | |
33 | | - | |
| 40 | + | |
34 | 41 | | |
35 | 42 | | |
36 | 43 | | |
37 | 44 | | |
38 | 45 | | |
39 | 46 | | |
40 | | - | |
| 47 | + | |
41 | 48 | | |
42 | 49 | | |
43 | 50 | | |
| |||
55 | 62 | | |
56 | 63 | | |
57 | 64 | | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
58 | 69 | | |
59 | 70 | | |
60 | 71 | | |
| |||
67 | 78 | | |
68 | 79 | | |
69 | 80 | | |
70 | | - | |
71 | | - | |
| 81 | + | |
72 | 82 | | |
73 | 83 | | |
0 commit comments