Commit a082f46
[SPARK-33071][SPARK-33536][SQL] Avoid changing dataset_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin
### What changes were proposed in this pull request?
Currently, `join()` uses `withPlan(logicalPlan)` for convenience when calling some Dataset functions. However, this makes the `dataset_id` inconsistent between the `logicalPlan` and the original `Dataset`, because `withPlan(logicalPlan)` creates a new Dataset with a new id and resets the `dataset_id` of the `logicalPlan` to that new id. As a result, it breaks the rule `DetectAmbiguousSelfJoin`.
In this PR, we propose to drop the usage of `withPlan` and use the `logicalPlan` directly so that its `dataset_id` doesn't change.
Besides, this PR also removes the related metadata (`DATASET_ID_KEY`, `COL_POS_KEY`) when an `Alias` constructs its own metadata, because the `Alias` is no longer a reference column after being converted to an `Attribute`. To achieve that, we add a new field, `deniedMetadataKeys`, to indicate the metadata that needs to be removed.
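To make the id mismatch described above concrete, here is a minimal toy model in plain Scala. It is only a sketch: `ToyPlan`, `ToyDataset`, `ToyColumnRef`, and `tagMatches` are hypothetical names invented for illustration and are not Spark classes. The only behaviour it assumes is what the description states, namely that wrapping an existing plan in a new dataset re-tags that plan with the new dataset's id.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy model only: these classes are hypothetical stand-ins, not Spark's real API.
object ToyDatasetIdDemo {
  private val nextId = new AtomicLong(0L)

  // A plan node with a mutable slot that mimics the dataset_id tag on a plan.
  final class ToyPlan(val name: String) {
    var datasetIdTag: Option[Long] = None
  }

  // A column reference remembers the id of the dataset it was taken from.
  final case class ToyColumnRef(name: String, datasetId: Long)

  final class ToyDataset(val plan: ToyPlan) {
    val id: Long = nextId.getAndIncrement()
    // Constructing a dataset (re)tags its plan with this dataset's id.
    plan.datasetIdTag = Some(id)

    def col(name: String): ToyColumnRef = ToyColumnRef(name, id)

    // Mimics withPlan: wraps a plan in a brand-new dataset with a new id.
    def withPlan(p: ToyPlan): ToyDataset = new ToyDataset(p)
  }

  // Returns whether the plan's tag still matches a column reference
  // that was taken from the original dataset.
  def tagMatches(wrapAgain: Boolean): Boolean = {
    val ds = new ToyDataset(new ToyPlan("emp1"))
    val ref = ds.col("key")
    // wrapAgain = true mimics the old join(): same plan object, new dataset id.
    val used = if (wrapAgain) ds.withPlan(ds.plan) else ds
    used.plan.datasetIdTag.contains(ref.datasetId)
  }

  def main(args: Array[String]): Unit = {
    println(s"plan used directly: tag matches = ${tagMatches(wrapAgain = false)}")
    println(s"after withPlan(plan): tag matches = ${tagMatches(wrapAgain = true)}")
  }
}
```

In this toy model the tag matches when the plan is used directly and stops matching once the plan is wrapped again, which is the kind of inconsistency that would keep a rule relying on the tag from recognizing the reference.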
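The idea behind `deniedMetadataKeys` can be sketched with a plain `Map` standing in for column metadata. This is an illustrative assumption, not the actual `Alias` implementation: the `aliasMetadata` helper and the key strings below are hypothetical, and only the notion of "copy the child's metadata, minus the denied keys" is taken from the description above.

```scala
// Toy model only: a plain Map stands in for Spark's column metadata.
object ToyAliasMetadataDemo {
  // Hypothetical key strings used purely for illustration.
  val DatasetIdKey = "dataset_id"
  val ColPosKey = "col_position"

  // Builds the metadata for an alias from its child's metadata,
  // dropping every key listed in deniedMetadataKeys.
  def aliasMetadata(
      childMetadata: Map[String, Long],
      deniedMetadataKeys: Seq[String] = Seq.empty): Map[String, Long] =
    childMetadata -- deniedMetadataKeys

  def main(args: Array[String]): Unit = {
    val child = Map(DatasetIdKey -> 7L, ColPosKey -> 0L, "other" -> 1L)
    // The alias keeps unrelated metadata but no longer looks like a
    // reference to a column of dataset 7.
    val cleaned = aliasMetadata(child, Seq(DatasetIdKey, ColPosKey))
    println(cleaned)
  }
}
```

Defaulting `deniedMetadataKeys` to an empty sequence in this sketch keeps the previous behaviour for callers that do not care about the reference metadata.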
### Why are the changes needed?
For the query below, it returns a wrong result, while it should throw an ambiguous self-join exception instead:
```scala
// Assumes a case class TestData(key: Int, value: String) and spark.implicits._ in scope.
val emp1 = Seq[TestData](
  TestData(1, "sales"),
  TestData(2, "personnel"),
  TestData(3, "develop"),
  TestData(4, "IT")).toDS()
val emp2 = Seq[TestData](
  TestData(1, "sales"),
  TestData(2, "personnel"),
  TestData(3, "develop")).toDS()
val emp3 = emp1.join(emp2, emp1("key") === emp2("key")).select(emp1("*"))
emp1.join(emp3, emp1.col("key") === emp3.col("key"), "left_outer")
  .select(emp1.col("*"), emp3.col("key").as("e2")).show()
// wrong result
// +---+---------+---+
// |key|    value| e2|
// +---+---------+---+
// |  1|    sales|  1|
// |  2|personnel|  2|
// |  3|  develop|  3|
// |  4|       IT|  4|
// +---+---------+---+
```
This PR fixes the wrong behaviour.
### Does this PR introduce _any_ user-facing change?
Yes. After this PR, users get the exception instead of the wrong result.
### How was this patch tested?
Added a new unit test.
Closes #30488 from Ngone51/fix-self-join.
Authored-by: yi.wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 91182d6 commit a082f46
File tree
6 files changed, +73 -25 lines changed
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions
  - core/src
    - main/scala/org/apache/spark/sql
    - test/scala/org/apache/spark/sql
Lines changed per file, in diff order (file names and diff contents are not included):
- 2 additions & 1 deletion
- 11 additions & 4 deletions
- 4 additions & 1 deletion
- 23 additions & 16 deletions
- 29 additions & 0 deletions
- 4 additions & 3 deletions